SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning

Size: px
Start display at page:

Download "SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning"

Transcription

1 PREPRINT SmoothOut: Smoothing Out Shrp Minim to Improve Generliztion in Deep Lerning Wei Wen, Yndn Wng, Feng Yn, Member, IEEE, Cong Xu, Chunpeng Wu, Yirn Chen, Fellow, IEEE nd Hi (Helen) Li, Fellow, IEEE rxiv: v3 [cs.lg] 2 Dec 28 Abstrct In Deep Lerning, Stochstic Grdient Descent (SGD) is usully selected s trining method becuse of its efficiency; however, recently, problem in SGD gins reserch interest: shrp minim in Deep Neurl Networks (DNNs) hve poor generliztion; especilly, lrge-btch SGD tends to converge to shrp minim. It becomes n open question whether escping shrp minim cn improve the generliztion. To nswer this question, we propose SmoothOut frmework to smooth out shrp minim in DNNs nd thereby improve generliztion. In nutshell, SmoothOut perturbs multiple copies of the DNN by noise injection nd verges these copies. Injecting noises to SGD is widely used in the literture, but SmoothOut differs in lots of wys: () de-noising process is pplied before prmeter updting; (2) noise strength is dpted to filter norm; (3) n lterntive interprettion on the dvntge of noise injection, from the perspective of shrpness nd generliztion; (4) usge of uniform noise insted of Gussin noise. We prove tht SmoothOut cn eliminte shrp minim. Trining multiple DNN copies is inefficient, we further propose n unbised stochstic SmoothOut which only introduces the overhed of noise injecting nd de-noising per btch. An dptive vrint of SmoothOut, AdSmoothOut, is lso proposed to improve generliztion. In vriety of experiments, SmoothOut nd AdSmoothOut consistently improve generliztion in both smll-btch nd lrge-btch trining on the top of stte-of-the-rt solutions. Index Terms Deep Lerning, Neurl Networks, Shrp Minim, Generliztion, SGD. I. INTRODUCTION Stochstic Grdient Descent (SGD) is the dominnt optimiztion method used to trin Deep Neurl Networks (DNNs). However, the generliztion of DNNs needs more understnding. Recently, one observtion is tht lrge-btch SGD hs worse generliztion thn smll-btch SGD [][2][3]. The ccurcy difference between smll-btch trining nd lrgebtch trining is the well known generliztion gp [2]. Resons behind the generliztion gp re still under ctive reserch. Hoffer et l. [4] hypothesizes tht the process of SGD is similr to rndom wlk on rndom potentil [5]. This hypothesis ttributes generliztion gp to the limited number of prmeter updtes, nd suggests to trin more itertions. Lerning Rte Scling (LRS) ws lso proposed to mtch wlk sttistics to close the gp. Inspired by this hypothesis, prcticl techniques re proposed [3][6][7][8]. W. Wen is with the Deprtment of Electricl nd Computer Engineering, Duke Univerisity, Durhm, NC, 2778 USA e-mil: (see Y. Wng is with University of Pittsburgh. F. Yn is with University of Nevd Reno. C. Xu is with HP Lbs. C. Wu, H. Li nd Y. Chen re with Duke Univerisity. Mnuscript for review. Another ppeling hypothesis, which rouses recent reserch interest, is tht the generliztion is ttributed to the fltness of minim [9][]; tht is, flt minim hve good generliztion while shrp minim cn worsen it. The hypothesis cn be pplied to both smll-btch nd lrge-btch SGD, but lrge-btch SGD tends to converge to shrper minim, ending up with the generliztion gp. Shrp minim hve bd generliztion due to their high over-fitting to trining dt [9][] nd high sensitivity to noises []. Jstrzębski et l. [2] showed the connection between these two hypotheses: LRS motivted by rndom wlk leds to fltter minim nd helps to improve the generliztion. Our pproch is bsed on the second hypothesis, trgeting on escping shrp minim for better generliztion in both smll-btch nd lrge-btch SGD. Moreover, our pproch cn enhnce techniques inspired by the first hypothesis nd further improve generliztion. Keskr et l. [2] ttempted to escpe shrp minim through dt ugmenttion, conservtive trining nd dversril trining. However, ll trils do not completely remedy the problem [2], leving how to void shrp minim s n open question. We propose SmoothOut to smooth out shrp minim nd guide the convergence of SGD to fltter regions. SmoothOut slightly perturbs DNN function by noise injecting or function reshping, then verges ll perturbed DNNs. Becuse shrp minim re sensitive to perturbtion, slight perturbtion cn result in significnt function increse t ech shrp minimum, which mens the verged vlue will be high. In this wy, shrp minim cn be eliminted. Conversely, smll perturbtion only influences the mrgin of ech flt region nd the flt bottom still ligns well with the originl bottom. Averging ligned bottoms cn mintin the originl minimum. Beyond this intuition, we prove tht SmoothOut under uniform noises cn eliminte shrp minim while mintining flt minim. Note tht we mjorly use uniform noise for study s it is well motivted, but other noise types like Gussin noise cn fit into our new SmoothOut frmework. Moreover, trining over mny perturbed DNNs for verging is computtion intensive. We propose Stochstic SmoothOut, which injects noise per itertion during SGD. We prove tht Stochstic SmoothOut is equivlent to the originl SmoothOut in expecttion. Adptive SmoothOut AdSmoothOut, is lso proposed to further improve generliztion by dpting noise strength to filter norm. Our experiments show tht SmoothOut nd AdSmoothOut cn help to escpe shrp minim nd improve the generliztion. SmoothOut nd AdSmoothOut re esy to be implemented nd our code is t

2 PREPRINT 2 II. RELATED WORK Shrp Minim nd Generliztion. Why deep neurl networks generlize well still needs deeper understnding [3]. As forementioned, one hypothesis is tht SGD finds flt minim which cn generlize well [9][2]. As Hochreiter et l. [9] pointed out, bsed on Minimum Description (Messge) Length [][4] theory, fltter minimum cn be encoded in fewer bits which indictes simpler DNN model for better generliztion. Alterntive explntions re bsed on Byesin lerning [5][6]. Dinh et l. [7] further rgue tht current definitions of shrpness re problemtic nd redefinition is required for explntion. However, Keskr et l. [2] indeed found tht lrge-btch trining sticks to shrp minim nd hs bd generliztion. Different from previous work, we focus on new vrints of SGD to escpe shrp minim. We find tht our method not only cn escpe shrp minim during lrge-btch trining but lso guide smll-btch trining to fltter ones, therefore, improving generliztion in both cses. Chudhri et l. [] proposed Entropy-SGD which mximizes locl entropy to bis SGD to flt minim ( wide vlleys ). Locl entropy ws constructed by building connections between Gibbs distribution nd optimiztion problems. The grdients of locl entropy ws estimted by Lngevin dynmics [8], which is computtion intensive. Compred with Entropy-SGD, our SmoothOut is more efficient since noise injection nd de-noising is the only overhed. This enbles SmoothOut to scle to lrger dtset like ImgeNet [9]. Moreover, SmoothOut consistently improves generliztion in ll experiments, while Entropy-SGD chieved comprble generliztion error. Izmilov et l. [2] proposed Stochstic Weight Averging (SWA), which records the newest prmeter points long the trjectory of SGD nd then simply verges them to get the finl optimum. Compring with SWA, SmoothOut performs stochstic verging over perturbed models; SWA relies on pre-trining to converge ner to minim, while SmoothOut cn trin from scrtch; moreover, it is unnswered if SWA cn improve generliztion in lrge-btch SGD, while SmoothOut cn improve it both experimentlly nd theoreticlly. Noise Injection. Noise injection is commonly used method in SGD [2][22][23][24][25][26] nd Byesin neurl networks [27][28][29], where usully Gussin noises re injected to prmeters or grdients for explortion or distribution pproximtion. Differently, our method is motivted by eliminting shrp minim, which leds to some key differences: () de-noising process is pplied before prmeter updting; (2) noise strength is dpted to filter norm; (3) n lterntive interprettion on the dvntge of noise injection, from the perspective of shrpness nd generliztion; (4) usge of uniform noise insted of Gussin noise. Our experiments will show tht uniform noise is superior to Gussin noise. Moreover, our SmoothOut frmework is gnostic noise types Any noise type cn fit to the frmework nd my chieve the gol. We dopt uniform noise s mjor study becuse it is well motivted s will be shown in Section III. Dropout [3] is populr method to void over-fitting nd include uncertinty [3] by rndomly drop neurons, however, lrgebtch trining with Dropout still hs the generliztion gp s shown experimentlly. The reson is: s Keskr et l. [2] observed, shrp region only expnds in smll dimensionl subspce nd most directions re flt; however, Dropout only perturbs subspce such tht the shrp directions cnnot be frequently perturbed; conversely, our method effectively perturbs the whole spce including shrp direction. We will explin the connections between Dropout nd our method. Lrge-btch SGD. Lrge-btch SGD is loosely relted work becuse SmoothOut is generl SGD pproch. However, s shrp minim in lrge-btch SGD become severer [2] nd ccurcy loss is generlly observed, n ctive line of reserch focuses on overcoming the generliztion gp (ccurcy loss). Hoffer et l. [4] suggest to trin more epochs, however, trining more epochs consumes more time. Some heuristic techniques were proposed to close the gp without prolonging epochs. Those techniques include liner lerning rte scling [3], wrm-up trining [3][8], Lyerwise Adptive Rte Scling [7] nd others [32]. However, without theoreticl support, it is uncler to wht extent those methods cn generlize. For exmple, liner lerning rte scling nd wrm-up trining cnnot generlize to CIFAR- [4] nd other rchitectures on ImgeNet [7]. Compred with those techniques, our SmoothOut is n interpretble solution supported by the shrp minim hypothesis. More importntly, our experiments show tht SmoothOut cn further improve the ccurcy when combined with those stte-of-thert techniques. III. SmoothOut: PRINCIPLES, THEORY AND IMPLEMENTATION We first introduce our SmoothOut method nd its principles in Section III-A. To reduce computtion complexity, Stochstic SmoothOut is proposed in Section III-B; we prove tht Stochstic SmoothOut is n unbised pproximtion of deterministic SmoothOut. Section III-C implements Stochstic SmoothOut in bck-propgtion of DNNs. At lst, n dptive vrint AdSmoothOut, is introduced. A. Principles: Averging Perturbed Models Smooths Out Shrp Minim As [2] studied, shrp minim hve lrge generliztion gps, becuse smll distortion/shift of testing function from trining function cn significntly increse testing loss even though current prmeter is minimum of the trining function. Our optimiztion gol is to encourge convergence to flt minim for more robust models. Our solution is derived from the sensitivity nture of shrp minim. We intentionlly inject noises into the model to smooth out shrp minim. The concept is illustrted in Figure ()(b). We define w s point in the prmeter spce, C(w) s the trining loss function nd C(w; Θ) s perturbtion of C(w). C(w; Θ) is prmeterized by both w nd Θ, where Θ is rndom vector to generte the perturbtion. Insted of minimizing C(w), we propose to minimize { } C(w) = E C(w; Θ) N C(w; θ i ) () N i= Figure in their pper [2] illustrtes this conception.

3 PREPRINT 3 Trining Function !(#) Flt Shrp 5 Bsis Prmeter spce w () Trining loss function !&(#; ( ) )! (#) Mintined Smoothed out 5 Averged Prmeter spce w (b) SmoothOut Dtset Dtset Duplicte!(# + % & )!(# + % ( )!(# + % ) ) 2 N (c) The first version of SmoothOut Dt x t Averging *(+,, # + %, ) (d) The stochstic version of SmoothOut Figure : Illustrtion nd frmework of SmoothOut. () The trining loss function of the bsis model w.r.t. prmeter w. (b) Ech thin curve represents perturbed model; there re totlly 24 perturbtions, but only four re plotted for clener visuliztion; the perturbtion is done by slightly shifting the bsis model in (); the shift distnce is rndomly drwn from uniform distribution; the green curve is new model by verging over ll perturbtions. (c) The first version of proposed SmoothOut frmework in Eq. (). (d) The Stochstic SmoothOut which rndomly perturbs prmeter w t ech btch. to find optiml w for C(w), where θ i is smple of Θ nd N is the number of smples. For simplicity, we ssume C(w) hs one flt minimum w f nd one shrp minimum w s, but the discussion cn be generlized to C(w) with multiple flt nd shrp minim. Our gol is to design n uxiliry function C(w) such tht its minimum within the originl flt region cn pproximte w f, by stisfying the Flt Constrint ( ) rg min C(w) wf ϕ, (2) w D(w f,τ) menwhile the shrp w s is smoothed out, by stisfying the Shrp Constrint ( ) min C(w) mx (C(w)) D(w s,ε) D(w s,ε) ( ) (3) > min C(w), D(w f,τ) where D(w, ς) represents region round w, being constrined s D(w, ς) = {w R m : (w w) i ς, i {...m}}. (4) When ϕ is smll nd τ is lrge, Inequlity (2) ensures tht the uxiliry function C(w) mintins the minimlity of C(w) in the flt region; in the extreme cse of ϕ = nd τ, the minimum of C(w) is exctly wf. Conversely, ner the originl shrp region, Inequlity (3) ensures tht minimlity of C(w) is eliminted when ε is reltively lrge, becuse mx D(ws,ε) (C(w)), the lower bound of C(w), increses rpidly by slightly incresing ε round the shrp minimum; in the extreme cse of ε, the lower bound is the mximum of C(w). In nutshell, good design of C(w) llow smll ϕ, lrge τ nd lrge ε. In this wy, minimiztion process of C(w) will skip w s nd converge to w f. It is infesible to find n optiml C(w) which minimizes ϕ nd mximizes τ nd ε, especilly when C(w) is deep neurl network. However, we find tht, under the Uniform Perturbtion C(w; Θ) = C(w + Θ) where Θ i i.i.d. U(, ) nd i {, 2,..., m}, (5) C(w) cn well perform the purpose. U(, ) is uniform distribution within rnge of [, ]. In this cse, we hve bused C(w; ) s C(w) in the nottion for simplicity. In the Appendix A, we prove tht, under Uniform Perturbtion, pproprite ϕ, τ nd ε cn be found to stisfy Flt Constrint nd Shrp Constrint: Theorem. When C(w) is symmetric in D(w f, τ) τ>, the minimum of ϕ is to stisfy the Flt Constrint when C(w) is generted under the Uniform Perturbtion. Theorem 2. Suppose C(w) is high dimensionl (w R m, m ) nd is symmetric nd strictly monotonic in D(w s, b) b>, then such tht Shrp Constrint is stisfied with ε when C(w) is generted under the Uniform Perturbtion. In theorems, the symmetry is ssumed only ner minim, nd the loss surfce does not hve to be symmetric in the whole spce. By referring to the visuliztion of loss lndscpes of neurl nets in [33], it is resonble to mke this ssumption ner minim. Besides the rigor proof in the Appendix A, SmoothOut cn be explined from the perspective of signl processing: imgining the prmeter spce s time domin nd the function s signls, then verging is low-pss filter which elimintes high-frequency signls (shrp regions) while mintins lowfrequency signls (flt regions). Figure (c) illustrtes the frmework of the proposed SmoothOut in SGD. All models shre the sme prmeter w. Before trining strts, the i-th model is independently perturbed by θ i ; during trining, ll θ i re fixed nd n identicl btch of dt is sent to ll models for trining. Becuse lrge N is required for pproximtion in Eq. (), the computtion complexity nd memory usge will be very high, especilly when C(w) is deep neurl network. In the next section, the Stochstic SmoothOut will be proposed to solve this issue. B. Theory: Stochstic SmoothOut is Unbised To reduce the computtion complexity nd memory usge of SmoothOut in Figure (c), Stochstic SmoothOut is proposed

4 PREPRINT 4. Vl loss Trin loss Vl ccurcy Trin ccurcy α () Shrpness visuliztion SB LB Trin loss SB Vl loss SB Trin loss LB Vl loss LB 6 Trin ccu. SB Vl ccu. SB 5 Trin ccu. LB Vl ccu. LB (b) Sensitivity to noise Figure 2: Nottion: SB : Smll Btch (256); LB : Lrge Btch (5); ccu. : ccurcy. () loss nd ccurcy vs. α, which controls w long the direction from SB minimum (w f ) to LB minimum (w s ); (b) loss nd ccurcy under influence of different strengths of noise. Dtset: CIFAR-. Network: C in [2] implemented by [4]. Optimizer: Adm with. initil lerning rte. in this section s shown in Figure (d). Insted of using multiple perturbed models to lern from identicl dt, only one model is trined. At the t-th btch of trining dt x t, the prmeter w t is first perturbed to w t + θ t nd then x t is fed into the model to clculte the loss function. We cn prove tht, in both frmeworks, the outputs cn pproximte C(w) without bis. Formlly, in Figure (c), the expecttion of the output is { } N E θ...n C(w + θ i ) N (6) i= = E Θ {C(w + Θ)} = C(w). In online lerning systems [34] like Figure (d), the dt x t is independently generted from rndom distribution nd its online loss is obtined by model Q(x t, w); the finl loss function to minimize is the expecttion of online loss under dt distribution, i.e., Accurcy (%) Accurcy % C(w) E X {Q(x, w)}. (7) Therefore, in Figure (d), the expecttion of the output is E {Q(x t, w + θ t )} = E Θ {E X {Q(x, w + Θ)}} = E Θ {C(w + Θ)} = C(w). Consequently, both frmeworks in Figure cn pproximte C(w) = E{ C(w; Θ)} in Eq. (), but Stochstic SmoothOut (8) is much more computtion efficient. The only overhed of Stochstic SmoothOut is noise injection nd denoising s will be shown. In the following sections, without explicit clrifiction, SmoothOut will refer to the stochstic version in Figure (d). The reson why SmoothOut cn eliminte shrp minim is tht C(w s ) is more sensitive to noise thn C(w f ), nd we expect C(ws ) increses fster thn C(w f ) s the noise strength increses from. To verify this, we first trin DNN under smll btch size to get flt minimum w f ; second, w = w f is deployed into the frmework in Figure (d); third, the whole trin/vlidtion dtset is fed to the frmework in btch size of, nd t ech btch, the prmeter is perturbed to w = w f + θ t ; finlly, the losses re verged over ll btches to estimte C(w f ). The sme process is done using lrge btch size for the sme DNN to estimte C(w s ). We scn in rnge to test the sensitivity of C(wf ) nd C(w s ) to perturbtion. Figure 2() visulizes the shrpness of C(w) round w f nd w s, using the technique dopted in [2] which ws originlly proposed in [35]. In Figure 2(), ech point on the loss curve is (w +, C(w + )) where w + = α w s + ( α) w f. The visuliztion is consistent with [2], which concluded tht lrge-btch trining converges to shrp minim. Figure 2(b) nlyzes the sensitivity. For both trining nd vlidtion dtsets, C(ws ) indeed increses fster thn C(w f ) s increses. The ccurcy curves hve similr trend. Sensitivity nlyses of more DNNs nd more dtsets re included in the Appendix B. Therefore, side outcome of this work is tht we cn use s = ( C(w ; ) ) (9) s metric to mesure the shrpness of C(w) t minimum w. A lrger s mens shrper minimum. At lst, under our frmework, we cn view Dropout s n noise under Bernoulli distribution dpting its noise strength to the corresponding weight. Concretely, in Figure (d), θ ti = w i with probbility p nd θ ti = with probbility p, where p is the dropout rtio. Under this view, Dropout cn fit into our frmework, but it cnnot guide the convergence to shrp minim, becuse the strength of noise θ ti = w i is too lrge. C. Implementtion: Bck-propgtion with Perturbtion nd Denoising Algorithm SmoothOut in Bck Propgtion Input : Trining dtset X, totl itertions T, model Q(x, w) with initil prmeter w = w : for t {,..., T } do 2: Rndomly smple btch dt x t from X i.i.d. 3: Perturbtion: w t = w t + θ t where θ ti U(, ) 4: Bck-propgtion: g t = Q(x t,w t ) w t 5: Denoising: w t = w t θ t 6: Updting: w t+ = w t η t g t 7: end for Output: The model Q(x, w) with finl prmeter w = w T

5 PREPRINT 5 w " = w " + θ t x t Q x t,w " Q x t,w " g t w " () Perturbtion (2) Bck-propgtion itertion t (4) Updting (3) Denoising fixing the strength. Mthemticlly, suppose w (i) is vector of prmeters in filter i nd θ (i) is noise vector, then dpted noise ˆθ (i) will be ˆθ (i) = w(i) 2 θ (i) 2 θ (i), (2) where controls the strength of noises. Adptive noise is nother key difference from noise injection in previous work. Our bltion study will show dptive noise is more effective in improving generliztion. w ",- = w " η " g t w " = w " θ t Figure 3: SmoothOut in BP. In Figure (d), the grdient to updte prmeter t itertion t is g t = Q(x t, w + θ t ) w w=wt = w Q(x t, w) w=wt+θ t (w + θ () t) w. Therefore, the prmeter is updted s w t+ = w t η t w Q(x t, w) w=wt+θ t, () where η t is the lerning rte nd the grdient is obtined by bck propgtion when the prmeter vlue is w t + θ t. Thus, SmoothOut cn be implemented s Algorithm s illustrted in Figure 3. This revels pitfll in implementtion tht the noise θ t dded to w t must be denoised before pplying the grdient, which is lso key difference from existing noise injection pproches [2][22][23][24]. As shown in Figure 3, the only overhed of SmoothOut is dding nd subtrcting noises, which is much more efficient thn trining multiple DNNs in Figure (c). Note tht, lthough Algorithm is proposed in the context of vnill SGD, it cn be extended to SGD vrints by simply utilizing the grdient g t for momentum ccumultion, lerning rte dpttion, nd so on. D. Adptive SmoothOut AdSmoothOut Due to the fct tht the weight distributions cross ll lyers vry lot, dding noise with constnt strength to ll weights my over-perturb the lyers with smll weights while underperturb others. The vrying distribution is lso the source of problem in visulizing the shrpness s pointed out in [33]. To overcome this, [33] proposed filter normliztion nd chieved more ccurte visuliztion. Inspired by filter normliztion, in SmoothOut, the noises dded to filter re linerly scled by l 2 norm of the filter. In fully-connected lyers, the noises re scled per neuron, i.e., ll input connections of ech neuron form vector nd noises re divided by l 2 norm of the vector. We cll it Adptive SmoothOut (AdSmoothOut) becuse it dpts the strength of noises to the filters insted of IV. EXPERIMENTS We evlute SmoothOut in MNIST [36], CIFAR- [37], CIFAR- [37] nd ImgeNet [9] dtset. SmoothOut nd AdSmoothOut re evluted in smll-btch ( SB ) SGD nd lrge-btch ( LB ) SGD. (C ɛ, A)-shrpness [2] is utilized to mesure the shrpness of minimum, which is solved using L- BFGS-B lgorithm [38]. In solving (C ɛ, A)-shrpness, the fullspce (i.e., A = I n ) in the bounding box C ɛ (with ɛ = 5 4 ) is explored to find the mximum for mesurement. As L- BFGS-B is n estimtion lgorithm nd my fil to find the exct mximum vlue, vrince in mesurements is observed. We run 5 experiments for ech mesurement, nd use the mximum s the finl shrpness metric. Unlike [2] which verged over 5 runs, ours is more resonble becuse (C ɛ, A)- shrpness is bsed on mesuring the mximum vlue round the box. In trining with SmoothOut, is the only dditionl hyper-prmeter to tune, which controls the strength of noise. is very robust becuse of the width of flt minim. More concretely, is.375 in ll experiments of SmoothOut in Tble I nd Tble II. We believe the vlue of is network rchitecture dependent (i.e., loss function dependent). We cross-vlidte it in smll-btch SGD nd directly use it in lrge-btch SGD without further tuning, nd it generlizes well nd improves ccurcy in both smll-btch SGD nd lrgebtch SGD. A. Convergence to Fltter Minim We first dopt benchmrks by [2] to verify tht SmoothOut cn effectively guide both SB nd LB SGD to fltter minim nd thus improve the generliztion (ccurcy). The comprison is in Tble I. Figure 4 visulizes nd compres the shrpness of bseline (C 3 ) nd SmoothOut. Similr visuliztion results for F nd C cn be found in the Appendix B. Note tht Keskr et l. [2] did not trget on chieving stte-of-thert ccurcy but studying the chrcteristics of minim, nd we simply follow this purpose. Comprison in stte-of-the-rt models will be covered in Section IV-B. In Tble I nd Figure 4, we observed consistency mong shrpness, visuliztion, nd generliztion, tht is, smller (C ɛ, A)-shrpness, then fltter region in the visuliztion nd higher ccurcy. More importntly, the results indicte tht () compring with SB trining, LB trining converges to shrper minim with worse generliztion, but SmoothOut cn guide it to converge to fltter minim nd closes the gp or even improves the ccurcy; (2) the shrp minim problem lso exist in SB trining s shown in Figure 4(),

6 PREPRINT 6 Tble I: Shrpness reduction nd generliztion improvement for DNNs in [2]. DNN Dtset Btch size Bseline SmoothOut Improvement F C C 3 MNIST CIFAR- CIFAR- (C ɛ, A)-shrpness chnge bseline SmoothOut % 98.6%.2% % 98.42%.4% % 8.72% 2.5% % 8.34% 3.4% % 5.7% 3.8% % 48.43% 4.6% () SB bseline SB SmoothOut () Accurcy (%). 2 Noise Bseline trin loss..2.3 AdSmoothOut trin loss Bseline trin ccurcy AdSmoothOut trin ccurcy (b) LB bseline Figure 4: Shrpness of bseline nd SmoothOut in () SB trining nd (b) LB trining of C 3. LB SmoothOut (b) Bseline vl loss..2.3 AdSmoothOut vl loss Bseline vl ccurcy AdSmoothOut vl ccurcy Figure 5: Shrpness visuliztion by filter normliztion [33] using () trining dtset nd (b) vlidtion dtset. The DNN is ResNet44 trined by CIFAR- using bseline [4] nd AdSmoothOut but SmoothOut cn reduce the shrpness nd improve the ccurcy; (3) shrp minim problem is severer in LB trining such tht SmoothOut cn improve more. At lst, we rgue tht the convergence of our method is stble lthough noises re injected; tht is, different runs converge to similr ccurcy under the sme strength of injected noises. More specific, for C in Tble I, ccurcy stndrd devition is ±.33%, ±.2%, ±.24% nd ±.3% in smll-btch bseline, smll-btch SmoothOut, lrge-btch bseline nd lrge-btch SmoothOut, respectively. B. Improving Generliztion on the Top of Stte-of-the-rt Solutions In this section, we evlute our method by stte-of-thert DNNs, including ResNet44 on CIFAR- nd CIFAR-, AlexNet [39] nd ResNet8 [4] on ImgeNet. Tble II summrizes ll the results. As generliztion issue is severer in LB trining, we focus on LB trining in this section. There re lots of proposed techniques to relieve the generliztion issue in LB trining [4][3][32][8][7][6], however, our method is orthogonl nd we simply pply SmoothOut on the top of them to verify if SmoothOut cn be combined with those stte-of-the-rt solutions. We re not ble to duplicte ll of those techniques, but we select Lerning Rte Scling (LRS), Ghost Btch Normliztion (GBN) nd Trining Longer (TL) techniques [4][3] s the representtives. For LRS, [3] used liner LRS (i.e. lerning rte is scled linerly w.r.t. the btch size), while [4] used squre root LRS. The preferble LRS rule is dependent on the dtset nd DNN [4]. In our experiments, liner LRS 2 is preferble for ResNet44 on CIFAR- nd squre root LRS is preferble for the others. For TL, we simply double the trining epochs for ech lerning rte. In the experiments of ResNet44 on CIFAR- nd CIFAR-, we pplied GBN, TL (4 epochs), liner LRS or squre root LRS in the bselines, so tht we cn diversify the setups to evlute our method. In ll setups, SmoothOut improves generliztion on the top of the GBN, TL nd LRS, verifying tht our method is orthogonl to stte-of-the-rt solutions. More importntly, the AdSmoothOut vrint hs the best generliztion in ll experiments on CIFAR- nd CIFAR- 2 Wrm up pre-trining is not dopted in our experiments for net evlution.

7 PREPRINT 7 Tble II: SmoothOut nd AdSmoothOut improve the stte-of-the-rt bselines. DNN Dtset Btch size Epochs LRS Method Accurcy ResNet44 CIFAR squre root ResNet44 CIFAR- 24 AlexNet ImgeNet squre root 4 squre root 4 liner 6 squre root 2 squre root ResNet8 ImgeNet squre root Bseline 9.2% SmoothOut 9.95% AdSmoothOut 92.63% Bseline 67.23% SmoothOut 68.68% AdSmoothOut 7.9% Bseline 68.62% SmoothOut 7.% AdSmoothOut 72.39% Bseline 7.2% SmoothOut 7.67% AdSmoothOut 72.85% Bseline 47.64% AdSmoothOut 52.53% Bseline 54.24% AdSmoothOut 55.5% Bseline 66.75% AdSmoothOut 67.% Tble III: SmoothOut without de-noising tested on CIFAR-. DNN Btch size SmoothOut SmoothOut w/o de-noising C % 36.52% % 46.5% ResNet % 27.57% Noise strength is.375 in ll experiments., showing the necessity of dptive noises. Therefore, we choose AdSmoothOut s the representtive in ImgeNet for fster development. As AdSmoothOut is one type of regulriztions by stochstic model verging, the regulriztions by weight decy nd dropout re not dopted in trining ImgeNet, such tht we cn reduce the number of hyperprmeters. The top- ccurcy of AlexNet in SB trining is 56.5% with the btch size of 256, however, in LB trining with the btch size of 6384, the ccurcy drops to 47.64% if trined by the sme epochs. TL indeed cn improve the generliztion to 54.24%. More importntly, AdSmoothOut improves the ccurcy in both cses, i.e., improving 4.89% when TL is not pplied nd improving.27% on the top of TL. Lst but not the lest, our method lso chieve improvement on ResNet8 on the ImgeNet. At the end, we visulize the shrpness of minim by filter normliztion visuliztion [33], AdSmoothOut indeed converges to fltter region s shown in Figure 5. C. Abltion Study ) The necessity of de-noising: One of our contributions is the de-noising process. We perform n bltion study by removing the de-noising process to test its necessity. We rerun ll CIFAR- SmoothOut experiments in Tble I nd Tble II, but without de-noising. We use the sme noise strength for comprison. The results re summrized in Tble III. Without Tble IV: Accurcy with nd without de-noising tested by ResNet44 on CIFAR- with the sme setting in Tble II. Method Accurcy Bseline 9.2% SmoothOut % SmoothOut w/o de-noising. 9.5% AdSmoothOut % AdSmoothOut w/o de-noising % de-noising, the ccurcy significntly drops. The reson is strightforwrd: strong noises mke originl prmeters nd grdients less ccurte nd deteriorte convergence, but our grdients re exctly the grdients of uxiliry function C(w) nd perturbed prmeters re recovered before pplying grdients. For fir comprison, we further crefully tune the noise strength in SmoothOut nd AdSmoothOut w/o de-noising to get ner optiml ccurcy. More specific, we hve to decrese s SGD is more sensitive to noises when denoising is not pplied. The results re summrized in Tble IV. Without de-noising, ccurcy is lower. Note tht the de-noising process is nturlly generted by our frmework nd theory in Section III; without de-noising, it will not fit into our frmework nd the optimiztion trget will not be the uxiliry function C(w). 2) Gussin noise vs. uniform noise: Another contribution is the generic SmoothOut nd AdSmoothOut frmework, which is gnostic to the type of noises. We mjorly used uniform noise for study s it is well motivted, but ny type of noises cn fit into our frmework. As Gussin noise is brodly used in the literture [2][22][23][24][25][26][29], we perform n bltion study here by replcing uniform noise with Gussin noise, for the purpose of verifying our frmework is

8 PREPRINT 8 Tble V: Comprison between uniform nd Gussin noises tested on CIFAR- with the sme setting in Tble II. Method Accurcy Bseline 9.2% SmoothOut (uniform) % SmoothOut (Gussin) % AdSmoothOut (uniform) % AdSmoothOut (Gussin) % Uniform noise only. 9.5% Gussin noise only % noise gnostic nd nswering how performnce chnges when the noise type lters. The results re in Tble V, where, in injecting Gussin noises, is the stndrd derivtion. Tble V indictes tht both uniform nd Gussin noises improve generliztion in SmoothOut nd AdSmoothOut, verifying they re gnostic to noise types; uniform noise is superior to Gussin noise. A intuitive explntion is tht Gussin distribution gives high probbility to verge over vlues ner the minimum, nd thus hs smller probbility to smooth out shrp minim. However, uniform distribution evenly trets vlues round the minimum, nd cn eliminte the minimum when it is shrp. We do not ggressively conclude tht uniform noise will lwys be superior in ll settings, but leving noise selection s n building block when using our frmework. smller generliztion improvement is observed if only injecting noises into prmeters without using our frmework (s shown by the noise only experiments). V. CONCLUSION In this pper, we propose SmoothOut nd AdSmoothOut frmework to escpe shrp minim during SGD trining of Deep Neurl Networks, for better generliztion. SmoothOut nd AdSmoothOut build n uxiliry optimiztion function without shrp minim, utilizing noise injection. Although noise injection ws brodly used in the literture, we interpret the dvntge of noise injection from new perspective of generliztion nd shrpness. Moreover, our frmework dvnces in multiple wys: () de-noising is pplied fter noise injection; (2) noise strength is dptive to filter norm in AdSmoothOut; (3) uniform noise is mjorly dopted for study nd cn be superior to Gussin noise in some cses. A comprehensive bltion study is conducted to prove the necessity of those three dvnces. In the future, we will extend SmoothOut nd AdSmoothOut to Recurrent Neurl Networks, ttention-bsed models nd very deep Convolutionl Neurl Networks. APPENDIX A PROOF OF THEOREM AND THEOREM 2 Proof of Theorem : Proof. As D(w, ) is defined s box centering t w with size 2, i.e., D(w, ) = {w R m : (w w) i, i {...m}}, (3) then, under Uniform Perturbtion, { } C(w; Θ) C(w) = E = (2) m D(w,) = E {C(w + Θ)} C(w )dw... dw m C(w) = w i (2) m ( C(w ) w i =wi+ C(w ) ) w i =w i dw \i, D(w \i,) where nd (4) (5) w \i [w,, w i, w i+,, w m ] T R m (6) dw \i dw... dw i dw i+... dw m (7) When C(w) is symmetric bout w f in D(w f, τ) such tht, i, cut long w i = (w f ) i + nd cut long w i = (w f ) i get the sme function in the subspce w \i, then C(w f ) = ; tht is, the Flt Constrint stisfies with ϕ =. The optiml ϕ nd τ re determined by the symmetry of the flt region. ϕ my be relxed to lrger vlue when the symmetry is broken; however, within flt region, lrger ϕ my only slightly increse C(w ). Proof of Theorem 2: Proof. Suppose C (s) ε minimum, i.e., is the mximum vlue ner the shrp C (s) ε = mx (C(w)), (8) D(w s,ε ) s C(w) is strictly monotonic in D(w s, b), we hve, ε < < b, min (C(w)) > C(s) D(w s,)\d(w s,ε ε, (9) ) where D(w s, )\D(w s, ε ) is Set Difference, notting domin within D(w s, ) but outside of D(w s, ε ). Then, follow the proof of Theorem, we hve ( ) min C(w) = D(w s,ε) ε<b (2) m ( (2) m ( ( ε = Becuse of lim m (( (2) m C (s) ε ) m ) C (s) ε + ( ε ( ε C(w )dw... dw m D(w s,) ( )) (2ε ) m C (s) ε C(w s) ) m C(w s ). (2) ) m ) C (s) ε + ( ε ) m C(w s )) = C (s) ε, (2)

9 PREPRINT 9. Vl loss Trin loss Vl ccurcy Trin ccurcy 9 SB α LB () Shrpness visuliztion Accurcy (%) Trin loss SB Trin loss LB Trin ccu. SB Trin ccu. LB Vl loss SB Vl loss LB Vl ccu. SB Vl ccu. LB (b) Sensitivity to noise Figure 6: Nottion: SB : Smll Btch (256); LB : Lrge Btch (5); ccu. : ccurcy. () loss nd ccurcy vs. α, which controls w long the direction from SB minimum (w f ) to LB minimum (w s ); (b) loss nd ccurcy under influence of different strengths of noise. Dtset: CIFAR-. Network: C 3. The optimizer is Adm with. initil lerning rte Accurcy % Trin loss SB Trin loss LB Trin ccu. SB Trin ccu. LB Vl loss SB Vl loss LB Vl ccu. SB Vl ccu. LB () Sensitivity to noise (ResNet-44 on Cifr-) Accurcy (%) Trin loss SB Trin loss LB Trin ccu. SB Trin ccu. LB Vl loss SB Vl loss LB Vl ccu. SB Vl ccu. LB (b) Sensitivity to noise (ResNet-44 on Cifr-) Figure 7: nd ccurcy of ResNet-44 under influence of different strengths of noise on () CIFAR- nd (b) CIFAR-. The optimizer is SGD with momentum.9. Nottion: SB : Smll Btch (28); LB : Lrge Btch (248 for CIFAR- nd 24 for CIFAR-); ccu. : ccurcy Accurcy (%) in high dimensionl models (like deep neurl networks), we cn find ε ε (left limit) to stisfy ( ) min C(w) > C (s) ε mx (C(w)). (22) D(w s,ε) D(w s,ε) In the flt region, min D(w f,τ) ( C(w) ) = (2) m < (2) m D(w f,) C(w )dw... dw m ( ) (2) m mx (C(w)) D(w f,) = mx (C(w)) C(f) D(w f,) (23) Assuming C(w s ) C(w f ), s grows, C ε (s) ε increses fst in the shrp region while C (f) increses slowly in the flt region; therefore, such tht C (s) ε According to Inequlity (22)(23)(24), min D(w s,ε) > C (f). (24) ( C(w) ) > mx D(w s,ε) (C(w)) > mx (C(w)) D(w f,) ( ) > min C(w) D(w f,τ) which stisfies the Shrp Constrint. (25) APPENDIX B SENSITIVITY ANALYSES AND SHARPNESS VISUALIZATION We provide more sensitivity nlyses in Figure 6 nd Figure 7 s tested on different DNNs nd dtsets. More shrpness comprison between bseline nd SmoothOut is visulized in Figure 8 nd Figure 9. ACKNOWLEDGMENT This work ws supported in prt by DOE de-sc864, NSF nd Any opinions, findings, conclusions or recommendtions expressed in this mteril re those of the uthors nd do not necessrily reflect the views of DOE, NSF or their contrctors. () SB bseline SB SmoothOut Figure 8: Shrpness of bseline nd SmoothOut in () SB trining nd (b) LB trining of C. (b) REFERENCES [] G. Dimos, S. Sengupt, B. Ctnzro, M. Chrznowski, A. Cotes, E. Elsen, J. Engel, A. Hnnun, nd S. Stheesh, Persistent rnns: Stshing recurrent weights on-chip, in Interntionl Conference on Mchine Lerning, 26, pp [2] N. S. Keskr, D. Mudigere, J. Nocedl, M. Smelynskiy, nd P. T. P. Tng, On lrge-btch trining for deep lerning: Generliztion gp nd shrp minim, in Interntionl Conference on Lerning Representtions, 27. [3] P. Goyl, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrol, A. Tulloch, Y. Ji, nd K. He, Accurte, lrge minibtch sgd: trining imgenet in hour, rxiv preprint rxiv: , 27. [4] E. Hoffer, I. Hubr, nd D. Soudry, Trin longer, generlize better: closing the generliztion gp in lrge btch trining of neurl networks, in Advnces in Neurl Informtion Processing Systems, 27, pp [5] J.-P. Bouchud nd A. Georges, Anomlous diffusion in disordered medi: sttisticl mechnisms, models nd physicl pplictions, Physics reports, vol. 95, no. 4-5, pp , 99. [6] S. L. Smith, P.-J. Kindermns, nd Q. V. Le, Don t decy the lerning rte, increse the btch size, in Interntionl Conference on Lerning Representtions, 28. [7] Y. You, I. Gitmn, nd B. Ginsburg, Scling sgd btch size to 32k for imgenet trining, rxiv preprint rxiv: , 27. () SB bseline SB SmoothOut Figure 9: Shrpness of bseline nd SmoothOut in () SB trining nd (b) LB trining of F. (b) LB bseline LB bseline LB SmoothOut LB SmoothOut

10 PREPRINT [8] T. Akib, S. Suzuki, nd K. Fukud, Extremely lrge minibtch sgd: Trining resnet-5 on imgenet in 5 minutes, rxiv preprint rxiv:7.4325, 27. [9] S. Hochreiter nd J. Schmidhuber, Flt minim, Neurl Computtion, vol. 9, no., pp. 42, 997. [] P. Chudhri, A. Choromnsk, S. Sotto, nd Y. LeCun, Entropy-sgd: Bising grdient descent into wide vlleys, in Interntionl Conference on Lerning Representtions, 27. [] J. Rissnen, Modeling by shortest dt description, Automtic, vol. 4, no. 5, pp , 978. [2] S. Jstrzębski, Z. Kenton, D. Arpit, N. Blls, A. Fischer, Y. Bengio, nd A. Storkey, Finding fltter minim with sgd, 28. [3] C. Zhng, S. Bengio, M. Hrdt, B. Recht, nd O. Vinyls, Understnding deep lerning requires rethinking generliztion, rxiv preprint rxiv:6.353, 26. [4] C. S. Wllce nd D. M. Boulton, An informtion mesure for clssifiction, The Computer Journl, vol., no. 2, pp , 968. [5] D. J. McKy, A prcticl byesin frmework for bckpropgtion networks, Neurl computtion, vol. 4, no. 3, pp , 992. [6] S. L. Smith nd Q. V. Le, A byesin perspective on generliztion nd stochstic grdient descent, in Proceedings of Second workshop on Byesin Deep Lerning (NIPS 27), 27. [7] L. Dinh, R. Pscnu, S. Bengio, nd Y. Bengio, Shrp minim cn generlize for deep nets, in Interntionl Conference on Mchine Lerning, 27, pp [8] M. Welling nd Y. W. Teh, Byesin lerning vi stochstic grdient lngevin dynmics, in Proceedings of the 28th Interntionl Conference on Mchine Lerning (ICML-), 2, pp [9] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, nd L. Fei-Fei, Imgenet: A lrge-scle hierrchicl imge dtbse, in Computer Vision nd Pttern Recognition, 29. CVPR 29. IEEE Conference on. IEEE, 29, pp [2] P. Izmilov, D. Podoprikhin, T. Gripov, D. Vetrov, nd A. G. Wilson, Averging weights leds to wider optim nd better generliztion, rxiv:83.547, 28. [2] A. Neelkntn, L. Vilnis, Q. V. Le, I. Sutskever, L. Kiser, K. Kurch, nd J. Mrtens, Adding grdient noise improves lerning for very deep networks, rxiv:5.687, 25. [22] H. Mobhi, Trining recurrent neurl networks by diffusion, rxiv:6.44, 26. [23] M. Fortunto, M. G. Azr, B. Piot, J. Menick, I. Osbnd, A. Grves, V. Mnih, R. Munos, D. Hssbis, O. Pietquin et l., Noisy networks for explortion, rxiv:76.295, 27. [24] M. Plppert, R. Houthooft, P. Dhriwl, S. Sidor, R. Y. Chen, X. Chen, T. Asfour, P. Abbeel, nd M. Andrychowicz, Prmeter spce noise for explortion, rxiv:76.95, 27. [25] Y. Li nd F. Liu, Whiteout: Gussin dptive noise regulriztion in feedforwrd neurl networks, rxiv:62.49, 26. [26] K. Ho, C.-s. Leung, nd J. Sum, On weight-noise-injection trining, in Interntionl Conference on Neurl Informtion Processing. Springer, 28, pp [27] R. M. Nel, Byesin lerning for neurl networks. Springer Science & Business Medi, 22, vol. 8. [28] G. Hinton nd D. vn Cmp, Keeping neurl networks simple by minimising the description length of weights. 993, in Proceedings of COLT-93, pp [29] D. P. Kingm nd M. Welling, Auto-encoding vritionl byes, rxiv:32.64, 23. [3] N. Srivstv, G. Hinton, A. Krizhevsky, I. Sutskever, nd R. Slkhutdinov, Dropout: simple wy to prevent neurl networks from overfitting, The Journl of Mchine Lerning Reserch, vol. 5, no., pp , 24. [3] Y. Gl nd Z. Ghhrmni, Dropout s byesin pproximtion: Representing model uncertinty in deep lerning, in interntionl conference on mchine lerning, 26, pp [32] X. Ji, S. Song, W. He, Y. Wng, H. Rong, F. Zhou, L. Xie, Z. Guo, Y. Yng, L. Yu et l., Highly sclble deep lerning trining system with mixed-precision: Trining imgenet in four minutes, rxiv:87.25, 28. [33] H. Li, Z. Xu, G. Tylor, nd T. Goldstein, Visulizing the loss lndscpe of neurl nets, rxiv preprint rxiv:72.993, 27. [34] L. Bottou, Online lerning nd stochstic pproximtions, On-line lerning in neurl networks, vol. 7, no. 9, p. 42, 998. [35] I. J. Goodfellow, O. Vinyls, nd A. M. Sxe, Qulittively chrcterizing neurl network optimiztion problems, rxiv: , 24. [36] Y. LeCun, L. Bottou, Y. Bengio, nd P. Hffner, Grdient-bsed lerning pplied to document recognition, Proceedings of the IEEE, vol. 86, no., pp , 998. [37] A. Krizhevsky nd G. Hinton, Lerning multiple lyers of fetures from tiny imges, 29. [38] R. H. Byrd, P. Lu, J. Nocedl, nd C. Zhu, A limited memory lgorithm for bound constrined optimiztion, SIAM Journl on Scientific Computing, vol. 6, no. 5, pp. 9 28, 995. [39] A. Krizhevsky, I. Sutskever, nd G. E. Hinton, Imgenet clssifiction with deep convolutionl neurl networks, in Advnces in Neurl Informtion Processing Systems, 22, pp [4] K. He, X. Zhng, S. Ren, nd J. Sun, Deep residul lerning for imge recognition, in Proceedings of the IEEE Conference on Computer Vision nd Pttern Recognition, 26, pp

Reinforcement learning II

Reinforcement learning II CS 1675 Introduction to Mchine Lerning Lecture 26 Reinforcement lerning II Milos Huskrecht milos@cs.pitt.edu 5329 Sennott Squre Reinforcement lerning Bsics: Input x Lerner Output Reinforcement r Critic

More information

LECTURE NOTE #12 PROF. ALAN YUILLE

LECTURE NOTE #12 PROF. ALAN YUILLE LECTURE NOTE #12 PROF. ALAN YUILLE 1. Clustering, K-mens, nd EM Tsk: set of unlbeled dt D = {x 1,..., x n } Decompose into clsses w 1,..., w M where M is unknown. Lern clss models p(x w)) Discovery of

More information

Review of Calculus, cont d

Review of Calculus, cont d Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some

More information

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn

More information

Data Assimilation. Alan O Neill Data Assimilation Research Centre University of Reading

Data Assimilation. Alan O Neill Data Assimilation Research Centre University of Reading Dt Assimiltion Aln O Neill Dt Assimiltion Reserch Centre University of Reding Contents Motivtion Univrite sclr dt ssimiltion Multivrite vector dt ssimiltion Optiml Interpoltion BLUE 3d-Vritionl Method

More information

Monte Carlo method in solving numerical integration and differential equation

Monte Carlo method in solving numerical integration and differential equation Monte Crlo method in solving numericl integrtion nd differentil eqution Ye Jin Chemistry Deprtment Duke University yj66@duke.edu Abstrct: Monte Crlo method is commonly used in rel physics problem. The

More information

Duality # Second iteration for HW problem. Recall our LP example problem we have been working on, in equality form, is given below.

Duality # Second iteration for HW problem. Recall our LP example problem we have been working on, in equality form, is given below. Dulity #. Second itertion for HW problem Recll our LP emple problem we hve been working on, in equlity form, is given below.,,,, 8 m F which, when written in slightly different form, is 8 F Recll tht we

More information

Math 1B, lecture 4: Error bounds for numerical methods

Math 1B, lecture 4: Error bounds for numerical methods Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the

More information

Unit #9 : Definite Integral Properties; Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties; Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties; Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

Numerical Analysis: Trapezoidal and Simpson s Rule

Numerical Analysis: Trapezoidal and Simpson s Rule nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =

More information

5.7 Improper Integrals

5.7 Improper Integrals 458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the

More information

Tests for the Ratio of Two Poisson Rates

Tests for the Ratio of Two Poisson Rates Chpter 437 Tests for the Rtio of Two Poisson Rtes Introduction The Poisson probbility lw gives the probbility distribution of the number of events occurring in specified intervl of time or spce. The Poisson

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

Non-Linear & Logistic Regression

Non-Linear & Logistic Regression Non-Liner & Logistic Regression If the sttistics re boring, then you've got the wrong numbers. Edwrd R. Tufte (Sttistics Professor, Yle University) Regression Anlyses When do we use these? PART 1: find

More information

New Expansion and Infinite Series

New Expansion and Infinite Series Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University

More information

8 Laplace s Method and Local Limit Theorems

8 Laplace s Method and Local Limit Theorems 8 Lplce s Method nd Locl Limit Theorems 8. Fourier Anlysis in Higher DImensions Most of the theorems of Fourier nlysis tht we hve proved hve nturl generliztions to higher dimensions, nd these cn be proved

More information

Lecture 14: Quadrature

Lecture 14: Quadrature Lecture 14: Qudrture This lecture is concerned with the evlution of integrls fx)dx 1) over finite intervl [, b] The integrnd fx) is ssumed to be rel-vlues nd smooth The pproximtion of n integrl by numericl

More information

2D1431 Machine Learning Lab 3: Reinforcement Learning

2D1431 Machine Learning Lab 3: Reinforcement Learning 2D1431 Mchine Lerning Lb 3: Reinforcement Lerning Frnk Hoffmnn modified by Örjn Ekeberg December 7, 2004 1 Introduction In this lb you will lern bout dynmic progrmming nd reinforcement lerning. It is ssumed

More information

Sufficient condition on noise correlations for scalable quantum computing

Sufficient condition on noise correlations for scalable quantum computing Sufficient condition on noise correltions for sclble quntum computing John Presill, 2 Februry 202 Is quntum computing sclble? The ccurcy threshold theorem for quntum computtion estblishes tht sclbility

More information

Recitation 3: More Applications of the Derivative

Recitation 3: More Applications of the Derivative Mth 1c TA: Pdric Brtlett Recittion 3: More Applictions of the Derivtive Week 3 Cltech 2012 1 Rndom Question Question 1 A grph consists of the following: A set V of vertices. A set E of edges where ech

More information

Solution for Assignment 1 : Intro to Probability and Statistics, PAC learning

Solution for Assignment 1 : Intro to Probability and Statistics, PAC learning Solution for Assignment 1 : Intro to Probbility nd Sttistics, PAC lerning 10-701/15-781: Mchine Lerning (Fll 004) Due: Sept. 30th 004, Thursdy, Strt of clss Question 1. Bsic Probbility ( 18 pts) 1.1 (

More information

Driving Cycle Construction of City Road for Hybrid Bus Based on Markov Process Deng Pan1, a, Fengchun Sun1,b*, Hongwen He1, c, Jiankun Peng1, d

Driving Cycle Construction of City Road for Hybrid Bus Based on Markov Process Deng Pan1, a, Fengchun Sun1,b*, Hongwen He1, c, Jiankun Peng1, d Interntionl Industril Informtics nd Computer Engineering Conference (IIICEC 15) Driving Cycle Construction of City Rod for Hybrid Bus Bsed on Mrkov Process Deng Pn1,, Fengchun Sun1,b*, Hongwen He1, c,

More information

A Signal-Level Fusion Model for Image-Based Change Detection in DARPA's Dynamic Database System

A Signal-Level Fusion Model for Image-Based Change Detection in DARPA's Dynamic Database System SPIE Aerosense 001 Conference on Signl Processing, Sensor Fusion, nd Trget Recognition X, April 16-0, Orlndo FL. (Minor errors in published version corrected.) A Signl-Level Fusion Model for Imge-Bsed

More information

Generation of Lyapunov Functions by Neural Networks

Generation of Lyapunov Functions by Neural Networks WCE 28, July 2-4, 28, London, U.K. Genertion of Lypunov Functions by Neurl Networks Nvid Noroozi, Pknoosh Krimghee, Ftemeh Sfei, nd Hmed Jvdi Abstrct Lypunov function is generlly obtined bsed on tril nd

More information

NOTES ON HILBERT SPACE

NOTES ON HILBERT SPACE NOTES ON HILBERT SPACE 1 DEFINITION: by Prof C-I Tn Deprtment of Physics Brown University A Hilbert spce is n inner product spce which, s metric spce, is complete We will not present n exhustive mthemticl

More information

Numerical integration

Numerical integration 2 Numericl integrtion This is pge i Printer: Opque this 2. Introduction Numericl integrtion is problem tht is prt of mny problems in the economics nd econometrics literture. The orgniztion of this chpter

More information

Reversals of Signal-Posterior Monotonicity for Any Bounded Prior

Reversals of Signal-Posterior Monotonicity for Any Bounded Prior Reversls of Signl-Posterior Monotonicity for Any Bounded Prior Christopher P. Chmbers Pul J. Hely Abstrct Pul Milgrom (The Bell Journl of Economics, 12(2): 380 391) showed tht if the strict monotone likelihood

More information

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17 CS 70 Discrete Mthemtics nd Proility Theory Summer 2014 Jmes Cook Note 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion, y tking

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by. NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with

More information

Neural network LM. CS 690N, Spring 2018 Advanced Natural Language Processing

Neural network LM. CS 690N, Spring 2018 Advanced Natural Language Processing Neurl network LM CS 690N, Spring 2018 Advnced Nturl Lnguge Processing http://peoplecsumssedu/brenocon/nlp2018/ Brendn O Connor College nformtion nd Computer Sciences Universy Msschusetts Amrst Pper presenttions

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7 CS 188 Introduction to Artificil Intelligence Fll 2018 Note 7 These lecture notes re hevily bsed on notes originlly written by Nikhil Shrm. Decision Networks In the third note, we lerned bout gme trees

More information

ADVANCEMENT OF THE CLOSELY COUPLED PROBES POTENTIAL DROP TECHNIQUE FOR NDE OF SURFACE CRACKS

ADVANCEMENT OF THE CLOSELY COUPLED PROBES POTENTIAL DROP TECHNIQUE FOR NDE OF SURFACE CRACKS ADVANCEMENT OF THE CLOSELY COUPLED PROBES POTENTIAL DROP TECHNIQUE FOR NDE OF SURFACE CRACKS F. Tkeo 1 nd M. Sk 1 Hchinohe Ntionl College of Technology, Hchinohe, Jpn; Tohoku University, Sendi, Jpn Abstrct:

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Lerning Tom Mitchell, Mchine Lerning, chpter 13 Outline Introduction Comprison with inductive lerning Mrkov Decision Processes: the model Optiml policy: The tsk Q Lerning: Q function Algorithm

More information

The steps of the hypothesis test

The steps of the hypothesis test ttisticl Methods I (EXT 7005) Pge 78 Mosquito species Time of dy A B C Mid morning 0.0088 5.4900 5.5000 Mid Afternoon.3400 0.0300 0.8700 Dusk 0.600 5.400 3.000 The Chi squre test sttistic is the sum of

More information

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite

More information

Actor-Critic. Hung-yi Lee

Actor-Critic. Hung-yi Lee Actor-Critic Hung-yi Lee Asynchronous Advntge Actor-Critic (A3C) Volodymyr Mnih, Adrià Puigdomènech Bdi, Mehdi Mirz, Alex Grves, Timothy P. Lillicrp, Tim Hrley, Dvid Silver, Kory Kvukcuoglu, Asynchronous

More information

Acceptance Sampling by Attributes

Acceptance Sampling by Attributes Introduction Acceptnce Smpling by Attributes Acceptnce smpling is concerned with inspection nd decision mking regrding products. Three spects of smpling re importnt: o Involves rndom smpling of n entire

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17 EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004 Advnced Clculus: MATH 410 Notes on Integrls nd Integrbility Professor Dvid Levermore 17 October 2004 1. Definite Integrls In this section we revisit the definite integrl tht you were introduced to when

More information

APPROXIMATE INTEGRATION

APPROXIMATE INTEGRATION APPROXIMATE INTEGRATION. Introduction We hve seen tht there re functions whose nti-derivtives cnnot be expressed in closed form. For these resons ny definite integrl involving these integrnds cnnot be

More information

Learning Moore Machines from Input-Output Traces

Learning Moore Machines from Input-Output Traces Lerning Moore Mchines from Input-Output Trces Georgios Gintmidis 1 nd Stvros Tripkis 1,2 1 Alto University, Finlnd 2 UC Berkeley, USA Motivtion: lerning models from blck boxes Inputs? Lerner Forml Model

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

Numerical Integration

Numerical Integration Chpter 5 Numericl Integrtion Numericl integrtion is the study of how the numericl vlue of n integrl cn be found. Methods of function pproximtion discussed in Chpter??, i.e., function pproximtion vi the

More information

Review of basic calculus

Review of basic calculus Review of bsic clculus This brief review reclls some of the most importnt concepts, definitions, nd theorems from bsic clculus. It is not intended to tech bsic clculus from scrtch. If ny of the items below

More information

SUMMER KNOWHOW STUDY AND LEARNING CENTRE

SUMMER KNOWHOW STUDY AND LEARNING CENTRE SUMMER KNOWHOW STUDY AND LEARNING CENTRE Indices & Logrithms 2 Contents Indices.2 Frctionl Indices.4 Logrithms 6 Exponentil equtions. Simplifying Surds 13 Opertions on Surds..16 Scientific Nottion..18

More information

Chapter 5 : Continuous Random Variables

Chapter 5 : Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 216 Néhémy Lim Chpter 5 : Continuous Rndom Vribles Nottions. N {, 1, 2,...}, set of nturl numbers (i.e. ll nonnegtive integers); N {1, 2,...}, set of ll

More information

Math 8 Winter 2015 Applications of Integration

Math 8 Winter 2015 Applications of Integration Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl

More information

Lecture 21: Order statistics

Lecture 21: Order statistics Lecture : Order sttistics Suppose we hve N mesurements of sclr, x i =, N Tke ll mesurements nd sort them into scending order x x x 3 x N Define the mesured running integrl S N (x) = 0 for x < x = i/n for

More information

Chapter 3 Polynomials

Chapter 3 Polynomials Dr M DRAIEF As described in the introduction of Chpter 1, pplictions of solving liner equtions rise in number of different settings In prticulr, we will in this chpter focus on the problem of modelling

More information

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0)

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0) 1 Tylor polynomils In Section 3.5, we discussed how to pproximte function f(x) round point in terms of its first derivtive f (x) evluted t, tht is using the liner pproximtion f() + f ()(x ). We clled this

More information

Continuous Random Variables

Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 217 Néhémy Lim Continuous Rndom Vribles Nottion. The indictor function of set S is rel-vlued function defined by : { 1 if x S 1 S (x) if x S Suppose tht

More information

Math 270A: Numerical Linear Algebra

Math 270A: Numerical Linear Algebra Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner

More information

Student Activity 3: Single Factor ANOVA

Student Activity 3: Single Factor ANOVA MATH 40 Student Activity 3: Single Fctor ANOVA Some Bsic Concepts In designed experiment, two or more tretments, or combintions of tretments, is pplied to experimentl units The number of tretments, whether

More information

A New Grey-rough Set Model Based on Interval-Valued Grey Sets

A New Grey-rough Set Model Based on Interval-Valued Grey Sets Proceedings of the 009 IEEE Interntionl Conference on Systems Mn nd Cybernetics Sn ntonio TX US - October 009 New Grey-rough Set Model sed on Intervl-Vlued Grey Sets Wu Shunxing Deprtment of utomtion Ximen

More information

Predict Global Earth Temperature using Linier Regression

Predict Global Earth Temperature using Linier Regression Predict Globl Erth Temperture using Linier Regression Edwin Swndi Sijbt (23516012) Progrm Studi Mgister Informtik Sekolh Teknik Elektro dn Informtik ITB Jl. Gnesh 10 Bndung 40132, Indonesi 23516012@std.stei.itb.c.id

More information

Hybrid Group Acceptance Sampling Plan Based on Size Biased Lomax Model

Hybrid Group Acceptance Sampling Plan Based on Size Biased Lomax Model Mthemtics nd Sttistics 2(3): 137-141, 2014 DOI: 10.13189/ms.2014.020305 http://www.hrpub.org Hybrid Group Acceptnce Smpling Pln Bsed on Size Bised Lomx Model R. Subb Ro 1,*, A. Ng Durgmmb 2, R.R.L. Kntm

More information

The practical version

The practical version Roerto s Notes on Integrl Clculus Chpter 4: Definite integrls nd the FTC Section 7 The Fundmentl Theorem of Clculus: The prcticl version Wht you need to know lredy: The theoreticl version of the FTC. Wht

More information

Fig. 1. Open-Loop and Closed-Loop Systems with Plant Variations

Fig. 1. Open-Loop and Closed-Loop Systems with Plant Variations ME 3600 Control ystems Chrcteristics of Open-Loop nd Closed-Loop ystems Importnt Control ystem Chrcteristics o ensitivity of system response to prmetric vritions cn be reduced o rnsient nd stedy-stte responses

More information

Abstract inner product spaces

Abstract inner product spaces WEEK 4 Abstrct inner product spces Definition An inner product spce is vector spce V over the rel field R equipped with rule for multiplying vectors, such tht the product of two vectors is sclr, nd the

More information

Credibility Hypothesis Testing of Fuzzy Triangular Distributions

Credibility Hypothesis Testing of Fuzzy Triangular Distributions 666663 Journl of Uncertin Systems Vol.9, No., pp.6-74, 5 Online t: www.jus.org.uk Credibility Hypothesis Testing of Fuzzy Tringulr Distributions S. Smpth, B. Rmy Received April 3; Revised 4 April 4 Abstrct

More information

1B40 Practical Skills

1B40 Practical Skills B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need

More information

NUMERICAL INTEGRATION

NUMERICAL INTEGRATION NUMERICAL INTEGRATION How do we evlute I = f (x) dx By the fundmentl theorem of clculus, if F (x) is n ntiderivtive of f (x), then I = f (x) dx = F (x) b = F (b) F () However, in prctice most integrls

More information

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1 Exm, Mthemtics 471, Section ETY6 6:5 pm 7:4 pm, Mrch 1, 16, IH-115 Instructor: Attil Máté 1 17 copies 1. ) Stte the usul sufficient condition for the fixed-point itertion to converge when solving the eqution

More information

Bayesian Networks: Approximate Inference

Bayesian Networks: Approximate Inference pproches to inference yesin Networks: pproximte Inference xct inference Vrillimintion Join tree lgorithm pproximte inference Simplify the structure of the network to mkxct inferencfficient (vritionl methods,

More information

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique? XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk bout solving systems of liner equtions. These re problems tht give couple of equtions with couple of unknowns, like: 6 2 3 7 4

More information

Emission of K -, L - and M - Auger Electrons from Cu Atoms. Abstract

Emission of K -, L - and M - Auger Electrons from Cu Atoms. Abstract Emission of K -, L - nd M - uger Electrons from Cu toms Mohmed ssd bdel-rouf Physics Deprtment, Science College, UEU, l in 17551, United rb Emirtes ssd@ueu.c.e bstrct The emission of uger electrons from

More information

Lecture 19: Continuous Least Squares Approximation

Lecture 19: Continuous Least Squares Approximation Lecture 19: Continuous Lest Squres Approximtion 33 Continuous lest squres pproximtion We begn 31 with the problem of pproximting some f C[, b] with polynomil p P n t the discrete points x, x 1,, x m for

More information

Operations with Polynomials

Operations with Polynomials 38 Chpter P Prerequisites P.4 Opertions with Polynomils Wht you should lern: How to identify the leding coefficients nd degrees of polynomils How to dd nd subtrct polynomils How to multiply polynomils

More information

Estimation of Binomial Distribution in the Light of Future Data

Estimation of Binomial Distribution in the Light of Future Data British Journl of Mthemtics & Computer Science 102: 1-7, 2015, Article no.bjmcs.19191 ISSN: 2231-0851 SCIENCEDOMAIN interntionl www.sciencedomin.org Estimtion of Binomil Distribution in the Light of Future

More information

Research on Modeling and Compensating Method of Random Drift of MEMS Gyroscope

Research on Modeling and Compensating Method of Random Drift of MEMS Gyroscope 01 4th Interntionl Conference on Signl Processing Systems (ICSPS 01) IPCSIT vol. 58 (01) (01) IACSIT Press, Singpore DOI: 10.7763/IPCSIT.01.V58.9 Reserch on Modeling nd Compensting Method of Rndom Drift

More information

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz

More information

Numerical Integration

Numerical Integration Chpter 1 Numericl Integrtion Numericl differentition methods compute pproximtions to the derivtive of function from known vlues of the function. Numericl integrtion uses the sme informtion to compute numericl

More information

Review of Gaussian Quadrature method

Review of Gaussian Quadrature method Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge

More information

Online Short Term Load Forecasting by Fuzzy ARTMAP Neural Network

Online Short Term Load Forecasting by Fuzzy ARTMAP Neural Network Online Short Term Lod Forecsting by Fuzzy ARTMAP Neurl Network SHAHRAM JAVADI Electricl Engineering Deprtment AZAD University Tehrn Centrl Brnch Moshnir Power Electric Compny IRAN Abstrct: This pper presents

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

2008 Mathematical Methods (CAS) GA 3: Examination 2

2008 Mathematical Methods (CAS) GA 3: Examination 2 Mthemticl Methods (CAS) GA : Exmintion GENERAL COMMENTS There were 406 students who st the Mthemticl Methods (CAS) exmintion in. Mrks rnged from to 79 out of possible score of 80. Student responses showed

More information

Multi-Armed Bandits: Non-adaptive and Adaptive Sampling

Multi-Armed Bandits: Non-adaptive and Adaptive Sampling CSE 547/Stt 548: Mchine Lerning for Big Dt Lecture Multi-Armed Bndits: Non-dptive nd Adptive Smpling Instructor: Shm Kkde 1 The (stochstic) multi-rmed bndit problem The bsic prdigm is s follows: K Independent

More information

Minimum Energy State of Plasmas with an Internal Transport Barrier

Minimum Energy State of Plasmas with an Internal Transport Barrier Minimum Energy Stte of Plsms with n Internl Trnsport Brrier T. Tmno ), I. Ktnum ), Y. Skmoto ) ) Formerly, Plsm Reserch Center, University of Tsukub, Tsukub, Ibrki, Jpn ) Plsm Reserch Center, University

More information

Lecture 3 Gaussian Probability Distribution

Lecture 3 Gaussian Probability Distribution Introduction Lecture 3 Gussin Probbility Distribution Gussin probbility distribution is perhps the most used distribution in ll of science. lso clled bell shped curve or norml distribution Unlike the binomil

More information

Best Approximation. Chapter The General Case

Best Approximation. Chapter The General Case Chpter 4 Best Approximtion 4.1 The Generl Cse In the previous chpter, we hve seen how n interpolting polynomil cn be used s n pproximtion to given function. We now wnt to find the best pproximtion to given

More information

7.2 The Definite Integral

7.2 The Definite Integral 7.2 The Definite Integrl the definite integrl In the previous section, it ws found tht if function f is continuous nd nonnegtive, then the re under the grph of f on [, b] is given by F (b) F (), where

More information

and that at t = 0 the object is at position 5. Find the position of the object at t = 2.

and that at t = 0 the object is at position 5. Find the position of the object at t = 2. 7.2 The Fundmentl Theorem of Clculus 49 re mny, mny problems tht pper much different on the surfce but tht turn out to be the sme s these problems, in the sense tht when we try to pproimte solutions we

More information

The First Fundamental Theorem of Calculus. If f(x) is continuous on [a, b] and F (x) is any antiderivative. f(x) dx = F (b) F (a).

The First Fundamental Theorem of Calculus. If f(x) is continuous on [a, b] and F (x) is any antiderivative. f(x) dx = F (b) F (a). The Fundmentl Theorems of Clculus Mth 4, Section 0, Spring 009 We now know enough bout definite integrls to give precise formultions of the Fundmentl Theorems of Clculus. We will lso look t some bsic emples

More information

Week 10: Line Integrals

Week 10: Line Integrals Week 10: Line Integrls Introduction In this finl week we return to prmetrised curves nd consider integrtion long such curves. We lredy sw this in Week 2 when we integrted long curve to find its length.

More information

The Regulated and Riemann Integrals

The Regulated and Riemann Integrals Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue

More information

MAA 4212 Improper Integrals

MAA 4212 Improper Integrals Notes by Dvid Groisser, Copyright c 1995; revised 2002, 2009, 2014 MAA 4212 Improper Integrls The Riemnn integrl, while perfectly well-defined, is too restrictive for mny purposes; there re functions which

More information

Markscheme May 2016 Mathematics Standard level Paper 1

Markscheme May 2016 Mathematics Standard level Paper 1 M6/5/MATME/SP/ENG/TZ/XX/M Mrkscheme My 06 Mthemtics Stndrd level Pper 7 pges M6/5/MATME/SP/ENG/TZ/XX/M This mrkscheme is the property of the Interntionl Bcclurete nd must not be reproduced or distributed

More information

Probabilistic Investigation of Sensitivities of Advanced Test- Analysis Model Correlation Methods

Probabilistic Investigation of Sensitivities of Advanced Test- Analysis Model Correlation Methods Probbilistic Investigtion of Sensitivities of Advnced Test- Anlysis Model Correltion Methods Liz Bergmn, Mtthew S. Allen, nd Dniel C. Kmmer Dept. of Engineering Physics University of Wisconsin-Mdison Rndll

More information

A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007

A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007 A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H Thoms Shores Deprtment of Mthemtics University of Nebrsk Spring 2007 Contents Rtes of Chnge nd Derivtives 1 Dierentils 4 Are nd Integrls 5 Multivrite Clculus

More information

Learning A Task-Specific Deep Architecture For Clustering

Learning A Task-Specific Deep Architecture For Clustering Lerning A Tsk-Specific Deep Architecture For Clustering Zhngyng Wng Shiyu Chng Jiyu Zhou Meng Wng Thoms S. Hung Abstrct While sprse coding-bsed clustering methods hve shown to be successful, their bottlenecks

More information

Electron Correlation Methods

Electron Correlation Methods Electron Correltion Methods HF method: electron-electron interction is replced by n verge interction E HF c E 0 E HF E 0 exct ground stte energy E HF HF energy for given bsis set HF Ec 0 - represents mesure

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

13: Diffusion in 2 Energy Groups

13: Diffusion in 2 Energy Groups 3: Diffusion in Energy Groups B. Rouben McMster University Course EP 4D3/6D3 Nucler Rector Anlysis (Rector Physics) 5 Sept.-Dec. 5 September Contents We study the diffusion eqution in two energy groups

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 203 Outline Riemnn Sums Riemnn Integrls Properties Abstrct

More information

CBE 291b - Computation And Optimization For Engineers

CBE 291b - Computation And Optimization For Engineers The University of Western Ontrio Fculty of Engineering Science Deprtment of Chemicl nd Biochemicl Engineering CBE 9b - Computtion And Optimiztion For Engineers Mtlb Project Introduction Prof. A. Jutn Jn

More information

3.4 Numerical integration

3.4 Numerical integration 3.4. Numericl integrtion 63 3.4 Numericl integrtion In mny economic pplictions it is necessry to compute the definite integrl of relvlued function f with respect to "weight" function w over n intervl [,

More information