To appear in: Advances in Neural Information Processing Systems 9, eds. M. C. Mozer, M. I. Jordan and T. Petsche. MIT Press, 1997.

Bayesian Model Comparison by Monte Carlo Chaining

David Barber    Christopher M. Bishop
Neural Computing Research Group
Aston University, Birmingham, B4 7ET, U.K.
http://

Abstract

The techniques of Bayesian inference have been applied with great success to many problems in neural computing, including evaluation of regression functions, determination of error bars on predictions, and the treatment of hyper-parameters. However, the problem of model comparison is a much more challenging one for which current techniques have significant limitations. In this paper we show how an extended form of Markov chain Monte Carlo, called chaining, is able to provide effective estimates of the relative probabilities of different models. We present results from the robot arm problem and compare them with the corresponding results obtained using the standard Gaussian approximation framework.

1 Bayesian Model Comparison

In a Bayesian treatment of statistical inference, our state of knowledge of the values of the parameters w in a model M is described in terms of a probability distribution function. Initially this is chosen to be some prior distribution p(w|M), which can be combined with a likelihood function p(D|w, M) using Bayes' theorem to give a posterior distribution p(w|D, M) in the form

    p(w|D, M) = p(D|w, M) p(w|M) / p(D|M)    (1)

where D is the data set. Predictions of the model are obtained by performing integrations weighted by the posterior distribution.
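Prediction by posterior-weighted integration can be illustrated with a minimal Monte Carlo sketch; the linear model and the Gaussian stand-in for the posterior below are illustrative assumptions, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model y(x; w) = w0 + w1 * x, with pretend posterior samples drawn
# from a Gaussian p(w|D, M); both choices are illustrative assumptions.
w_samples = rng.multivariate_normal(mean=[1.0, 2.0],
                                    cov=0.01 * np.eye(2), size=5000)

# Predictive mean at input x: the integral of y(x; w) weighted by the
# posterior, approximated here by an average over posterior samples.
x = 0.5
pred_mean = np.mean(w_samples[:, 0] + w_samples[:, 1] * x)
print(pred_mean)   # close to 1.0 + 2.0 * 0.5 = 2.0
```

With exact posterior samples this average converges to the predictive mean at the usual 1/sqrt(L) Monte Carlo rate.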
The comparison of different models M_i is based on their relative probabilities, which can be expressed, again using Bayes' theorem, in terms of prior probabilities p(M_i) to give

    p(M_i|D) / p(M_j|D) = [p(D|M_i) p(M_i)] / [p(D|M_j) p(M_j)]    (2)

and so requires that we be able to evaluate the model evidence p(D|M_i), which corresponds to the denominator in (1). The relative probabilities of different models can be used to select the single most probable model, or to form a committee of models, weighted by their probabilities. It is convenient to write the numerator of (1) in the form exp{-E(w)}, where E(w) is an error function. Normalization of the posterior distribution then requires that

    p(D|M) = ∫ exp{-E(w)} dw.    (3)

Generally, it is straightforward to evaluate E(w) for a given value of w, although it is extremely difficult to evaluate the corresponding model evidence using (3), since the posterior distribution is typically very small except in narrow regions of the high-dimensional parameter space, which are unknown a priori. Standard numerical integration techniques are therefore inapplicable. One approach is based on a local Gaussian approximation around a mode of the posterior (MacKay, 1992). Unfortunately, this approximation is expected to be accurate only when the number of data points is large in relation to the number of parameters in the model. In fact it is for relatively complex models, or problems for which data is scarce, that Bayesian methods have the most to offer. Indeed, Neal (1996) has argued that, from a Bayesian perspective, there is no reason to limit the number of parameters in a model, other than for computational reasons. We therefore consider an approach to the evaluation of model evidence which overcomes the limitations of the Gaussian framework. For additional techniques and references to Bayesian model comparison, see Gilks et al. (1995) and Kass and Raftery (1995).
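To see why (3) is easy in principle but hopeless in high dimensions, note that in one dimension the evidence can be obtained by brute-force quadrature; the quadratic error function below is an illustrative assumption, chosen so that the exact answer, sqrt(2*pi), is known:

```python
import numpy as np

# One-dimensional evidence integral (3) by brute-force quadrature.
# E(w) = w^2/2 is an illustrative assumption; its evidence is sqrt(2*pi).
w = np.linspace(-10.0, 10.0, 100_001)
dx = w[1] - w[0]
evidence = np.sum(np.exp(-0.5 * w**2)) * dx
print(evidence)   # ~2.5066, i.e. sqrt(2*pi)
```

A grid of this kind over a W-dimensional weight space would need 100_000^W points, which is exactly why such direct integration is inapplicable for neural networks.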
2 Chaining

Suppose we have a simple model M_0 for which we can evaluate the evidence analytically, and for which we can easily generate a sample w_l (where l = 1, ..., L) from the corresponding distribution p(w|D, M_0). Then the evidence for some other model M can be expressed in the form

    p(D|M) / p(D|M_0) = ∫ exp{-E(w) + E_0(w)} p(w|D, M_0) dw
                      ≈ (1/L) Σ_{l=1}^{L} exp{-E(w_l) + E_0(w_l)}.    (4)

Unfortunately, the Monte Carlo approximation in (4) will be poor if the two error functions are significantly different, since the exponent is dominated by regions where E is relatively small, for which there will be few samples unless E_0 is also small in those regions. A simple Monte Carlo approach will therefore yield poor results. This problem is equivalent to the evaluation of free energies in statistical physics,
which is known to be a challenging problem, and where a number of approaches have been developed (Neal, 1993). Here we discuss one such approach to this problem, based on a chain of K successive models M_i which interpolate between M_0 and M, so that the required evidence can be written as

    p(D|M) = p(D|M_0) [p(D|M_1)/p(D|M_0)] [p(D|M_2)/p(D|M_1)] ... [p(D|M)/p(D|M_K)].    (5)

Each of the ratios in (5) can be evaluated using (4). The goal is to devise a chain of models such that each successive pair of models has probability distributions which are reasonably close, so that each of the ratios in (5) can be evaluated accurately, while keeping the total number of links in the chain fairly small to limit the computational cost. We have chosen the technique of hybrid Monte Carlo (Duane et al., 1987; Neal, 1993) to sample from the various distributions, since this has been shown to be effective for sampling from the complex distributions arising with neural network models (Neal, 1996). This involves introducing Hamiltonian equations of motion in which the parameters w are augmented by a set of fictitious `momentum' variables, which are then integrated using the leapfrog method. At the end of each trajectory the new parameter vector is accepted with a probability governed by the Metropolis criterion, and the momenta are replaced using Gibbs sampling. As a check on our software implementation of chaining, we have evaluated the evidence for a mixture of two non-isotropic Gaussian distributions, and obtained a result which was within 10% of the analytical solution.

3 Application to Neural Networks

We now consider the application of the chaining method to regression problems involving neural network models. The network corresponds to a function y(x; w), and the data set consists of N pairs of input vectors x_n and corresponding targets t_n, where n = 1, ..., N. Assuming Gaussian noise on the target data, the likelihood function takes the form

    p(D|w, M) = (β/2π)^{N/2} exp( -(β/2) Σ_{n=1}^{N} ||y(x_n; w) - t_n||² )    (6)

where β is a hyper-parameter representing the inverse of the noise variance.
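The failure of the single-ratio estimator (4) and its repair by the chain (5) can be sketched with one-dimensional Gaussian models, where every quantity is known exactly. The particular means, the convex interpolation of the error functions, and the sample sizes are illustrative assumptions, and we sample each intermediate distribution exactly rather than by hybrid Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)

# Reference and target error functions; both Gaussian, so every quantity
# is known in closed form. The means are illustrative assumptions.
mu = 4.0
E0 = lambda w: 0.5 * w**2           # reference model M0: N(0, 1)
E1 = lambda w: 0.5 * (w - mu)**2    # target model M:    N(mu, 1)

# Interpolating chain: E_lam = (1 - lam) E0 + lam E1, which is again
# Gaussian with mean lam * mu, so it can be sampled exactly here.
def E(w, lam):
    return (1.0 - lam) * E0(w) + lam * E1(w)

L = 50_000
lambdas = np.linspace(0.0, 1.0, 9)  # 8 links in the chain

# Naive single-step estimator (4): very high variance, because N(0,1)
# and N(4,1) barely overlap.
w = rng.normal(0.0, 1.0, size=L)
naive = np.mean(np.exp(-E1(w) + E0(w)))

# Chained estimator (5): a product of 8 well-behaved ratios.
log_ratio = 0.0
for lo, hi in zip(lambdas[:-1], lambdas[1:]):
    w = rng.normal(lo * mu, 1.0, size=L)   # sample from the current link
    log_ratio += np.log(np.mean(np.exp(-E(w, hi) + E(w, lo))))

# Both endpoint models have normalizer sqrt(2*pi), so the true evidence
# ratio is exactly 1; the chained estimate recovers it accurately.
print(naive, np.exp(log_ratio))
```

The naive estimate fluctuates wildly from run to run, while the chained product is stable, which is precisely the behaviour the chaining construction is designed to exploit.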
We consider networks with a single hidden layer of `tanh' units, and linear output units. Following Neal (1996) we use a diagonal Gaussian prior in which the weights are divided into groups w_k, where k = 1, ..., 4, corresponding to input-to-hidden weights, hidden-unit biases, hidden-to-output weights, and output biases. Each group is governed by a separate `precision' hyper-parameter α_k, so that the prior takes the form

    p(w|{α_k}) = (1/Z_W) exp( -(1/2) Σ_k α_k w_k^T w_k )    (7)

where Z_W is the normalization coefficient. The hyper-parameters {α_k} and β are themselves each governed by hyper-priors given by Gamma distributions of the form

    p(α) ∝ α^{s-1} exp(-sα/ω)    (8)
in which the mean ω and variance ω²/s are chosen to give very broad hyper-priors, in reflection of our limited prior knowledge of the values of the hyper-parameters. We use the hybrid Monte Carlo algorithm to sample from the joint distribution of parameters and hyper-parameters. For the evaluation of evidence ratios, however, we consider only the parameter samples, and perform the integrals over hyper-parameters analytically, using the fact that the Gamma distribution is conjugate to the Gaussian.

In order to apply chaining to this problem, we choose the prior as our reference distribution, and then define a set of intermediate distributions based on a parameter λ which governs the effective contribution from the data term, so that

    E(λ; w) = λ Ẽ(w) + E_0(w)    (9)

where Ẽ(w) arises from the likelihood term (6) while E_0(w) corresponds to the prior (7). We select a set of 8 values of λ which interpolate between the reference distribution (λ = 0) and the desired model distribution (λ = 1). The evidence for the prior alone is easily evaluated analytically.

4 Gaussian Approximation

As a comparison against the method of chaining, we consider the framework of MacKay (1992) based on a local Gaussian approximation to the posterior distribution. This approach makes use of the evidence approximation, in which the integration over hyper-parameters is approximated by setting them to specific values which are themselves determined by maximizing their evidence functions. This leads to a hierarchical treatment as follows. At the lowest level, the maximum ŵ of the posterior distribution over weights is found for fixed values of the hyper-parameters by minimizing the error function. Periodically the hyper-parameters are re-estimated by evidence maximization, where the evidence is obtained analytically using the Gaussian approximation.
This gives the following re-estimation formulae:

    1/β := (1/(N - γ)) Σ_{n=1}^{N} ||y(x_n; ŵ) - t_n||²,    α_k := γ_k / (ŵ_k^T ŵ_k)    (10)

where γ_k = W_k - α_k Tr_k(A^{-1}), W_k is the total number of parameters in group k, A = ∇∇E(ŵ), γ = Σ_k γ_k, and Tr_k(·) denotes the trace over the k-th group of parameters. The weights are updated in an inner loop by minimizing the error function using a conjugate gradient optimizer, while the hyper-parameters are periodically re-estimated using (10). Once training is complete, the model evidence is evaluated by making a Gaussian approximation around the converged values of the hyper-parameters, and integrating over this distribution analytically. This gives the model log evidence as

    ln p(D|M) = -E(ŵ) - (1/2) ln|A| + (N/2) ln β + ln h! + h ln 2
                + (1/2) Σ_k W_k ln α_k + (1/2) Σ_k ln(2/γ_k) + (1/2) ln(2/(N - γ)).    (11)

Note that we are assuming that the hyper-priors (8) are sufficiently broad that they have no effect on the location of the evidence maximum and can therefore be neglected.
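The re-estimation formulae (10) can be sketched for a model in which the posterior mode and the Hessian A are available in closed form. The linear model below is an illustrative stand-in for the network, with a single weight group and arbitrary data sizes and initial hyper-parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for a linear model y = Phi @ w, so that everything in
# the re-estimation formulae (10) is exact rather than approximate.
N, W = 50, 3
Phi = rng.normal(size=(N, W))
w_true = np.array([1.0, -2.0, 0.5])
t = Phi @ w_true + rng.normal(scale=0.1, size=N)

alpha, beta = 1.0, 1.0                 # arbitrary initial hyper-parameters
for _ in range(50):
    # Posterior mode w_hat for fixed hyper-parameters; A = grad grad E(w_hat)
    A = beta * Phi.T @ Phi + alpha * np.eye(W)
    w_hat = beta * np.linalg.solve(A, Phi.T @ t)
    # gamma = W - alpha Tr(A^-1): the effective number of parameters
    gamma = W - alpha * np.trace(np.linalg.inv(A))
    # Re-estimation formulae (10), with a single weight group
    alpha = gamma / (w_hat @ w_hat)
    beta = (N - gamma) / np.sum((Phi @ w_hat - t) ** 2)

print(w_hat)                 # close to w_true
print(1.0 / np.sqrt(beta))   # estimate of the noise standard deviation
```

For the neural network the same loop applies, except that ŵ comes from a nonlinear optimizer and A is only an approximation to the true log-posterior curvature.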
Here h is the number of hidden units, and the terms ln h! + h ln 2 take account of the many equivalent modes of the posterior distribution arising from sign-flip and hidden-unit interchange symmetries in the network model. A derivation of these results can be found in Bishop (1995; pages 434-436). The result (11) corresponds to a single mode of the distribution. If we initialize the weight optimization algorithm with different random values we can find distinct solutions. In order to compute an overall evidence for the particular network model with a given number of hidden units, we make the assumption that we have found all of the distinct modes of the posterior distribution precisely once each, and then sum the evidences to arrive at the total model evidence. This neglects the possibility that some of the solutions found are related by symmetry transformations (and therefore already taken into account) or that we have missed important modes. While some attempt could be made to detect degenerate solutions, it will be difficult to do much better than the above within the framework of the Gaussian approximation.

5 Results: Robot Arm Problem

As an illustration of the evaluation of model evidence for a larger-scale problem, we consider the modelling of the forward kinematics for a two-link robot arm in a two-dimensional space, as introduced by MacKay (1992). This problem was chosen as MacKay reports good results in using the Gaussian approximation framework to evaluate the evidences, and it provides a good opportunity for comparison with the chaining approach. The task is to learn the mapping (x_1, x_2) → (y_1, y_2) given by

    y_1 = 2.0 cos(x_1) + 1.3 cos(x_1 + x_2)
    y_2 = 2.0 sin(x_1) + 1.3 sin(x_1 + x_2)    (12)

where the data set consists of 200 input-output pairs with outputs corrupted by zero-mean Gaussian noise with standard deviation σ = 0.05. We have used the original training data of MacKay, but generated our own test set of 1000 points using the same prescription.
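A sketch of generating data from (12); the input ranges are assumptions, as MacKay's sampling prescription for the inputs is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_kinematics(x1, x2):
    # The two-link robot arm mapping of equation (12)
    y1 = 2.0 * np.cos(x1) + 1.3 * np.cos(x1 + x2)
    y2 = 2.0 * np.sin(x1) + 1.3 * np.sin(x1 + x2)
    return y1, y2

N = 200                               # training-set size used in the paper
x1 = rng.uniform(-2.0, 2.0, size=N)   # assumed input range
x2 = rng.uniform(0.0, np.pi, size=N)  # assumed input range
y1, y2 = forward_kinematics(x1, x2)

# Targets: outputs corrupted by zero-mean Gaussian noise, sigma = 0.05
targets = np.stack([y1, y2], axis=1) + rng.normal(scale=0.05, size=(N, 2))
```

Note that the clean outputs lie within a disc of radius 2.0 + 1.3 = 3.3, which gives a quick sanity check on any generated data set.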
The evidence is evaluated using both chaining and the Gaussian approximation, for networks with various numbers of hidden units. In the chaining method, the particular forms of the Gamma priors for the precision variables are as follows: for the input-to-hidden weights and hidden-unit biases, ω = 1, s = 0.1; for the hidden-to-output weights, ω = h, s = 0.1; for the output biases, ω = 0.1, s = 1. The noise level hyper-parameters were ω = 400, s = 0.1. These settings follow closely those used by Neal (1996) for the same problem. The hidden-to-output precision scaling was chosen by Neal such that the limit of an infinite number of hidden units is well defined and corresponds to a Gaussian process prior. For each evidence ratio in the chain, the first 100 samples from the hybrid Monte Carlo run, obtained with a trajectory length of 50 leapfrog iterations, are omitted to give the algorithm a chance to reach the equilibrium distribution. The next 600 samples are obtained using a trajectory length of 300 and are used to evaluate the evidence ratio. In Figure 1(a) we show the error values of the sampling stage for 4 hidden units, where we see that the errors are largely uncorrelated, as required for effective Monte Carlo sampling. In Figure 1(b), we plot the values of ln{p(D|M_i)/p(D|M_{i-1})} against i, for i = 1, ..., 8. Note that there is a large change in the evidence ratios at the beginning of the chain, where we sample close to the reference distribution. For this
reason, we choose the λ_i to be dense close to λ = 0. We are currently researching more principled approaches to the selection of this partitioning.

Figure 1: (a) Error E(λ = 0.6; w) for h = 4, plotted for 600 successive Monte Carlo samples. (b) Values of the ratio ln{p(D|M_i)/p(D|M_{i-1})} for i = 1, ..., 8, for h = 4.

Figure 2(a) shows the log model evidence against the number of hidden units. Note that the chaining approach is computationally expensive: for h = 4, a complete chain takes 48 hours in a Matlab implementation running on a Silicon Graphics Challenge L. We see that there is no decline in the evidence as the number of hidden units grows. Correspondingly, in Figure 2(b), we see that the test error performance does not degrade as the number of hidden units increases. This indicates that there is no over-fitting with increasing model complexity, in accordance with Bayesian expectations.

Figure 2: (a) Plot of ln p(D|M) for different numbers of hidden units. (b) Test error against the number of hidden units. Here the theoretical minimum value is 1.0.

The corresponding results from the Gaussian approximation approach are shown in Figure 3. We see that there is a characteristic `Occam hill', whereby the evidence shows a peak at an intermediate number of hidden units, with a strong decrease for smaller values of h and a slower decrease for larger values. The corresponding test set errors similarly show a minimum at around the same value of h, indicating that the Gaussian approximation is becoming increasingly inaccurate for more complex models.

6 Discussion

We have seen that the use of chaining allows the effective evaluation of model evidences for neural networks using Monte Carlo techniques. In particular, we find that there is no peak in the model evidence, or the corresponding test set error, as the number of hidden units is increased, and so there is no indication of over-fitting. This is in accord with the expectation that model complexity should not be limited by the size of the data set, and is in marked contrast to the conventional
maximum likelihood viewpoint. It is also consistent with the result that, in the limit of an infinite number of hidden units, the prior over network weights leads to a well-defined Gaussian prior over functions (Williams, 1997).

An important advantage of being able to make accurate evaluations of the model evidence is the ability to compare quite distinct kinds of model, for example radial basis function networks and multi-layer perceptrons. This can be done either by chaining both models back to a common reference model, or by evaluating normalized model evidences explicitly.

Figure 3: (a) Plot of the model evidence for the robot arm problem versus the number of hidden units, using the Gaussian approximation framework. This clearly shows the characteristic `Occam hill' shape. Note that the evidence is computed up to an additive constant, and so the origin of the vertical axis has no significance. (b) Corresponding plot of the test set error versus the number of hidden units. Individual points correspond to particular modes of the posterior weight distribution, while the line shows the mean test set error for each value of h.

Acknowledgements

We would like to thank Chris Williams and Alastair Bruce for a number of useful discussions. This work was supported by EPSRC grant GR/J7545: Novel Developments in Learning Theory for Neural Networks.

References

Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.

Duane, S., A. D. Kennedy, B. J. Pendleton, and D. Roweth (1987). Hybrid Monte Carlo. Physics Letters B 195 (2), 216-222.

Gilks, W. R., S. Richardson, and D. J. Spiegelhalter (1995). Markov Chain Monte Carlo in Practice. Chapman and Hall.

Kass, R. E. and A. E. Raftery (1995). Bayes factors. J. Am. Statist. Ass. 90, 773-795.

MacKay, D. J. C. (1992). A practical Bayesian framework for back-propagation networks. Neural Computation 4 (3), 448-472.

Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods.
Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, Canada.

Neal, R. M. (1996). Bayesian Learning for Neural Networks. New York: Springer. Lecture Notes in Statistics 118.

Williams, C. K. I. (1997). Computing with infinite networks. In Advances in Neural Information Processing Systems 9. This volume.
More informationHARMONIC ALLOCATION TO MV CUSTOMERS IN RURAL DISTRIBUTION SYSTEMS
HARMONIC ALLOCATION TO MV CUSTOMERS IN RURAL DISTRIBUTION SYSTEMS V Gosbell University of Wollongong Department of Electrical, Computer & Telecommunications Engineering, Wollongong, NSW 2522, Australia
More informationestimate results from a recursive sceme tat generalizes te algoritms of Efron (967), Turnbull (976) and Li et al (997) by kernel smooting te data at e
A kernel density estimate for interval censored data Tierry Ducesne and James E Staord y Abstract In tis paper we propose a kernel density estimate for interval-censored data It retains te simplicity andintuitive
More informationExercises for numerical differentiation. Øyvind Ryan
Exercises for numerical differentiation Øyvind Ryan February 25, 2013 1. Mark eac of te following statements as true or false. a. Wen we use te approximation f (a) (f (a +) f (a))/ on a computer, we can
More informationFINITE ELEMENT STOCHASTIC ANALYSIS
FINITE ELEMENT STOCHASTIC ANALYSIS Murray Fredlund, P.D., P.Eng., SoilVision Systems Ltd., Saskatoon, SK ABSTRACT Numerical models can be valuable tools in te prediction of seepage. Te results can often
More information(a) At what number x = a does f have a removable discontinuity? What value f(a) should be assigned to f at x = a in order to make f continuous at a?
Solutions to Test 1 Fall 016 1pt 1. Te grap of a function f(x) is sown at rigt below. Part I. State te value of eac limit. If a limit is infinite, state weter it is or. If a limit does not exist (but is
More informationArtificial Neural Network Model Based Estimation of Finite Population Total
International Journal of Science and Researc (IJSR), India Online ISSN: 2319-7064 Artificial Neural Network Model Based Estimation of Finite Population Total Robert Kasisi 1, Romanus O. Odiambo 2, Antony
More informationTwo Spirals Two Gaussians Letters
12 1 8 6 4 2 Two Spirals Two Gaussians Letters Figure 8: Number of examples needed for average error to reac.3. From left to rigt: random, uncertainty, maximal distance and lookaead sampling metods. contains
More information1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x)
Calculus. Gradients and te Derivative Q f(x+) δy P T δx R f(x) 0 x x+ Let P (x, f(x)) and Q(x+, f(x+)) denote two points on te curve of te function y = f(x) and let R denote te point of intersection of
More informationOptimal parameters for a hierarchical grid data structure for contact detection in arbitrarily polydisperse particle systems
Comp. Part. Mec. 04) :357 37 DOI 0.007/s4057-04-000-9 Optimal parameters for a ierarcical grid data structure for contact detection in arbitrarily polydisperse particle systems Dinant Krijgsman Vitaliy
More informationSECTION 1.10: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES
(Section.0: Difference Quotients).0. SECTION.0: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES Define average rate of cange (and average velocity) algebraically and grapically. Be able to identify, construct,
More informationLecture 21. Numerical differentiation. f ( x+h) f ( x) h h
Lecture Numerical differentiation Introduction We can analytically calculate te derivative of any elementary function, so tere migt seem to be no motivation for calculating derivatives numerically. However
More informationPreface. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.
Preface Here are my online notes for my course tat I teac ere at Lamar University. Despite te fact tat tese are my class notes, tey sould be accessible to anyone wanting to learn or needing a refreser
More informationChapter 2 Limits and Continuity
4 Section. Capter Limits and Continuity Section. Rates of Cange and Limits (pp. 6) Quick Review.. f () ( ) () 4 0. f () 4( ) 4. f () sin sin 0 4. f (). 4 4 4 6. c c c 7. 8. c d d c d d c d c 9. 8 ( )(
More informationBootstrap prediction intervals for Markov processes
arxiv: arxiv:0000.0000 Bootstrap prediction intervals for Markov processes Li Pan and Dimitris N. Politis Li Pan Department of Matematics University of California San Diego La Jolla, CA 92093-0112, USA
More informationCS340: Bayesian concept learning. Kevin Murphy Based on Josh Tenenbaum s PhD thesis (MIT BCS 1999)
CS340: Bayesian concept learning Kevin Murpy Based on Jos Tenenbaum s PD tesis (MIT BCS 1999) Concept learning (binary classification) from positive and negative examples Concept learning from positive
More informationDifferentiation in higher dimensions
Capter 2 Differentiation in iger dimensions 2.1 Te Total Derivative Recall tat if f : R R is a 1-variable function, and a R, we say tat f is differentiable at x = a if and only if te ratio f(a+) f(a) tends
More informationMVT and Rolle s Theorem
AP Calculus CHAPTER 4 WORKSHEET APPLICATIONS OF DIFFERENTIATION MVT and Rolle s Teorem Name Seat # Date UNLESS INDICATED, DO NOT USE YOUR CALCULATOR FOR ANY OF THESE QUESTIONS In problems 1 and, state
More informationc [2016] Bud B. Coulson ALL RIGHTS RESERVED
c 206 Bud B. Coulson ALL RIGHTS RESERVED AN AFFINE WEYL GROUP INTERPRETATION OF THE MOTIVATED PROOFS OF THE ROGERS-RAMANUJAN AND GORDON-ANDREWS-BRESSOUD IDENTITIES BY BUD B. COULSON A dissertation submitted
More informationContinuity and Differentiability of the Trigonometric Functions
[Te basis for te following work will be te definition of te trigonometric functions as ratios of te sides of a triangle inscribed in a circle; in particular, te sine of an angle will be defined to be te
More informationWork and Energy. Introduction. Work. PHY energy - J. Hedberg
Work and Energy PHY 207 - energy - J. Hedberg - 2017 1. Introduction 2. Work 3. Kinetic Energy 4. Potential Energy 5. Conservation of Mecanical Energy 6. Ex: Te Loop te Loop 7. Conservative and Non-conservative
More informationRobotic manipulation project
Robotic manipulation project Bin Nguyen December 5, 2006 Abstract Tis is te draft report for Robotic Manipulation s class project. Te cosen project aims to understand and implement Kevin Egan s non-convex
More information= 0 and states ''hence there is a stationary point'' All aspects of the proof dx must be correct (c)
Paper 1: Pure Matematics 1 Mark Sceme 1(a) (i) (ii) d d y 3 1x 4x x M1 A1 d y dx 1.1b 1.1b 36x 48x A1ft 1.1b Substitutes x = into teir dx (3) 3 1 4 Sows d y 0 and states ''ence tere is a stationary point''
More informationLecture 15. Interpolation II. 2 Piecewise polynomial interpolation Hermite splines
Lecture 5 Interpolation II Introduction In te previous lecture we focused primarily on polynomial interpolation of a set of n points. A difficulty we observed is tat wen n is large, our polynomial as to
More informationA Nonparametric Prior for Simultaneous Covariance Estimation
WEB APPENDIX FOR A Nonparametric Prior for Simultaneous Covariance Estimation J. T. Gaskins and M. J. Daniels Appendix : Derivation of Teoretical Properties Tis appendix contains proof for te properties
More informationFinancial Econometrics Prof. Massimo Guidolin
CLEFIN A.A. 2010/2011 Financial Econometrics Prof. Massimo Guidolin A Quick Review of Basic Estimation Metods 1. Were te OLS World Ends... Consider two time series 1: = { 1 2 } and 1: = { 1 2 }. At tis
More informationDigital Filter Structures
Digital Filter Structures Te convolution sum description of an LTI discrete-time system can, in principle, be used to implement te system For an IIR finite-dimensional system tis approac is not practical
More informationAn Empirical Bayesian interpretation and generalization of NL-means
Computer Science Tecnical Report TR2010-934, October 2010 Courant Institute of Matematical Sciences, New York University ttp://cs.nyu.edu/web/researc/tecreports/reports.tml An Empirical Bayesian interpretation
More informationEstimating Peak Bone Mineral Density in Osteoporosis Diagnosis by Maximum Distribution
International Journal of Clinical Medicine Researc 2016; 3(5): 76-80 ttp://www.aascit.org/journal/ijcmr ISSN: 2375-3838 Estimating Peak Bone Mineral Density in Osteoporosis Diagnosis by Maximum Distribution
More informationMath 31A Discussion Notes Week 4 October 20 and October 22, 2015
Mat 3A Discussion Notes Week 4 October 20 and October 22, 205 To prepare for te first midterm, we ll spend tis week working eamples resembling te various problems you ve seen so far tis term. In tese notes
More informationSin, Cos and All That
Sin, Cos and All Tat James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 9, 2017 Outline Sin, Cos and all tat! A New Power Rule Derivatives
More informationRecall from our discussion of continuity in lecture a function is continuous at a point x = a if and only if
Computational Aspects of its. Keeping te simple simple. Recall by elementary functions we mean :Polynomials (including linear and quadratic equations) Eponentials Logaritms Trig Functions Rational Functions
More informationThe Krewe of Caesar Problem. David Gurney. Southeastern Louisiana University. SLU 10541, 500 Western Avenue. Hammond, LA
Te Krewe of Caesar Problem David Gurney Souteastern Louisiana University SLU 10541, 500 Western Avenue Hammond, LA 7040 June 19, 00 Krewe of Caesar 1 ABSTRACT Tis paper provides an alternative to te usual
More informationNew families of estimators and test statistics in log-linear models
Journal of Multivariate Analysis 99 008 1590 1609 www.elsevier.com/locate/jmva ew families of estimators and test statistics in log-linear models irian Martín a,, Leandro Pardo b a Department of Statistics
More informationDedicated to the 70th birthday of Professor Lin Qun
Journal of Computational Matematics, Vol.4, No.3, 6, 4 44. ACCELERATION METHODS OF NONLINEAR ITERATION FOR NONLINEAR PARABOLIC EQUATIONS Guang-wei Yuan Xu-deng Hang Laboratory of Computational Pysics,
More informationPractice Problem Solutions: Exam 1
Practice Problem Solutions: Exam 1 1. (a) Algebraic Solution: Te largest term in te numerator is 3x 2, wile te largest term in te denominator is 5x 2 3x 2 + 5. Tus lim x 5x 2 2x 3x 2 x 5x 2 = 3 5 Numerical
More informationTHE STURM-LIOUVILLE-TRANSFORMATION FOR THE SOLUTION OF VECTOR PARTIAL DIFFERENTIAL EQUATIONS. L. Trautmann, R. Rabenstein
Worksop on Transforms and Filter Banks (WTFB),Brandenburg, Germany, Marc 999 THE STURM-LIOUVILLE-TRANSFORMATION FOR THE SOLUTION OF VECTOR PARTIAL DIFFERENTIAL EQUATIONS L. Trautmann, R. Rabenstein Lerstul
More informationHandling Missing Data on Asymmetric Distribution
International Matematical Forum, Vol. 8, 03, no. 4, 53-65 Handling Missing Data on Asymmetric Distribution Amad M. H. Al-Kazale Department of Matematics, Faculty of Science Al-albayt University, Al-Mafraq-Jordan
More information1 2 x Solution. The function f x is only defined when x 0, so we will assume that x 0 for the remainder of the solution. f x. f x h f x.
Problem. Let f x x. Using te definition of te derivative prove tat f x x Solution. Te function f x is only defined wen x 0, so we will assume tat x 0 for te remainder of te solution. By te definition of
More information