Stat 451 Lecture Notes Numerical Integration
Slide 1: Stat 451 Lecture Notes: Numerical Integration
Ryan Martin, UIC
Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange.
Updated: February 11
Slide 2: Outline
1. Introduction
2. Newton-Cotes quadrature
3. Gaussian quadrature
4. Laplace approximation
5. Conclusion
Slide 3: Motivation
While many statistics problems rely on optimization, some require numerical integration instead. Bayesian statistics is almost exclusively integration:
- the data admit a likelihood function L(θ);
- θ is unknown, so we assign it a weight (prior) function π(θ);
- prior and data are combined via Bayes's formula,
  $\pi(\theta \mid x) = \frac{L(\theta)\pi(\theta)}{\int_\Theta L(\theta')\pi(\theta')\,d\theta'}.$
We need to compute posterior probabilities and expectations, i.e., integrals! Some non-Bayesian problems also involve integration, e.g., random- or mixed-effects models. There are other approaches besides Bayesian and frequentist...
Slide 4: Intuition
There are a number of classical numerical integration techniques, simple and powerful. Think back to calculus class, where the integral was defined:
- approximate the function by a constant on small intervals;
- compute the areas of the rectangles and sum them up;
- the integral is defined as the limit of this sum as the mesh size goes to 0.
Numerical integration, or quadrature, is based on this definition and refinements thereof. Basic principle: approximate the function on a small interval by a nice one that you know how to integrate. [Footnote 3: This is essentially the same principle that motivated the various methods we discussed for optimization!] Works well for one- or two-dimensional integrals; for higher-dimensional integrals, other tools are needed.
Slide 5: Notation
Suppose that f(x) is a function that we would like to integrate over an interval [a, b]. Take n relatively large and set h = (b − a)/n. Let $x_i = a + ih$, $i = 0, \dots, n-1$ (with $x_n = b$). Key point: if f(x) is nice, then it can be approximated by a simple function on the small interval $[x_i, x_{i+1}]$. A general strategy is to approximate the integral by
$\int_a^b f(x)\,dx = \sum_{i=0}^{n-1} \int_{x_i}^{x_{i+1}} f(x)\,dx \approx \sum_{i=0}^{n-1} \sum_{j=0}^{m} A_{ij}\, f(x_{ij}),$
for appropriately chosen $A_{ij}$'s, nodes $x_{ij}$, and m.
Slide 6: Outline (Section 2: Newton-Cotes quadrature)
Slide 7: Polynomial approximation
Consider the following sequence of (Lagrange) polynomials:
$p_{ij}(x) = \prod_{k \neq j} \frac{x - x_{ik}}{x_{ij} - x_{ik}}, \quad j = 0, \dots, m.$
Then
$p_i(x) = \sum_{j=0}^{m} p_{ij}(x)\, f(x_{ij})$
is an m-th degree polynomial that interpolates f(x) at the nodes $x_{i0}, \dots, x_{im}$. Furthermore,
$\int_{x_i}^{x_{i+1}} f(x)\,dx \approx \int_{x_i}^{x_{i+1}} p_i(x)\,dx = \sum_{j=0}^{m} \underbrace{\Big[ \int_{x_i}^{x_{i+1}} p_{ij}(x)\,dx \Big]}_{A_{ij}}\, f(x_{ij}).$
Slide 8: Riemann rule (m = 0)
Approximate f(x) on $[x_i, x_{i+1}]$ by a constant. Here $x_{i0} = x_i$ and $p_{i0}(x) \equiv 1$, so
$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n-1} f(x_i)(x_{i+1} - x_i) = h \sum_{i=0}^{n-1} f(x_i).$
Features of the Riemann rule:
- Very easy to program: only needs $f(x_0), \dots, f(x_{n-1})$.
- Can be slow to converge, i.e., lots of $x_i$'s may be needed to get a good approximation.
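As a quick illustration (the course code is in R; this Python sketch is my own translation), the Riemann rule is a few lines:

```python
def riemann(f, a, b, n):
    # Left-endpoint Riemann rule: sum of rectangle areas of width h.
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

# Example: the integral of x^2 over [0, 1] is exactly 1/3.
approx = riemann(lambda x: x * x, 0.0, 1.0, 1000)
```

Even with n = 1000 the error here is only about 5e-4, reflecting the slow O(h) convergence of the left-endpoint rule.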
Slide 9: Trapezoid rule (m = 1)
Approximate f(x) on $[x_i, x_{i+1}]$ by a linear function. In this case:
- $x_{i0} = x_i$ and $x_{i1} = x_{i+1}$;
- $A_{i0} = A_{i1} = (x_{i+1} - x_i)/2 = h/2$.
Therefore,
$\int_a^b f(x)\,dx \approx \frac{h}{2} \sum_{i=0}^{n-1} \{ f(x_i) + f(x_{i+1}) \}.$
Still only requires function evaluations at the $x_i$'s. More accurate than Riemann because a linear approximation is more flexible than a constant one. Can derive bounds on the approximation error...
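A matching Python sketch of the composite trapezoid rule (my own translation, not the course's R code):

```python
def trapezoid(f, a, b, n):
    # Composite trapezoid rule: interior points get weight 1,
    # the two endpoints get weight 1/2, all scaled by h.
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)

# Same test integral as before: the error drops from O(h) to O(h^2).
approx = trapezoid(lambda x: x * x, 0.0, 1.0, 1000)
```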
Slide 10: Trapezoid rule (cont.)
A general tool for studying the precision of the trapezoid rule is the Euler-Maclaurin formula. Suppose that g(t) is twice differentiable; then
$\sum_{t=0}^{n} g(t) \approx \int_0^n g(t)\,dt + \frac{1}{2}\{g(0) + g(n)\} + C_1 \big[ g'(t) \big]_0^n,$
where $|\mathrm{LHS} - \mathrm{RHS}| \le C_2 \int_0^n |g''(t)|\,dt$. How does this help? The trapezoid rule is
$T(h) := h \big\{ \tfrac{1}{2} g(0) + g(1) + \cdots + g(n-1) + \tfrac{1}{2} g(n) \big\}, \quad \text{where } g(t) = f(a + ht).$
Slide 11: Trapezoid rule (cont.)
Apply Euler-Maclaurin to T(h):
$T(h) = h \sum_{t=0}^{n} g(t) - \frac{h}{2}\{g(0) + g(n)\}$
$\approx h \int_0^n g(t)\,dt + h\, C_1 \big[ g'(t) \big]_0^n$
$= \int_a^b f(x)\,dx + h\, C_1 \{ h f'(b) - h f'(a) \}.$
Therefore,
$T(h) - \int_a^b f(x)\,dx = O(h^2), \quad h \to 0.$
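The O(h²) rate is easy to check empirically; in this sketch (mine, not from the notes) halving h should cut the error by a factor of about four:

```python
import math

def trapezoid(f, a, b, n):
    # Composite trapezoid rule with n subintervals.
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

exact = math.e - 1.0  # integral of e^x over [0, 1]
err_h = trapezoid(math.exp, 0.0, 1.0, 64) - exact
err_half = trapezoid(math.exp, 0.0, 1.0, 128) - exact
ratio = err_h / err_half  # close to 4 when the error is O(h^2)
```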
Slide 12: Trapezoid rule (cont.)
Can the trapezoid error $O(h^2)$ be improved? Our derivation above is not quite precise; the next smallest term in the expansion is $O(h^4)$. Romberg recognized that a manipulation of T(h) will cancel the $O(h^2)$ term, leaving only the $O(h^4)$ term! Romberg's rule is
$\frac{4\, T(h/2) - T(h)}{3} = \int_a^b f(x)\,dx + O(h^4), \quad h \to 0.$
Can be iterated to improve further; see Sec. 5.2 in G&H.
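A sketch of one Romberg step (my own Python; the notes point to Sec. 5.2 of G&H for the iterated version):

```python
import math

def trapezoid(f, a, b, n):
    # Composite trapezoid rule with n subintervals.
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def romberg_step(f, a, b, n):
    # (4 T(h/2) - T(h)) / 3 cancels the O(h^2) error term,
    # leaving an O(h^4) approximation.
    return (4.0 * trapezoid(f, a, b, 2 * n) - trapezoid(f, a, b, n)) / 3.0

exact = math.e - 1.0  # integral of e^x over [0, 1]
err_trap = abs(trapezoid(math.exp, 0.0, 1.0, 16) - exact)
err_romb = abs(romberg_step(math.exp, 0.0, 1.0, 16) - exact)  # far smaller
```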
Slide 13: Simpson's rule (m = 2)
Approximate f(x) on $[x_i, x_{i+1}]$ by a quadratic function. Arguments similar to those above give the $x_{ij}$'s and $A_{ij}$'s. Simpson's rule approximation is
$\int_a^b f(x)\,dx \approx \frac{h}{6} \sum_{i=0}^{n-1} \Big\{ f(x_i) + 4 f\Big( \frac{x_i + x_{i+1}}{2} \Big) + f(x_{i+1}) \Big\}.$
More accurate than the trapezoid rule: the error is $O(n^{-4})$. If n is taken to be even, the formula simplifies a bit; see Equation (5.20) in G&H and my R code.
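Simpson's rule as written on this slide, with the midpoint of each subinterval as the third node (a Python sketch of mine, not the course's R code):

```python
import math

def simpson(f, a, b, n):
    # (h/6) * { f(x_i) + 4 f(midpoint) + f(x_{i+1}) } summed over subintervals.
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0 = a + i * h
        x1 = x0 + h
        total += f(x0) + 4.0 * f(0.5 * (x0 + x1)) + f(x1)
    return h * total / 6.0

# Example: the integral of sin(x) over [0, pi] is exactly 2.
approx = simpson(math.sin, 0.0, math.pi, 100)
```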
Slide 14: Remarks
- This approach works for generic m, and the approximation improves as m increases.
- It can be extended to functions of more than one variable, but the details get complicated real fast.
- In R, integrate does one-dimensional integration.
- Numerical methods and the corresponding software work very well, but care is still needed; see Section 5.4 in G&H.
Slide 15: Example: Bayesian analysis of a binomial
Suppose $X \sim \mathrm{Bin}(n, \theta)$ with n known and θ unknown. The prior for θ is the so-called semicircle distribution, with density
$\pi(\theta) = \frac{8}{\pi} \big\{ \tfrac{1}{4} - (\theta - \tfrac{1}{2})^2 \big\}^{1/2}, \quad \theta \in [0, 1].$
The posterior density is then
$\pi_x(\theta) = \frac{ \theta^x (1-\theta)^{n-x} \{ \frac{1}{4} - (\theta - \frac{1}{2})^2 \}^{1/2} }{ \int_0^1 u^x (1-u)^{n-x} \{ \frac{1}{4} - (u - \frac{1}{2})^2 \}^{1/2}\,du }.$
Calculating the Bayes estimate of θ, the posterior mean, requires numerical integration.
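A numerical posterior-mean calculation along these lines might look as follows (a Python sketch of my own, using the trapezoid rule; normalizing constants cancel in the numerator/denominator ratio):

```python
import math

def posterior_mean(x, n, grid=20001):
    # Posterior mean of theta under the semicircle prior, by trapezoid
    # quadrature on [0, 1] applied to numerator and denominator.
    def integrand(theta):
        prior = math.sqrt(max(0.25 - (theta - 0.5) ** 2, 0.0))
        return theta ** x * (1.0 - theta) ** (n - x) * prior
    h = 1.0 / (grid - 1)
    num = den = 0.0
    for i in range(grid):
        t = i * h
        w = 0.5 if i in (0, grid - 1) else 1.0  # trapezoid endpoint weights
        g = w * integrand(t)
        num += t * g
        den += g
    return num / den
```

Symmetric data (x = n/2) give posterior mean exactly 1/2; otherwise the estimate is shrunk from the MLE x/n toward the prior center 1/2.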
Slide 16: Example: mixture densities
Mixture distributions are very common, flexible models. They are useful for density estimation and heavy-tailed modeling. A general mixture model looks like
$p(y) = \int k(y \mid x)\, f(x)\,dx,$
where the kernel $k(y \mid x)$ is a pdf (or pmf) in y for each x, and f(x) is a pdf (or pmf). It is easy to check that p(y) is a pdf (or pmf, depending on k). Evaluating p(y) requires an integration for each y.
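For a concrete check (my own example, not from the notes): mixing a normal kernel over a Gamma(ν/2, rate ν/2) precision recovers the Student-t density, so quadrature on the mixture integral should match the closed form:

```python
import math

def t_density(y, nu):
    # Closed-form Student-t density, for comparison.
    c = math.gamma((nu + 1.0) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + y * y / nu) ** (-(nu + 1.0) / 2.0)

def t_by_quadrature(y, nu, upper=60.0, grid=60001):
    # p(y) = integral over tau of N(y | 0, 1/tau) * Gamma(tau | nu/2, rate nu/2),
    # evaluated with the trapezoid rule on (0, upper].
    a = nu / 2.0
    log_const = a * math.log(a) - math.lgamma(a)  # Gamma normalizing constant
    def integrand(tau):
        if tau <= 0.0:
            return 0.0
        log_kernel = 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * y * y
        log_mix = log_const + (a - 1.0) * math.log(tau) - a * tau
        return math.exp(log_kernel + log_mix)
    h = upper / (grid - 1)
    s = 0.5 * (integrand(0.0) + integrand(upper))
    s += sum(integrand(i * h) for i in range(1, grid - 1))
    return h * s
```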
Slide 17: Example 5.1 in G&H
Generalized linear mixed model:
$Y_{ij} \overset{\text{ind}}{\sim} \mathrm{Pois}(\lambda_{ij}), \quad \lambda_{ij} = e^{\gamma_i} e^{\beta_0 + \beta_1 j}, \quad i = 1, \dots, n, \; j = 1, \dots, J,$
where $\gamma_1, \dots, \gamma_n \overset{\text{iid}}{\sim} N(0, \sigma_\gamma^2)$. Model parameters are $(\beta_0, \beta_1, \sigma_\gamma^2)$. The marginal likelihood for $\theta = (\beta_0, \beta_1, \sigma_\gamma^2)$ is
$L(\theta) = \prod_{i=1}^n \int \prod_{j=1}^J \mathrm{Pois}(Y_{ij} \mid e^{\gamma_i} e^{\beta_0 + \beta_1 j})\, N(\gamma_i \mid 0, \sigma_\gamma^2)\,d\gamma_i.$
The goal is to maximize this over θ...
Slide 18: Example 5.1 in G&H (cont.)
Taking the log, we get
$\ell(\theta) = \sum_{i=1}^n \log \underbrace{ \int \Big[ \prod_{j=1}^J \mathrm{Pois}(Y_{ij} \mid e^{\gamma_i} e^{\beta_0 + \beta_1 j}) \Big] N(\gamma_i \mid 0, \sigma_\gamma^2)\,d\gamma_i }_{L_i(\theta)}.$
G&H consider evaluating the first of these integrals,
$L_1(\theta) = \int \Big[ \prod_{j=1}^J \mathrm{Pois}(Y_{1j} \mid e^{\gamma_1} e^{\beta_0 + \beta_1 j}) \Big] N(\gamma_1 \mid 0, \sigma_\gamma^2)\,d\gamma_1,$
and its partial derivative with respect to $\beta_1$. Reproduce the tables using the R code.
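To make the integral concrete, here is a sketch (in Python, with made-up responses and parameter values that are not the data of G&H Example 5.1) of evaluating one subject's likelihood contribution $L_1(\theta)$ by trapezoid quadrature over the random effect:

```python
import math

def pois_pmf(y, lam):
    # Poisson pmf, evaluated on the log scale for stability.
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1.0))

def norm_pdf(g, sd):
    return math.exp(-0.5 * (g / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def subject_likelihood(y, beta0, beta1, sd, width=6.0, grid=4001):
    # L_i(theta): integrate the product of Poisson pmfs against the
    # N(0, sd^2) random-effect density over [-width*sd, width*sd].
    def integrand(g):
        val = norm_pdf(g, sd)
        for j, yj in enumerate(y, start=1):
            val *= pois_pmf(yj, math.exp(g + beta0 + beta1 * j))
        return val
    lo, hi = -width * sd, width * sd
    h = (hi - lo) / (grid - 1)
    s = 0.5 * (integrand(lo) + integrand(hi))
    s += sum(integrand(lo + i * h) for i in range(1, grid - 1))
    return h * s

# Hypothetical counts for subject 1 and hypothetical parameter values.
L1 = subject_likelihood([3, 4, 6, 8, 11], beta0=0.5, beta1=0.3, sd=0.5)
```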
Slide 19: Outline (Section 3: Gaussian quadrature)
Slide 20: Very brief summary
Gaussian quadrature is an alternative to Newton-Cotes methods. It is useful primarily in problems where the integration is with respect to a non-uniform measure, e.g., an expectation. The basic idea is that the measure identifies a sequence of orthogonal polynomials. Approximating f via these polynomials turns out to be better than the Newton-Cotes approximations, at least as far as integration is concerned. The book gives minimal details, and we won't get into it here.
Slide 21: Outline (Section 4: Laplace approximation)
Slide 22: Setup
The Laplace approximation is a tool that allows us to approximate certain integrals based on optimization! The type of integrals considered are
$J_n := \int_a^b f(x)\, e^{n g(x)}\,dx, \quad n \to \infty,$
where
- the endpoints a < b can be finite or infinite;
- f and g are sufficiently nice functions;
- g has a unique maximizer $\hat x = \arg\max g(x)$ in the interior of (a, b).
Claim: when n is large, the major contribution to the integral comes from a neighborhood of $\hat x$, the maximizer of g. [Footnote 4: For a proof of this claim, see Section 4.7 in Lange.]
Slide 23: Formula
Assuming the claim, it suffices to restrict the range of integration to a small neighborhood around $\hat x$, where
$g(x) \approx g(\hat x) + \underbrace{\dot g(\hat x)}_{=0}(x - \hat x) + \tfrac{1}{2} \ddot g(\hat x)(x - \hat x)^2.$
Plug this into the integral:
$J_n \approx \int_{\text{nbhd}} f(x)\, e^{n\{g(\hat x) + \frac{1}{2} \ddot g(\hat x)(x - \hat x)^2\}}\,dx = e^{n g(\hat x)} \int_{\text{nbhd}} f(x)\, e^{-\frac{1}{2} [-n \ddot g(\hat x)](x - \hat x)^2}\,dx.$
Slide 24: Formula (cont.)
From the previous slide:
$J_n \approx e^{n g(\hat x)} \int_{\text{nbhd}} f(x)\, e^{-\frac{1}{2} [-n \ddot g(\hat x)](x - \hat x)^2}\,dx.$
Two observations:
- since $\hat x$ is a maximizer, $\ddot g(\hat x) < 0$;
- on a small neighborhood, $f(x) \approx f(\hat x)$.
Therefore,
$J_n \approx f(\hat x)\, e^{n g(\hat x)} \int e^{-\frac{1}{2} [-n \ddot g(\hat x)](x - \hat x)^2}\,dx \approx (2\pi)^{1/2} f(\hat x)\, e^{n g(\hat x)} \{-n \ddot g(\hat x)\}^{-1/2}.$
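The final formula is essentially a one-liner. This sketch (mine, not from the notes) checks it on an integral with a known closed form, f(x) = cos(x) against exp(−n x²/2) over the whole real line:

```python
import math

def laplace_approx(f, g, ddg, xhat, n):
    # (2 pi)^{1/2} f(xhat) exp{n g(xhat)} {-n g''(xhat)}^{-1/2}
    return math.sqrt(2.0 * math.pi / (-n * ddg)) * f(xhat) * math.exp(n * g(xhat))

# Test case: f(x) = cos(x), g(x) = -x^2/2, so xhat = 0 and g''(xhat) = -1.
# The exact integral is sqrt(2*pi/n) * exp(-1/(2n)).
n = 100
approx = laplace_approx(math.cos, lambda x: -0.5 * x * x, -1.0, 0.0, n)
exact = math.sqrt(2.0 * math.pi / n) * math.exp(-1.0 / (2.0 * n))
rel_err = abs(approx - exact) / exact  # about 1/(2n), i.e. O(1/n)
```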
Slide 25: Example: Stirling's formula
Stirling's formula is a useful approximation of factorials. Start by writing the factorial as a gamma function:
$n! = \Gamma(n+1) = \int_0^\infty z^n e^{-z}\,dz.$
Make the change of variable $x = z/n$ to get
$n! = n^{n+1} \int_0^\infty e^{n g(x)}\,dx, \quad g(x) = \log x - x.$
g(x) has maximizer $\hat x = 1$ in the interior of $(0, \infty)$. For large n, the Laplace approximation gives
$n! \approx n^{n+1} (2\pi)^{1/2} e^{n g(1)} \{-n \ddot g(1)\}^{-1/2} = (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}.$
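Checking the Laplace/Stirling approximation against exact factorials (a Python sketch of mine):

```python
import math

def stirling(n):
    # Laplace-approximation estimate of n! from the slide.
    return math.sqrt(2.0 * math.pi) * n ** (n + 0.5) * math.exp(-n)

# Relative accuracy improves as n grows (the error is O(1/n)).
ratio_5 = stirling(5) / math.factorial(5)    # about 0.983
ratio_10 = stirling(10) / math.factorial(10)  # about 0.992
```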
Slide 26: Example: Bayesian posterior expectations
Recall the Bayesian ingredients:
- L(θ) is the likelihood based on n iid samples;
- π(θ) is a prior density.
Then a posterior expectation looks like
$E\{h(\theta) \mid \text{data}\} = \frac{\int h(\theta) L(\theta) \pi(\theta)\,d\theta}{\int L(\theta) \pi(\theta)\,d\theta}.$
When n is large, applying Laplace to both numerator and denominator gives
$E\{h(\theta) \mid \text{data}\} \approx h(\hat\theta),$
where $\hat\theta$ is the MLE. So the previous binomial example, where the posterior mean was close to the MLE, was not a coincidence...
Slide 27: Remarks
- It can be shown that the error in the Laplace approximation is $O(n^{-1})$. [Footnote 5: This can be improved with some extra care.]
- The basic principle of the Laplace approximation is that, locally, the integrals look like Gaussian integrals.
- This principle extends to integrals over more than one dimension; this multivariate version is the most useful.
- There is also a version of the Laplace approximation for the case when the maximizer of g is on the boundary. Then the principle is to make the integral look like an exponential or gamma integral. Details of this version can be found in Sec. 4.6 of Lange.
Slide 28: Outline (Section 5: Conclusion)
Slide 29: Remarks
- Quadrature methods are very powerful. In principle, these methods can be developed for integrals of any dimension, but they only work well in 1-2 dimensions.
- Curse of dimensionality: if the dimension is large, then one needs very many grid points to get good approximations.
- The Laplace approximation can work in high dimensions, but only for certain kinds of integrals; fortunately, the statistics-related integrals are often of this form.
- For higher dimensions, Monte Carlo methods are preferred: they are generally very easy to implement, and the approximation accuracy is independent of dimension. We will talk in detail about Monte Carlo later.
More informationKernel methods, kernel SVM and ridge regression
Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;
More informationLecture 2: Repetition of probability theory and statistics
Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:
More informationContents. I Basic Methods 13
Preface xiii 1 Introduction 1 I Basic Methods 13 2 Convergent and Divergent Series 15 2.1 Introduction... 15 2.1.1 Power series: First steps... 15 2.1.2 Further practical aspects... 17 2.2 Differential
More informationSTAT 730 Chapter 4: Estimation
STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum
More informationParameter Estimation
Parameter Estimation Chapters 13-15 Stat 477 - Loss Models Chapters 13-15 (Stat 477) Parameter Estimation Brian Hartman - BYU 1 / 23 Methods for parameter estimation Methods for parameter estimation Methods
More informationBayesian Inference: Posterior Intervals
Bayesian Inference: Posterior Intervals Simple values like the posterior mean E[θ X] and posterior variance var[θ X] can be useful in learning about θ. Quantiles of π(θ X) (especially the posterior median)
More informationCS 257: Numerical Methods
CS 57: Numerical Methods Final Exam Study Guide Version 1.00 Created by Charles Feng http://www.fenguin.net CS 57: Numerical Methods Final Exam Study Guide 1 Contents 1 Introductory Matter 3 1.1 Calculus
More informationA short introduction to INLA and R-INLA
A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More information6.1 Variational representation of f-divergences
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 6: Variational representation, HCR and CR lower bounds Lecturer: Yihong Wu Scribe: Georgios Rovatsos, Feb 11, 2016
More informationPROBABILITY DISTRIBUTIONS. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
PROBABILITY DISTRIBUTIONS Credits 2 These slides were sourced and/or modified from: Christopher Bishop, Microsoft UK Parametric Distributions 3 Basic building blocks: Need to determine given Representation:
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationError analysis for efficiency
Glen Cowan RHUL Physics 28 July, 2008 Error analysis for efficiency To estimate a selection efficiency using Monte Carlo one typically takes the number of events selected m divided by the number generated
More informationStatistical Estimation
Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from
More informationNumerical integration and differentiation. Unit IV. Numerical Integration and Differentiation. Plan of attack. Numerical integration.
Unit IV Numerical Integration and Differentiation Numerical integration and differentiation quadrature classical formulas for equally spaced nodes improper integrals Gaussian quadrature and orthogonal
More information