Stat 451 Lecture Notes Numerical Integration


1 Stat 451 Lecture Notes: Numerical Integration
Ryan Martin, UIC
Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange.
Updated: February 11

2 Outline
1. Introduction
2. Newton-Cotes quadrature
3. Gaussian quadrature
4. Laplace approximation
5. Conclusion

3 Motivation
While many statistics problems rely on optimization, there are also some that require numerical integration. Bayesian statistics is almost exclusively integration:
- the data admits a likelihood function L(θ);
- θ is unknown, so assign it a weight (prior) function π(θ);
- combine prior and data using Bayes's formula,
\[ \pi(\theta \mid x) = \frac{L(\theta)\,\pi(\theta)}{\int_\Theta L(\theta')\,\pi(\theta')\,d\theta'}. \]
Need to compute probabilities and expectations, i.e., integrals! Some non-Bayesian problems may involve integration, e.g., random- or mixed-effects models. There are other approaches besides Bayesian and frequentist...

4 Intuition
There are a number of classical numerical integration techniques, simple and powerful. Think back to calculus class, where the integral was defined:
- approximate the function by a constant on small intervals;
- compute the areas of the rectangles and sum them up;
- the integral is defined as the limit of this sum as the mesh size goes to 0.
Numerical integration, or quadrature, is based on this definition and refinements thereof. Basic principle: approximate the function on a small interval by a nice one that you know how to integrate. (This is essentially the same principle that motivated the various methods we discussed for optimization!) Works well for one- or two-dimensional integrals; for higher-dimensional integrals, other tools are needed.

5 Notation
Suppose that f(x) is a function that we'd like to integrate over an interval [a, b]. Take n relatively large and set h = (b - a)/n. Let x_i = a + ih, i = 0, 1, ..., n (so x_0 = a and x_n = b). Key point: if f(x) is nice, then it can be approximated by a simple function on the small interval [x_i, x_{i+1}]. A general strategy is to approximate the integral by
\[ \int_a^b f(x)\,dx = \sum_{i=0}^{n-1} \int_{x_i}^{x_{i+1}} f(x)\,dx \approx \sum_{i=0}^{n-1} \sum_{j=0}^{m} A_{ij}\, f(x_{ij}), \]
for appropriately chosen nodes x_{ij}, weights A_{ij}, and m.

6 Outline (next: Newton-Cotes quadrature)

7 Polynomial approximation
Consider the following sequence of polynomials:
\[ p_{ij}(x) = \prod_{k \neq j} \frac{x - x_{ik}}{x_{ij} - x_{ik}}, \quad j = 0, \dots, m. \]
Then
\[ p_i(x) = \sum_{j=0}^{m} p_{ij}(x)\, f(x_{ij}) \]
is an m-th degree polynomial that interpolates f(x) at the nodes x_{i0}, ..., x_{im}. Furthermore,
\[ \int_{x_i}^{x_{i+1}} f(x)\,dx \approx \int_{x_i}^{x_{i+1}} p_i(x)\,dx = \sum_{j=0}^{m} \underbrace{\int_{x_i}^{x_{i+1}} p_{ij}(x)\,dx}_{A_{ij}}\, f(x_{ij}). \]

8 Riemann rule: m = 0
Approximate f(x) on [x_i, x_{i+1}] by a constant. Here x_{i0} = x_i and p_{i0}(x) ≡ 1, so
\[ \int_a^b f(x)\,dx \approx \sum_{i=0}^{n-1} f(x_i)(x_{i+1} - x_i) = h \sum_{i=0}^{n-1} f(x_i). \]
Features of Riemann's rule:
- Very easy to program: only need f(x_0), ..., f(x_n).
- Can be slow to converge, i.e., lots of x_i's may be needed to get a good approximation.
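
A minimal R sketch of the left-endpoint rule (the function name riemann and the test integrand are illustrative, not from the notes):

    # Left-endpoint Riemann rule: integrate f over [a, b] using n subintervals
    riemann <- function(f, a, b, n) {
      h <- (b - a) / n
      x <- a + (0:(n - 1)) * h   # left endpoints x_0, ..., x_{n-1}
      h * sum(f(x))
    }
    riemann(function(x) x^2, 0, 1, n = 1000)   # true value 1/3; error shrinks like 1/n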

9 Trapezoid rule: m = 1
Approximate f(x) on [x_i, x_{i+1}] by a linear function. In this case:
- x_{i0} = x_i and x_{i1} = x_{i+1};
- A_{i0} = A_{i1} = (x_{i+1} - x_i)/2 = h/2.
Therefore,
\[ \int_a^b f(x)\,dx \approx \frac{h}{2} \sum_{i=0}^{n-1} \{ f(x_i) + f(x_{i+1}) \}. \]
Still only requires function evaluations at the x_i's. More accurate than Riemann because the linear approximation is more flexible than a constant. Can derive bounds on the approximation error...
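
A trapezoid-rule sketch in the same style (again, the names are illustrative):

    # Trapezoid rule: interior nodes get weight h, the two endpoints weight h/2
    trapezoid <- function(f, a, b, n) {
      h <- (b - a) / n
      x <- a + (0:n) * h                     # nodes x_0, ..., x_n
      h * (sum(f(x)) - (f(a) + f(b)) / 2)
    }
    trapezoid(function(x) x^2, 0, 1, n = 1000)   # error is O(h^2)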

10 Trapezoid rule (cont.)
A general tool that we can use to study the precision of the trapezoid rule is the Euler-Maclaurin formula. Suppose that g is twice differentiable; then
\[ \sum_{t=0}^{n} g(t) \approx \int_0^n g(t)\,dt + \frac{1}{2}\{g(0) + g(n)\} + C_1\, \dot g(t)\Big|_0^n, \]
where
\[ |\mathrm{LHS} - \mathrm{RHS}| \le C_2 \int_0^n |\ddot g(t)|\,dt. \]
How does this help? The trapezoid rule is
\[ T(h) := h \big\{ \tfrac{1}{2} g(0) + g(1) + \cdots + g(n-1) + \tfrac{1}{2} g(n) \big\}, \quad \text{where } g(t) = f(a + ht). \]

11 Trapezoid rule (cont.)
Apply Euler-Maclaurin to T(h):
\[ T(h) = h \sum_{t=0}^{n} g(t) - \frac{h}{2}\{g(0) + g(n)\} \approx h \int_0^n g(t)\,dt + h C_1\, \dot g(t)\Big|_0^n = \int_a^b f(x)\,dx + h C_1 \{ h f'(b) - h f'(a) \}. \]
Therefore,
\[ T(h) - \int_a^b f(x)\,dx = O(h^2), \quad h \to 0. \]

12 Trapezoid rule (cont.)
Can the trapezoid error O(h^2) be improved? Our derivation above is not quite precise; the next smallest term in the expansion is O(h^4). Romberg recognized that a manipulation of T(h) will cancel the O(h^2) term, leaving only the O(h^4) term! Romberg's rule is
\[ \frac{4\,T(h/2) - T(h)}{3} = \int_a^b f(x)\,dx + O(h^4), \quad h \to 0. \]
Can be iterated to improve further; see Sec. 5.2 in G&H.
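
A one-step Romberg sketch in R, assuming the trapezoid function from the earlier sketch is in scope:

    # One Romberg step: combine T(h) and T(h/2) to cancel the O(h^2) error term
    romberg1 <- function(f, a, b, n) {
      T_h  <- trapezoid(f, a, b, n)        # step size h = (b - a) / n
      T_h2 <- trapezoid(f, a, b, 2 * n)    # step size h / 2
      (4 * T_h2 - T_h) / 3                 # remaining error is O(h^4)
    }
    romberg1(function(x) exp(x), 0, 1, n = 10)   # true value exp(1) - 1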

13 Simpson's rule: m = 2
Approximate f(x) on [x_i, x_{i+1}] by a quadratic function. Similar arguments as above give the x_{ij}'s and A_{ij}'s. Simpson's rule approximation is
\[ \int_a^b f(x)\,dx \approx \frac{h}{6} \sum_{i=0}^{n-1} \Big\{ f(x_i) + 4 f\Big( \frac{x_i + x_{i+1}}{2} \Big) + f(x_{i+1}) \Big\}. \]
More accurate than the trapezoid rule: the error is O(n^{-4}). If n is taken to be even, then the formula simplifies a bit; see Equation (5.20) in G&H and my R code.
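
A Simpson's-rule sketch matching the displayed formula; each subinterval uses its two endpoints and its midpoint:

    # Simpson's rule: weights h/6, 4h/6, h/6 on left endpoint, midpoint, right endpoint
    simpson <- function(f, a, b, n) {
      h <- (b - a) / n
      x <- a + (0:(n - 1)) * h               # left endpoints of the subintervals
      (h / 6) * sum(f(x) + 4 * f(x + h / 2) + f(x + h))
    }
    simpson(sin, 0, pi, n = 100)   # true value 2; error is O(n^{-4})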

14 Remarks
This approach works for generic m, and the approximation improves as m increases. It can be extended to functions of more than one variable, but the details get complicated very quickly. In R, integrate does one-dimensional integration. Numerical methods and the corresponding software work very well, but care is still needed; see Section 5.4 in G&H.
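
For instance, R's built-in integrate handles finite and infinite ranges via adaptive quadrature:

    # One-dimensional adaptive quadrature in base R
    integrate(dnorm, lower = -Inf, upper = Inf)            # total probability, ~1
    integrate(function(x) x * dexp(x, rate = 2), 0, Inf)   # mean of Exp(rate = 2), ~0.5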

15 Example: Bayesian analysis of binomial
Suppose X ~ Bin(n, θ), with n known and θ unknown. The prior for θ is the so-called semicircle distribution, with density
\[ \pi(\theta) = \frac{8}{\pi} \big\{ \tfrac{1}{4} - (\theta - \tfrac{1}{2})^2 \big\}^{1/2}, \quad \theta \in [0, 1]. \]
The posterior density is then
\[ \pi_x(\theta) = \frac{\theta^x (1-\theta)^{n-x} \{ \frac{1}{4} - (\theta - \frac{1}{2})^2 \}^{1/2}}{\int_0^1 u^x (1-u)^{n-x} \{ \frac{1}{4} - (u - \frac{1}{2})^2 \}^{1/2}\, du}. \]
Calculating the Bayes estimate of θ, the posterior mean, requires a numerical integration.
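
A sketch of that computation with integrate; the data values x and n below are hypothetical, chosen only to make the example run:

    # Posterior mean of theta under the semicircle prior
    x <- 7; n <- 10                          # hypothetical binomial data
    prior     <- function(u) sqrt(pmax(0.25 - (u - 0.5)^2, 0))   # 8/pi constant cancels
    integrand <- function(u) u^x * (1 - u)^(n - x) * prior(u)
    num <- integrate(function(u) u * integrand(u), 0, 1)$value
    den <- integrate(integrand, 0, 1)$value
    num / den                                # Bayes estimate: the posterior mean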

16 Example: mixture densities
Mixture distributions are very common, flexible models, useful for density estimation and heavy-tailed modeling. A general mixture model looks like
\[ p(y) = \int k(y \mid x)\, f(x)\, dx, \]
where the kernel k(y | x) is a pdf (or pmf) in y for each x, and f(x) is a pdf (or pmf). It is easy to check that p(y) is a pdf (or pmf, depending on k). Evaluation of p(y) requires an integration for each y.
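
As a concrete sketch, take a normal scale mixture with a gamma mixing density (both choices arbitrary, for illustration only); each evaluation of p(y) costs one quadrature:

    # Mixture density p(y) = integral of N(y | 0, x) * Gamma(x | 2, 1) dx
    p <- function(y) {
      sapply(y, function(yi)
        integrate(function(x) dnorm(yi, mean = 0, sd = sqrt(x)) *
                              dgamma(x, shape = 2, rate = 1),
                  lower = 0, upper = Inf)$value)
    }
    p(c(0, 1, 3))   # one numerical integral per evaluation point y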

17 Example 5.1 in G&H
Generalized linear mixed model:
\[ Y_{ij} \overset{\text{ind}}{\sim} \text{Pois}(\lambda_{ij}), \quad \lambda_{ij} = e^{\gamma_i} e^{\beta_0 + \beta_1 j}, \quad i = 1, \dots, n, \; j = 1, \dots, J, \]
where γ_1, ..., γ_n are iid N(0, σ_γ^2). Model parameters are θ = (β_0, β_1, σ_γ^2). The marginal likelihood for θ is
\[ L(\theta) = \prod_{i=1}^{n} \int \prod_{j=1}^{J} \text{Pois}\big(Y_{ij} \mid e^{\gamma_i} e^{\beta_0 + \beta_1 j}\big)\, N(\gamma_i \mid 0, \sigma_\gamma^2)\, d\gamma_i. \]
Goal is to maximize over θ...

18 Example 5.1 in G&H (cont.)
Taking the log, we get
\[ \ell(\theta) = \sum_{i=1}^{n} \log \underbrace{\int \prod_{j=1}^{J} \text{Pois}\big(Y_{ij} \mid e^{\gamma_i} e^{\beta_0 + \beta_1 j}\big)\, N(\gamma_i \mid 0, \sigma_\gamma^2)\, d\gamma_i}_{L_i(\theta)}. \]
G&H consider evaluating L_1(θ), i.e.,
\[ \int \Big[ \prod_{j=1}^{J} \text{Pois}\big(Y_{1j} \mid e^{\gamma_1} e^{\beta_0 + \beta_1 j}\big) \Big] N(\gamma_1 \mid 0, \sigma_\gamma^2)\, d\gamma_1. \]
Reproduce the tables using the R codes.
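
A sketch of evaluating one L_i(θ) by quadrature; the data vector Yi and the parameter values are hypothetical:

    # One cluster's marginal likelihood contribution, integrating out gamma_i
    Li <- function(Yi, beta0, beta1, sig2) {
      J <- length(Yi)
      f <- function(g)                     # integrand, vectorized in gamma_i
        sapply(g, function(gi)
          prod(dpois(Yi, lambda = exp(gi + beta0 + beta1 * (1:J)))) *
            dnorm(gi, mean = 0, sd = sqrt(sig2)))
      integrate(f, -Inf, Inf)$value
    }
    Li(Yi = c(2, 4, 7), beta0 = 0.5, beta1 = 0.3, sig2 = 1)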

19 Outline (next: Gaussian quadrature)

20 Very brief summary
Gaussian quadrature is an alternative to Newton-Cotes methods. It is useful primarily in problems where the integration is with respect to a non-uniform measure, e.g., an expectation. The basic idea is that the measure identifies a sequence of orthogonal polynomials. Approximation of f via these polynomials turns out to be better than the Newton-Cotes approximations, at least as far as integration is concerned. The book gives minimal details, and we won't get into it here.
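
To give the flavor anyway: a three-point Gauss-Hermite rule (standard nodes and weights, hand-coded below) integrates polynomials up to degree 5 exactly against the weight e^{-x^2}:

    # Three-point Gauss-Hermite rule for integrals of the form f(x) * exp(-x^2)
    nodes   <- c(-sqrt(3 / 2), 0, sqrt(3 / 2))
    weights <- c(sqrt(pi) / 6, 2 * sqrt(pi) / 3, sqrt(pi) / 6)
    # E(X^2) for X ~ N(0, 1/2), i.e., integral of x^2 exp(-x^2) / sqrt(pi) dx:
    sum(weights * nodes^2) / sqrt(pi)   # exactly 0.5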

21 Outline (next: Laplace approximation)

22 Setup
The Laplace approximation is a tool that allows us to approximate certain integrals based on optimization! The integrals to be considered are of the form
\[ J_n := \int_a^b f(x)\, e^{n g(x)}\, dx, \quad n \to \infty, \]
where the endpoints a < b can be finite or infinite; f and g are sufficiently nice functions; and g has a unique maximizer x̂ = arg max g(x) in the interior of (a, b). Claim: when n is large, the major contribution to the integral comes from a neighborhood of x̂, the maximizer of g. (For a proof of this claim, see Section 4.7 in Lange.)

23 Formula
Assuming the claim, it suffices to restrict the range of integration to a small neighborhood around x̂, where
\[ g(x) \approx g(\hat x) + \underbrace{\dot g(\hat x)}_{=0} (x - \hat x) + \tfrac{1}{2} \ddot g(\hat x)(x - \hat x)^2. \]
Plug this into the integral:
\[ J_n \approx \int_{\text{nbhd}} f(x)\, e^{n \{ g(\hat x) + \frac{1}{2} \ddot g(\hat x)(x - \hat x)^2 \}}\, dx = e^{n g(\hat x)} \int_{\text{nbhd}} f(x)\, e^{-\frac{1}{2} [-n \ddot g(\hat x)](x - \hat x)^2}\, dx. \]

24 Formula (cont.)
From the previous slide:
\[ J_n \approx e^{n g(\hat x)} \int_{\text{nbhd}} f(x)\, e^{-\frac{1}{2} [-n \ddot g(\hat x)](x - \hat x)^2}\, dx. \]
Two observations:
- since x̂ is an interior maximizer, g̈(x̂) < 0;
- on a small nbhd, f(x) ≈ f(x̂).
Therefore,
\[ J_n \approx f(\hat x)\, e^{n g(\hat x)} \int e^{-\frac{1}{2} [-n \ddot g(\hat x)](x - \hat x)^2}\, dx = (2\pi)^{1/2}\, f(\hat x)\, e^{n g(\hat x)}\, \{ -n \ddot g(\hat x) \}^{-1/2}. \]
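
The formula translates directly into code. A sketch where g and its second derivative are supplied by hand and x̂ is found numerically; the bracket (a, b) passed to optimize must be finite and contain x̂:

    # Laplace approximation to J_n = integral of f(x) * exp(n * g(x)) dx
    laplace <- function(f, g, g2, a, b, n) {
      xhat <- optimize(g, c(a, b), maximum = TRUE)$maximum   # interior maximizer of g
      sqrt(2 * pi) * f(xhat) * exp(n * g(xhat)) / sqrt(-n * g2(xhat))
    }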

25 Example: Stirling's formula
Stirling's formula is a useful approximation of factorials. Start by writing the factorial as a gamma function:
\[ n! = \Gamma(n + 1) = \int_0^\infty z^n e^{-z}\, dz. \]
Make the change of variable x = z/n to get
\[ n! = n^{n+1} \int_0^\infty e^{n g(x)}\, dx, \quad g(x) = \log x - x. \]
g(x) has maximizer x̂ = 1 in the interior of (0, ∞). For large n, the Laplace approximation gives
\[ n! \approx n^{n+1} (2\pi)^{1/2}\, e^{n g(1)} \{ -n \ddot g(1) \}^{-1/2} = (2\pi)^{1/2}\, n^{n + 1/2}\, e^{-n}. \]
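
Checking Stirling numerically with the laplace sketch above (here f ≡ 1, g(x) = log x - x, g̈(x) = -1/x^2, and the bracket is chosen to contain x̂ = 1):

    # Stirling: n! = n^{n+1} * J_n, compared with the exact factorial
    n <- 10
    stirling <- n^(n + 1) * laplace(f  = function(x) 1,
                                    g  = function(x) log(x) - x,
                                    g2 = function(x) -1 / x^2,
                                    a = 0.01, b = 10, n = n)
    c(approx = stirling, exact = factorial(n))   # ratio ~ 0.992 at n = 10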

26 Example: Bayesian posterior expectations
Recall the Bayesian ingredients:
- L(θ) is the likelihood based on n iid samples;
- π(θ) is a prior density.
Then a posterior expectation looks like
\[ E\{ h(\theta) \mid \text{data} \} = \frac{\int h(\theta)\, L(\theta)\, \pi(\theta)\, d\theta}{\int L(\theta)\, \pi(\theta)\, d\theta}. \]
When n is large, applying Laplace to both the numerator and the denominator gives
\[ E\{ h(\theta) \mid \text{data} \} \approx h(\hat\theta), \]
where θ̂ is the MLE. So, the previous binomial example that showed the posterior mean close to the MLE was not a coincidence...

27 Remarks
It can be shown that the error in the Laplace approximation is O(n^{-1}), and this can be improved with some extra care. The basic principle of the Laplace approximation is that, locally, the integrals look like Gaussian integrals. This principle extends to integrals over more than one dimension, and this multivariate version is the most useful. There is also a version of the Laplace approximation for the case when the maximizer of g is on the boundary. Then the principle is to make the integral look like exponential or gamma integrals. Details of this version can be found in Sec. 4.6 of Lange.

28 Outline (next: Conclusion)

29 Remarks
Quadrature methods are very powerful. In principle, these methods can be developed for integrals of any dimension, but they only work well in 1 or 2 dimensions. Curse of dimensionality: if the dimension is large, then one needs a huge number of grid points to get good approximations. The Laplace approximation can work in high dimensions, but only for certain kinds of integrals; fortunately, the stat-related integrals are often of this form. For higher dimensions, Monte Carlo methods are preferred:
- generally very easy to do;
- approximation accuracy is independent of dimension.
We will talk in detail later about Monte Carlo.
