A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring



Lecture 8
Applications: Bayesian inference: overview and examples; introduction to data mining in large-scale surveys.
Reading: Gregory chapters 5, 3.
Lecture 10 (Thursday 26 Feb): Adam Brazier (Cornell Center for Advanced Computing) will talk about astronomy-survey workflows and the how-to of databases.

Topics for Lecture 10 (next week)
Sensor data (e.g. telescope data) often require further filtering and cross-comparison of the global output. By storing output in a database, we can query our data products efficiently with a wide variety of qualifiers and filters. Databases, particularly relational databases, are used in many fields, including industry, to store information in a form that can be queried efficiently. We will introduce the relational database structure, how databases can be queried, how they should be designed, and how they can be incorporated into the scientific workflow.

Topics Plan
- Bayesian inference
- Detection problems
  - Matched filtering and localization
- Modeling (linear, nonlinear)
  - Cost functions
  - Parameter estimation and errors
- Optimization methods
  - Hill climbing, annealing, genetic algorithms
  - MCMC variants (Gibbs, Hamiltonian)
- Generalized spectral analysis
  - Lomb-Scargle
  - Maximum entropy
  - High resolution method
  - Bayesian approaches
- Wavelets
- Principal components
- Cholesky decomposition
- Large-scale surveys in astronomy
  - Time domain
  - Spectral line
  - Images and image cubes
- Detection & characterization of events, sources, objects
  - Known object types
  - Unknown object types
- Current algorithms
- Data mining tools
  - Databases
  - Distributed processing

Gibbs sampling
References: fedc_homepage/xplore/ebooks/html/csa/node28.html; tutorial/documents/gibbssampling.html (pdf)
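As a minimal sketch of the idea, assuming a bivariate Gaussian target with unit variances and correlation rho (an illustrative choice, not taken from the lecture), a Gibbs sampler alternately draws each coordinate from its conditional distribution given the current value of the other:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a zero-mean bivariate normal with unit variances
    and correlation rho. Both conditionals are Gaussian:
    x | y ~ N(rho*y, 1 - rho**2) and y | x ~ N(rho*x, 1 - rho**2)."""
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho**2)
    x, y = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    for i in range(n_samples):
        x = rng.normal(rho * y, cond_sd)   # draw x from p(x | y)
        y = rng.normal(rho * x, cond_sd)   # draw y from p(y | x)
        samples[i] = x, y
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(np.corrcoef(samples.T)[0, 1])        # should be close to 0.8
```

Only the one-dimensional conditionals are ever sampled; this is what makes Gibbs sampling attractive when the joint PDF is hard to sample directly but its conditionals are simple.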


Bayesian Inference
Probability: (Bayesian view) a measure of our state of knowledge before/after acquiring data; (frequentist view) the frequency of occurrence.
Let $D$ = a vector of data points and $\theta$ = a vector of parameters for some model. The parameters might be those for a straight line or for a more complex model (some models have hundreds of parameters or more).
The simplest form of Bayes' law for model fitting (parameter estimation) is
$$P(\theta|D) = \frac{P(\theta)\,P(D|\theta)}{P(D)}.$$
Before acquiring data, $P(D|\theta)$ is the sampling distribution: you can view the parameters as fixed and the data as variable. After getting data, the unknown parameter values are a function of the fixed data, and we rename $P(D|\theta) \to \mathcal{L}(\theta|D)$, the likelihood function.
Note that this form of Bayes' theorem follows from conditional probabilities for a pair of propositions:
$$P(AB) = P(A|B)P(B) = P(B|A)P(A) \;\Rightarrow\; P(A|B) = \frac{P(B|A)P(A)}{P(B)}.$$
Let $A \to \theta$ and $B \to D$.
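A quick numerical check of the two-proposition identity, using a made-up joint probability table for two binary propositions A and B (the numbers are arbitrary illustrative assumptions):

```python
import numpy as np

# Hypothetical joint probabilities P(A=a, B=b); rows index a, columns index b.
joint = np.array([[0.10, 0.30],
                  [0.20, 0.40]])

p_a = joint.sum(axis=1)               # marginal P(A)
p_b = joint.sum(axis=0)               # marginal P(B)
p_a_given_b = joint / p_b             # P(A|B) = P(AB)/P(B), column by column
p_b_given_a = joint / p_a[:, None]    # P(B|A) = P(AB)/P(A), row by row

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
bayes = p_b_given_a * p_a[:, None] / p_b
print(np.allclose(bayes, p_a_given_b))   # True
```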

We infer the posterior probability (or PDF) of parameter values as
$$P(\theta|D) = \frac{P(\theta)\,\mathcal{L}(\theta|D)}{P(D)} = \frac{\text{prior} \times \text{likelihood function}}{\text{normalization}}.$$
The normalization is simply the integral of the numerator if we want the posterior PDF to be normalized (which we often do). In the simplest case we have no prior information (a flat prior), so the posterior PDF is simply
$$P(\theta|D) = \frac{\mathcal{L}(\theta|D)}{\int d\theta\, \mathcal{L}(\theta|D)}.$$
The normalization is sometimes referred to as the prior predictive probability or the global likelihood.
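A minimal sketch of the flat-prior case, assuming Gaussian data with known σ = 1 and an unknown mean θ (the data are simulated for illustration): the likelihood is evaluated on a grid and divided by its numerical integral, which plays the role of the normalization (up to the constant rescaling used to avoid underflow).

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
data = rng.normal(loc=3.0, scale=sigma, size=50)   # simulated data, true mean 3

theta = np.linspace(0.0, 6.0, 2001)                # parameter grid
dtheta = theta[1] - theta[0]
# Gaussian log-likelihood with known sigma (theta-independent constants dropped)
loglike = -0.5 * ((data[:, None] - theta[None, :]) ** 2).sum(axis=0) / sigma**2
like = np.exp(loglike - loglike.max())             # rescale to avoid underflow

norm = like.sum() * dtheta                         # numerical integral of the (rescaled) likelihood
posterior = like / norm                            # flat prior: posterior is proportional to likelihood

print(posterior.sum() * dtheta)                    # approximately 1
print(theta[np.argmax(posterior)], data.mean())    # peak at the sample mean
```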

A Form for More Detailed Inference (model comparisons, hypothesis testing)
Use three-proposition probabilities written in two ways:
$$P(ABC) = P(A|BC)P(BC) = P(A|BC)P(B|C)P(C)$$
and
$$P(ABC) = P(B|AC)P(AC) = P(B|AC)P(A|C)P(C).$$
Equating, we get
$$P(A|BC)P(B|C)P(C) = P(B|AC)P(A|C)P(C),$$
which gives
$$P(A|BC) = \frac{P(A|C)\,P(B|AC)}{P(B|C)}.$$
Now let $A \to \theta$ (parameters of a model), $B \to D$ (data), and $C \to I$ (background information: laws of physics, empirical results, wild guesses, ...):
$$P(\theta|DI) = \frac{P(\theta|I)\,P(D|\theta I)}{P(D|I)}. \quad (1)$$

What do we do with posterior probabilities or PDFs?
Answer: the usual stuff. We characterize the quantity of interest according to our goals:
- Best value? Mean, mode, median.
- How well do we know it? Variance, confidence or credible region. The credible region for a parameter is the range of values that covers X% of the PDF (e.g. 68%, 95%). These regions may or may not correspond to 1σ or 2σ regions, depending on how Gaussian-like the PDF is.
- Is it consistent with being Gaussian distributed? Kurtosis, skewness.
- If there are multiple parameters: are they correlated or independent? There may be underlying physics or phenomena of interest.
Maybe only a subset of the parameters is of interest. We then marginalize over the uninteresting, or nuisance, parameters: let $\theta = (\phi, \psi)$ with $\psi$ = nuisance parameters. We integrate the total posterior PDF to get the PDF of the parameters of interest:
$$P(\phi|DI) = \int d\psi\, P(\phi, \psi|DI).$$
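The sketch below assumes a correlated two-parameter Gaussian posterior on a grid (an illustrative stand-in for a real posterior). It marginalizes over the nuisance parameter ψ by summing along its axis, then reads off the mean, median, mode, and a central 68% credible interval for φ.

```python
import numpy as np

phi = np.linspace(-5, 5, 801)
psi = np.linspace(-5, 5, 801)
dphi, dpsi = phi[1] - phi[0], psi[1] - psi[0]
PHI, PSI = np.meshgrid(phi, psi, indexing="ij")

# Illustrative joint posterior: bivariate Gaussian, sd(phi)=1, sd(psi)=1.5, correlation 0.7
s_phi, s_psi, r = 1.0, 1.5, 0.7
q = (PHI / s_phi) ** 2 - 2 * r * PHI * PSI / (s_phi * s_psi) + (PSI / s_psi) ** 2
joint = np.exp(-0.5 * q / (1 - r**2))
joint /= joint.sum() * dphi * dpsi                 # normalize on the grid

marg = joint.sum(axis=1) * dpsi                    # P(phi|D,I): integrate out psi
cdf = np.cumsum(marg) * dphi

mean = (phi * marg).sum() * dphi
median = np.interp(0.5, cdf, phi)
mode = phi[np.argmax(marg)]
lo, hi = np.interp([0.16, 0.84], cdf, phi)         # central 68% credible interval
print(mean, median, mode, lo, hi)
```

For this Gaussian case the 68% interval is close to ±1σ; for a skewed or multimodal posterior the same recipe still works, but the interval need not line up with ±1σ.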

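Sequential Learning
Start with a prior $P(\theta|I)$. Assuming the data sets are independent given θ:
Acquire the first data point or set, $D_1$: $\text{posterior}_1 \propto \text{prior} \times \mathcal{L}_1$.
Acquire the second data point or set, $D_2$: $\text{posterior}_2 \propto \text{prior}_2 \times \mathcal{L}_2$, where $\text{prior}_2 = \text{posterior}_1$, so $\text{posterior}_2 \propto \text{prior} \times \mathcal{L}_1 \mathcal{L}_2$.
After $D_n$: $\text{posterior}_n \propto \text{prior} \times \prod_{j=1}^{n} \mathcal{L}_j$.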
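A minimal sketch of sequential learning on a grid, assuming a Poisson rate parameter, a flat prior, and three simulated data sets (all illustrative choices): updating one data set at a time, with each posterior serving as the next prior, reproduces the single batch update. The work is done in log space so that small tail values do not underflow.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(2)
batches = [rng.poisson(4.0, size=20) for _ in range(3)]   # simulated data sets D_1, D_2, D_3

lam = np.linspace(0.01, 12.0, 3000)                       # grid for the rate parameter
dlam = lam[1] - lam[0]

def log_like(counts):
    """Poisson log-likelihood of one data set, evaluated on the grid."""
    const = sum(lgamma(k + 1) for k in counts)
    return counts.sum() * np.log(lam) - len(counts) * lam - const

def log_normalize(logp):
    """Subtract the log of the grid integral so exp(logp) integrates to one."""
    m = logp.max()
    return logp - (m + np.log(np.exp(logp - m).sum() * dlam))

log_prior = np.zeros_like(lam)                            # flat prior on the grid

# Sequential: each posterior becomes the prior for the next data set
log_post = log_normalize(log_prior)
for counts in batches:
    log_post = log_normalize(log_post + log_like(counts))

# Batch: prior times the product of all likelihoods at once
log_batch = log_normalize(log_prior + sum(log_like(c) for c in batches))

print(np.allclose(np.exp(log_post), np.exp(log_batch)))   # True
```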

Examples
- Poisson event rate (photon counting)
- Gaussian mean and standard deviation

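Example
Data: $\{k_i\}$, $i = 1, \dots, n$, i.i.d., drawn from a Poisson process.
Poisson PDF: $P_k = \lambda^k e^{-\lambda}/k!$
Want: an estimate of the mean of the process.
FREQUENTIST APPROACH: We need an estimator for the mean; consider the likelihood
$$f(\lambda) = \prod_{i=1}^{n} P(k_i) = \frac{1}{\prod_{i=1}^{n} k_i!}\,\lambda^{\sum_{i=1}^{n} k_i}\, e^{-n\lambda}.$$
Maximizing,
$$\frac{df}{d\lambda} = 0 = f(\lambda)\left[-n + \lambda^{-1}\sum_{i=1}^{n} k_i\right],$$
we obtain an estimator for the mean:
$$\hat\lambda = \bar{k} = \frac{1}{n}\sum_{i=1}^{n} k_i.$$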
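A numerical sanity check of this result, using simulated counts (the true rate and sample size are illustrative assumptions): maximizing the log of f(λ) on a fine grid lands on the sample mean.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(3)
k = rng.poisson(lam=7.5, size=200)                 # simulated photon counts

def log_f(lam):
    """Log of f(lambda) = prod_i lambda^{k_i} e^{-lambda} / k_i!"""
    return k.sum() * np.log(lam) - len(k) * lam - sum(lgamma(ki + 1) for ki in k)

lam_grid = np.linspace(5.0, 10.0, 50001)           # grid spacing 1e-4
lam_hat_grid = lam_grid[np.argmax(log_f(lam_grid))]
print(lam_hat_grid, k.mean())                      # agree to within the grid spacing
```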

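BAYESIAN APPROACH:
Likelihood (as before): $P(D|MI) = \prod_{i=1}^{n} P(k_i) = \frac{1}{\prod_{i=1}^{n} k_i!}\,\lambda^{\sum_i k_i} e^{-n\lambda}$.
Prior: assume $P(M|I) = P(\lambda|I) \propto \lambda^{-1} U(\lambda)$, where $U(\lambda)$ is the unit step function.
Prior predictive (with $n\bar{k} = \sum_i k_i$):
$$P(D|I) = \int d\lambda\, \lambda^{-1} U(\lambda)\, P(D|MI) = \frac{\Gamma(n\bar{k})}{n^{n\bar{k}} \prod_{i=1}^{n} k_i!}.$$
Combining all of the above, we find
$$P(\lambda|\{k_i\}I) = \frac{n^{n\bar{k}}}{\Gamma(n\bar{k})}\,\lambda^{n\bar{k}-1} e^{-n\lambda}\, U(\lambda).$$
Note that rather than getting a point estimate for the mean, we get a PDF for its value. For hypothesis testing, this is much more useful than a point estimate.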
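A numerical check under the same λ^{-1} prior, with simulated counts (illustrative only): the closed-form posterior is a Gamma density with shape Σk_i and rate n, so on a grid it should integrate to one and its mean should equal the sample mean, i.e. the frequentist estimate.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(4)
k = rng.poisson(lam=3.0, size=40)                  # simulated counts
n, a = len(k), k.sum()                             # a = n * kbar

lam = np.linspace(1e-6, 10.0, 200001)
dlam = lam[1] - lam[0]
# log P(lambda | {k_i}, I) = a log n - log Gamma(a) + (a - 1) log lambda - n lambda
log_post = a * np.log(n) - lgamma(a) + (a - 1) * np.log(lam) - n * lam
post = np.exp(log_post)

print(post.sum() * dlam)                           # approximately 1
print((lam * post).sum() * dlam, k.mean())         # posterior mean equals kbar
```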

Issues
Bayesian inference can look deceptively simple (especially for the examples given). Issues that arise:
- The underlying form of the likelihood function may not be known, so an analytical form is not available.
- The posterior PDF may not be easily integrated, especially if the dimensionality is high and its shape is not simple.
- Finding parameter values does not necessarily require normalization, but comparing models does.
- A vast literature exists on how to sample and integrate the posterior PDF (e.g. MCMC and its variants).
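To illustrate that parameter estimation does not need the normalization, here is a minimal random-walk Metropolis sketch (one of many MCMC variants). It samples the Poisson-rate posterior under the λ^{-1} prior using only the unnormalized log density; the step size, chain length, burn-in, and simulated counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
k = rng.poisson(lam=3.0, size=40)                  # simulated counts
n, a = len(k), k.sum()

def log_post_unnorm(lam):
    """Unnormalized log posterior for a Poisson rate with a 1/lambda prior."""
    if lam <= 0:
        return -np.inf
    return (a - 1) * np.log(lam) - n * lam

def metropolis(logp, x0, step, n_steps, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    x, lp = x0, logp(x0)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        prop = x + step * rng.normal()
        lp_prop = logp(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept with probability min(1, p'/p)
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

chain = metropolis(log_post_unnorm, x0=1.0, step=0.5, n_steps=50000, rng=rng)[5000:]
print(chain.mean(), a / n)                         # MCMC mean vs exact posterior mean kbar
```

Only ratios of posterior values enter the acceptance test, so the normalization cancels; comparing models, by contrast, needs the normalization itself, which MCMC samples do not provide directly.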

Question
How do we calculate the likelihood function if we do not know the underlying PDF for the data errors and cannot argue from the CLT that it is Gaussian?

Bayesian Priors: Art or Science?
The prior PDF $f(\theta|I)$ for a parameter vector θ is used to impose a priori information about parameter values, when known. If prior information is constraining (i.e. the prior PDF has a strong influence on the shape of the posterior PDF), the prior is said to be informative. When explicit constraints are not known, one often uses a non-informative prior. For example, suppose we have a parameter that is largely unconstrained and for which we want to calculate the posterior PDF while allowing a wide range of possible values. We might then use a flat prior in the statistical inference. But is a flat prior really the best one for expressing ignorance of the actual value of a parameter? The answer is: not necessarily.

To illustrate the issues, we will consider two kinds of parameters: a location parameter and a scale parameter. For example, consider data that we assume are described by an $N(\mu, \sigma^2)$ distribution whose parameters µ (the mean) and σ² (the variance) are not known and are not constrained a priori. What should we use as priors for these parameters? We can write the likelihood function as
$$\mathcal{L} = f(D|\theta I) \propto \prod_i \frac{1}{\sigma}\, f\!\left(\frac{d_i - \mu}{\sigma}\right), \quad (1)$$
where $\{d_i, i = 1, \dots, N\}$ are the data and $f(x) = e^{-x^2/2}$. Note that µ shifts the PDF while σ scales it.

Choosing a prior for µ: we use translation invariance. Suppose we make a change of variable so that
$$d_i' = d_i + c. \quad (2)$$
Then
$$\frac{d_i - \mu}{\sigma} \to \frac{d_i' - (\mu + c)}{\sigma} = \frac{d_i - \mu}{\sigma}. \quad (3)$$
Since c is arbitrary, if we don't know µ, and hence do not know µ + c, it is plausible that we should search uniformly in µ, i.e. the prior for µ should be flat. We can also see this as follows. Suppose the prior for µ is $f_\mu(\mu)$. Then the prior for $\mu' = \mu + c$ is
$$f_{\mu'}(\mu') = \frac{f_\mu(\mu' - c)}{|d\mu'/d\mu|} = f_\mu(\mu' - c). \quad (4)$$
We would like the inference to be independent of any such change of variable, so the form of the prior for µ should be translation invariant. Requiring $f_{\mu'}$ in Eq. (4) to have the same form as $f_\mu$ means $f_\mu(\mu') = f_\mu(\mu' - c)$ for arbitrary c, which holds only if the prior is independent of its argument, i.e. flat.

Thus an appropriate prior would be of the form
$$f_\mu(\mu) = \begin{cases} \dfrac{1}{\mu_2 - \mu_1}, & \mu_1 \le \mu \le \mu_2, \\ 0, & \text{otherwise}, \end{cases} \quad (5)$$
where $\mu_{1,2}$ are chosen to encompass all plausible values of µ. Note that in calculating the posterior PDF, the $1/(\mu_2 - \mu_1)$ factor drops out if the range $\mu_2 - \mu_1$ is much wider than the likelihood function $\mathcal{L}(\theta)$. An example of a noninformative prior is shown in Figure 1.

Figure 1: A noninformative prior for the mean, µ. In this case, a flat prior PDF, $f_\mu(\mu)$, is shown along with a likelihood function, $\mathcal{L}(\mu)$, that is much narrower than the prior. The peak of $\mathcal{L}$ is the maximum likelihood estimate for µ and is the arithmetic mean of the data: $\hat\mu = N^{-1}\sum_i d_i$. For a case like this, the actual interval for the prior, $[\mu_1, \mu_2]$, drops out of the posterior PDF because it appears in both the numerator and the denominator.
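A small numerical check of the last statement, assuming Gaussian data with known σ (simulated, illustrative values): two flat priors of very different but wide extent give the same posterior for µ, peaked at the arithmetic mean.

```python
import numpy as np

rng = np.random.default_rng(6)
sigma = 2.0
d = rng.normal(10.0, sigma, size=25)               # simulated data

mu = np.linspace(-100.0, 100.0, 40001)
dmu = mu[1] - mu[0]
loglike = -0.5 * ((d[:, None] - mu[None, :]) ** 2).sum(axis=0) / sigma**2
like = np.exp(loglike - loglike.max())

def posterior(mu1, mu2):
    """Posterior for mu under a flat prior on [mu1, mu2], normalized on the grid."""
    prior = np.where((mu >= mu1) & (mu <= mu2), 1.0 / (mu2 - mu1), 0.0)
    p = prior * like
    return p / (p.sum() * dmu)

post_a = posterior(-50.0, 50.0)
post_b = posterior(-90.0, 90.0)
print(np.abs(post_a - post_b).max())               # ~0: the 1/(mu2 - mu1) factor cancels
print(mu[np.argmax(post_a)], d.mean())             # peak at the arithmetic mean
```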

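Choosing a prior for σ: here we use scale invariance. Consider a change of variable
$$d_i' = c\,d_i. \quad (6)$$
Now
$$\frac{d_i - \mu}{\sigma} \to \frac{d_i' - \mu'}{\sigma'} = \frac{c\,d_i - c\mu}{c\sigma} = \frac{d_i - \mu}{\sigma}, \quad (7)$$
with $\mu' = c\mu$ and $\sigma' = c\sigma$. If the prior for σ is $f_\sigma(\sigma)$, then the prior for σ' is
$$f_{\sigma'}(\sigma') = \frac{f_\sigma(\sigma'/c)}{|d\sigma'/d\sigma|} = \frac{1}{c} f_\sigma(\sigma'/c). \quad (8)$$
We would like $f_{\sigma'}$ and $f_\sigma$ to be the same function. Consider a power-law form, $f_\sigma \propto \sigma^{-n}$. Then Eq. (8) implies that
$$\sigma'^{-n} = \frac{1}{c}\left(\frac{\sigma'}{c}\right)^{-n} = c^{\,n-1}\sigma'^{-n}, \quad (9)$$
which can be satisfied for arbitrary c only for n = 1.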

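Thus the scale-invariant prior for σ is
$$f_\sigma(\sigma) \propto \begin{cases} \sigma^{-1}, & \sigma_1 \le \sigma \le \sigma_2, \\ 0, & \text{otherwise}, \end{cases} \quad (10)$$
where $\sigma_{1,2}$ are chosen to encompass all plausible values of σ.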
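Normalizing this prior on [σ_1, σ_2] gives f_σ(σ) = [σ ln(σ_2/σ_1)]^{-1}, so the probability of [a, ca] is ln c / ln(σ_2/σ_1) regardless of a; equivalently, the prior is uniform in ln σ. A short sketch with arbitrary illustrative bounds (the helper `prior_mass` is hypothetical) checks this by trapezoid integration:

```python
import numpy as np

sigma1, sigma2 = 0.01, 100.0                       # illustrative prior bounds
norm = np.log(sigma2 / sigma1)

def prior_mass(a, b, n=200001):
    """Probability of a <= sigma <= b under f(sigma) = 1/(sigma ln(sigma2/sigma1)),
    computed with the trapezoid rule on a log-spaced grid."""
    s = np.geomspace(a, b, n)
    f = 1.0 / (s * norm)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s))

print(prior_mass(0.1, 0.2), prior_mass(10.0, 20.0))  # equal: same ratio b/a = 2
print(prior_mass(sigma1, sigma2))                    # approximately 1
```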

Reality check: we can show that the scale-invariant, non-informative prior for σ is reasonable by considering another change of variable. Suppose we want to use the reciprocal of σ as our parameter rather than σ:
$$s = \sigma^{-1}. \quad (11)$$
The prior for s is
$$f_s(s) = \frac{f_\sigma(s^{-1})}{|ds/d\sigma|} = \left|\frac{d\sigma}{ds}\right| f_\sigma(s^{-1}) = s^{-2} f_\sigma(s^{-1}) = s^{-2}\,\frac{1}{s^{-1}} = s^{-1}. \quad (12)$$
Thus the prior has the same form for σ and its reciprocal. This is desirable because it would not be reasonable for the parameter inference to depend on which variable we used. Thus we can use either σ or s and then derive one from the other.
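The change of variables in Eq. (12) can also be checked numerically. This sketch (bounds are illustrative assumptions, chosen as powers of two so the reciprocals are exact in floating point) transforms the normalized 1/σ prior to s = 1/σ with the Jacobian |dσ/ds| = s^{-2} and compares the result with a normalized 1/s prior on [1/σ_2, 1/σ_1]:

```python
import numpy as np

sigma1, sigma2 = 0.5, 16.0                         # illustrative prior bounds
norm = np.log(sigma2 / sigma1)

def f_sigma(sigma):
    """Normalized scale-invariant prior on [sigma1, sigma2]."""
    inside = (sigma >= sigma1) & (sigma <= sigma2)
    return np.where(inside, 1.0 / (sigma * norm), 0.0)

s = np.linspace(1.0 / sigma2, 1.0 / sigma1, 1001)  # s = 1/sigma spans [1/sigma2, 1/sigma1]
f_s = s**-2 * f_sigma(1.0 / s)                     # f_s(s) = |d sigma / ds| f_sigma(1/s)
f_s_expected = 1.0 / (s * np.log((1.0 / sigma1) / (1.0 / sigma2)))  # normalized 1/s prior

print(np.allclose(f_s, f_s_expected))              # True
```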

Some Stochastic Processes of Interest

Stochastic Processes II
Useful processes:
A. Gaussian noise: n(t) is a Gaussian random process if
1. $f_n(x)$ = 1D Gaussian PDF;
2. $f_{n(t),\,n(t+\tau)}(x, y)$ = 2D joint Gaussian PDF;
3. all higher-order PDFs and moments can be written in terms of the first and second moments.
Note that Gaussian noise can be either stationary or nonstationary. For example, the mean $\langle X(t)\rangle$ and variance $\sigma_X^2(t)$ can both be time dependent.

B. White noise has a particular spectral shape (flat), but its 1D PDF is unspecified:
$$S_n(f) = \text{constant}.$$
The autocorrelation function is
$$R(\tau) = \sigma_n^2\,\delta(\tau) \quad \text{(continuous case)}, \qquad R_\tau = \sigma_n^2\,\delta_{\tau 0} \quad \text{(discrete case)}.$$
Thus, white noise need not be Gaussian noise, and vice versa. However, white Gaussian noise is often used or assumed.
Example of white, non-Gaussian noise constructed from white Gaussian noise: let $X_k$ = white Gaussian noise, $\langle X_k X_{k'}\rangle = \sigma_X^2\,\delta_{kk'}$. Let $Y_k = \mathrm{sgn}(X_k) = \pm 1$. Then $Y_k$ is white noise, but it is not Gaussian. The PDF of $Y_k$ is
$$f_Y(Y) = \tfrac{1}{2}\left[\delta(Y + 1) + \delta(Y - 1)\right].$$
It may be shown that the autocorrelation function of Y is a function of the ACF of X; for a zero-mean Gaussian process X (white or not),
$$R_Y(\tau) = \frac{2}{\pi}\sin^{-1}\!\left[\frac{R_X(\tau)}{\sigma_X^2}\right].$$
This relation (the van Vleck relation) is the basis for autocorrelation spectrometers.
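For zero-mean, jointly Gaussian variables with normalized correlation ρ_X, the van Vleck relation gives the correlation of the one-bit (sign) clipped variables as ρ_Y = (2/π) sin^{-1} ρ_X. A Monte Carlo sketch of this, with an illustrative sample size and set of ρ values:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

def clipped_correlation(rho):
    """Correlation of sgn(X1) and sgn(X2) for zero-mean, unit-variance
    jointly Gaussian X1, X2 with correlation rho."""
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    x1 = z1
    x2 = rho * z1 + np.sqrt(1 - rho**2) * z2       # corr(x1, x2) = rho
    return np.mean(np.sign(x1) * np.sign(x2))      # Y = +/-1, so this is the correlation

for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, clipped_correlation(rho), 2 / np.pi * np.arcsin(rho))
```

At ρ_X = 0 the clipped correlation is also zero, which is why sgn(X_k) of white Gaussian noise is still white.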

C. Shot noise is associated with Poisson events, each having a shape h(t):
$$x(t) = \sum_i h(t - t_i),$$
where events occur at a rate λ. If h(t) decays to zero as $t \to \pm\infty$, then x(t) has stationary statistics. If h(t) does not decay, x(t) has nonstationary statistics.
C1. White noise: as $h(t) \to \delta(t)$, x(t) tends to white noise.
C2. Band-limited white noise: if h(t) has a power spectrum $|H(f)|^2$ that is low-pass in form (it goes to zero above some cutoff frequency $f_c$), then x(t) will have a flat spectrum for $f \lesssim f_c$. Similarly for bandpass noise, where the centroid frequency of the non-zero noise is at some frequency $f \ne 0$.
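A minimal discrete-time sketch of shot noise, assuming an exponential pulse h(t) = e^{-t/τ} for t ≥ 0 and illustrative values of λ, τ, and the time step. Campbell's theorem (a standard result, not derived in these notes) gives ⟨x⟩ = λ∫h dt = λτ and var(x) = λ∫h² dt = λτ/2, which the simulation should reproduce approximately.

```python
import numpy as np

rng = np.random.default_rng(8)
rate, tau, dt, T = 5.0, 1.0, 0.01, 2000.0   # event rate, pulse decay time, time step, duration
n = int(T / dt)

counts = rng.poisson(rate * dt, size=n)     # Poisson number of events in each time bin
t_h = np.arange(0.0, 10 * tau, dt)
h = np.exp(-t_h / tau)                      # pulse shape h(t) = exp(-t/tau), t >= 0

x = np.convolve(counts, h)[:n]              # x(t) = sum_i h(t - t_i)
x = x[len(h):]                              # drop the start-up transient

print(x.mean(), rate * tau)                 # Campbell: <x> = lambda * integral of h
print(x.var(), rate * tau / 2)              # Campbell: var = lambda * integral of h^2
```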

Figure 1: A single realization of Gaussian white noise and random walks derived from it. Since individual steps occur frequently, the random walks are termed dense.

Figure 2: A single realization of non-Gaussian white noise (shot noise) and sparse random walks derived from it.

D. Autoregressive (AR) process: depends on past values plus white noise:
$$x_t = n_t - \sum_{j=1}^{M} \alpha_j x_{t-j},$$
where $n_t$ = discrete white noise, M = order of the AR model, and $\alpha_j$ = coefficients of the AR model. AR processes play a role in maximum entropy spectral estimators. By taking the Fourier transform of the expression for $x_t$, we can solve for
$$X_f = \frac{\tilde N_f}{1 + \sum_j \alpha_j e^{-2\pi i j f}}.$$
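A minimal sketch of a first-order AR process in the sign convention implied by the 1 + Σα_j e^{-2πijf} denominator, x_t = n_t − α_1 x_{t−1}, with an illustrative α_1 = −0.9 and unit-variance Gaussian n_t. For this process the lag-one autocorrelation is −α_1, and the power spectrum is σ_n²/|1 + α_1 e^{−2πif}|² (a standard consequence of the expression for X_f). The code checks the first and evaluates the second.

```python
import numpy as np

rng = np.random.default_rng(9)
alpha1, n_steps = -0.9, 100_000
noise = rng.normal(size=n_steps)               # discrete white noise n_t, sigma_n = 1

x = np.zeros(n_steps)
for t in range(1, n_steps):
    x[t] = noise[t] - alpha1 * x[t - 1]        # x_t = n_t - alpha_1 x_{t-1}

rho1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(rho1, -alpha1)                           # lag-one autocorrelation close to -alpha_1

def ar1_spectrum(f, alpha1, sigma_n=1.0):
    """Power spectrum sigma_n^2 / |1 + alpha_1 exp(-2 pi i f)|^2, f in cycles per sample."""
    return sigma_n**2 / np.abs(1 + alpha1 * np.exp(-2j * np.pi * f)) ** 2

print(ar1_spectrum(np.array([0.0, 0.25, 0.5]), alpha1))  # red spectrum: largest at f = 0
```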

E. Moving average (MA) process: a moving average of white noise:
$$x_t = \sum_{j=0}^{N} \beta_j n_{t-j}.$$
F. ARMA process: AR and MA combined.
G. ARIMA process: an integrated ARMA process.
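A minimal sketch of an MA process with illustrative coefficients β = (1, 0.5, 0.25), so N = 2. Because each x_t mixes only N + 1 consecutive noise values, its autocorrelation vanishes beyond lag N; the theoretical lag-L value is Σ_j β_j β_{j+L} / Σ_j β_j².

```python
import numpy as np

rng = np.random.default_rng(10)
beta = np.array([1.0, 0.5, 0.25])              # beta_0, beta_1, beta_2
noise = rng.normal(size=200_000)               # discrete white noise n_t

# x_t = sum_j beta_j n_{t-j}; 'valid' keeps only fully overlapping terms
x = np.convolve(noise, beta, mode="valid")

def acf(x, lag):
    """Sample autocorrelation at a positive lag."""
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

theory = [np.dot(beta[:len(beta) - L], beta[L:]) / np.dot(beta, beta) for L in (1, 2, 3)]
print([round(float(acf(x, L)), 3) for L in (1, 2, 3)])   # lag 3 close to 0
print([round(float(v), 3) for v in theory])              # [0.476, 0.19, 0.0]
```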

H. Markov chain: a process whose present state depends probabilistically on some number p of previous values. A first-order Markov process has p = 1, etc. For a chain with n states, e.g. $S = \{s_1, s_2, \dots, s_n\}$, the probability of being in a given state at discrete time t is given by the state probability vector, the row vector $P_t = (p_1, p_2, \dots, p_n)$, and the probability vector for time t + 1 is
$$P_{t+1} = P_t\,Q,$$
where Q is the transition matrix whose elements are the probabilities $q_{ij}$ of transitioning from the i-th state to the j-th state. The sum of the elements along a row of Q is unity because the chain has to be in some state at any time. A two-state chain, for example, has the transition matrix
$$Q = \begin{pmatrix} q_{11} & 1 - q_{11} \\ 1 - q_{22} & q_{22} \end{pmatrix}.$$
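A minimal sketch of the two-state chain with illustrative stay probabilities q_11 = 0.9 and q_22 = 0.8: repeatedly applying P_{t+1} = P_t Q drives the state probability vector to the stationary distribution, which solves P = PQ and for two states has p_1 = (1 − q_22)/(2 − q_11 − q_22).

```python
import numpy as np

q11, q22 = 0.9, 0.8                            # illustrative stay probabilities
Q = np.array([[q11, 1 - q11],
              [1 - q22, q22]])                 # each row sums to one

P = np.array([1.0, 0.0])                       # start in state s_1 with certainty
for t in range(200):
    P = P @ Q                                  # P_{t+1} = P_t Q (row vector times matrix)

p1 = (1 - q22) / (2 - q11 - q22)               # stationary probability of state s_1
print(P, [p1, 1 - p1])                         # both close to [0.667, 0.333]
```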

I. Random walks: any integral of noise with stationary statistics leads to a process having nonstationary statistics, with random-walk-like behavior, e.g.
$$x(t) = \int_0^t dt'\, n(t'),$$
where n(t) is white noise.
J. Higher-order random walks: if white noise is integrated M times, the resulting process is an M-th-order random walk.
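A discrete-time sketch, assuming unit-variance Gaussian white noise and cumulative sums as the integration (ensemble size and length are illustrative): the ensemble variance of a first-order random walk grows linearly with time, and that of a second-order walk roughly as t³, so neither is stationary.

```python
import numpy as np

rng = np.random.default_rng(11)
n_real, n_steps = 2000, 1000
noise = rng.normal(size=(n_real, n_steps))     # white noise, many realizations

walk1 = np.cumsum(noise, axis=1)               # first-order random walk (one integration)
walk2 = np.cumsum(walk1, axis=1)               # second-order random walk (two integrations)

v1 = walk1.var(axis=0)                         # ensemble variance vs time
v2 = walk2.var(axis=0)
print(v1[999] / v1[249])                       # roughly 1000/250 = 4
print(v2[999] / v2[249])                       # roughly (1000/250)**3 = 64
```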



More information

Statistical Theory MT 2006 Problems 4: Solution sketches

Statistical Theory MT 2006 Problems 4: Solution sketches Statistical Theory MT 006 Problems 4: Solution sketches 1. Suppose that X has a Poisson distribution with unknown mean θ. Determine the conjugate prior, and associate posterior distribution, for θ. Determine

More information

SC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.

More information

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software

SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software SAMSI Astrostatistics Tutorial More Markov chain Monte Carlo & Demo of Mathematica software Phil Gregory University of British Columbia 26 Bayesian Logical Data Analysis for the Physical Sciences Contents:

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

Course content (will be adapted to the background knowledge of the class):

Course content (will be adapted to the background knowledge of the class): Biomedical Signal Processing and Signal Modeling Lucas C Parra, parra@ccny.cuny.edu Departamento the Fisica, UBA Synopsis This course introduces two fundamental concepts of signal processing: linear systems

More information

Fundamentals of Applied Probability and Random Processes

Fundamentals of Applied Probability and Random Processes Fundamentals of Applied Probability and Random Processes,nd 2 na Edition Oliver C. Ibe University of Massachusetts, LoweLL, Massachusetts ip^ W >!^ AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS

More information

Statistical Theory MT 2007 Problems 4: Solution sketches

Statistical Theory MT 2007 Problems 4: Solution sketches Statistical Theory MT 007 Problems 4: Solution sketches 1. Consider a 1-parameter exponential family model with density f(x θ) = f(x)g(θ)exp{cφ(θ)h(x)}, x X. Suppose that the prior distribution has the

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

1 Probabilities. 1.1 Basics 1 PROBABILITIES

1 Probabilities. 1.1 Basics 1 PROBABILITIES 1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability

More information

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian

More information

Bayesian Phylogenetics:

Bayesian Phylogenetics: Bayesian Phylogenetics: an introduction Marc A. Suchard msuchard@ucla.edu UCLA Who is this man? How sure are you? The one true tree? Methods we ve learned so far try to find a single tree that best describes

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

1 Probabilities. 1.1 Basics 1 PROBABILITIES

1 Probabilities. 1.1 Basics 1 PROBABILITIES 1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Statistical techniques for data analysis in Cosmology

Statistical techniques for data analysis in Cosmology Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction

More information

PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.

PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation. PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.. Beta Distribution We ll start by learning about the Beta distribution, since we end up using

More information

ECE295, Data Assimila0on and Inverse Problems, Spring 2015

ECE295, Data Assimila0on and Inverse Problems, Spring 2015 ECE295, Data Assimila0on and Inverse Problems, Spring 2015 1 April, Intro; Linear discrete Inverse problems (Aster Ch 1 and 2) Slides 8 April, SVD (Aster ch 2 and 3) Slides 15 April, RegularizaFon (ch

More information

Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models

Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models Tessa L. Childers-Day February 8, 2013 1 Introduction Today s section will deal with topics such as: the mean function, the auto- and cross-covariance

More information

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009 Statistics for Particle Physics Kyle Cranmer New York University 91 Remaining Lectures Lecture 3:! Compound hypotheses, nuisance parameters, & similar tests! The Neyman-Construction (illustrated)! Inverted

More information

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 Reading: Chapter 10 = linear LSQ with Gaussian errors Chapter 11 = Nonlinear fitting Chapter 12 = Markov Chain Monte

More information

Statistics for Data Analysis a toolkit for the (astro)physicist

Statistics for Data Analysis a toolkit for the (astro)physicist Statistics for Data Analysis a toolkit for the (astro)physicist Likelihood in Astrophysics Denis Bastieri Padova, February 21 st 2018 RECAP The main product of statistical inference is the pdf of the model

More information

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition Preface Preface to the First Edition xi xiii 1 Basic Probability Theory 1 1.1 Introduction 1 1.2 Sample Spaces and Events 3 1.3 The Axioms of Probability 7 1.4 Finite Sample Spaces and Combinatorics 15

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 Reading Chapter 5 of Gregory (Frequentist Statistical Inference) Lecture 7 Examples of FT applications Simulating

More information

Lecture 7 October 13

Lecture 7 October 13 STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We

More information

1 Linear Difference Equations

1 Linear Difference Equations ARMA Handout Jialin Yu 1 Linear Difference Equations First order systems Let {ε t } t=1 denote an input sequence and {y t} t=1 sequence generated by denote an output y t = φy t 1 + ε t t = 1, 2,... with

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

Making rating curves - the Bayesian approach

Making rating curves - the Bayesian approach Making rating curves - the Bayesian approach Rating curves what is wanted? A best estimate of the relationship between stage and discharge at a given place in a river. The relationship should be on the

More information

Statistical learning. Chapter 20, Sections 1 4 1

Statistical learning. Chapter 20, Sections 1 4 1 Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Information geometry for bivariate distribution control

Information geometry for bivariate distribution control Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood

More information

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time

More information

Approximate Bayesian Computation

Approximate Bayesian Computation Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate

More information

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2013

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2013 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2013 Lecture 26 Localization/Matched Filtering (continued) Prewhitening Lectures next week: Reading Bases, principal

More information