A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring
Lecture 8: A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring

Applications: Bayesian inference: overview and examples; introduction to data mining in large-scale surveys.

Reading: Gregory, chapters 5 and 3.

Lecture 10 (Thursday 26 Feb): Adam Brazier (Cornell Center for Advanced Computing) will talk about astronomy-survey workflows and the how-to of databases.
Topics for Lecture 10 next week: Sensor data (e.g., telescope data) often require further filtering and cross-comparisons of the global output. By storing output in a database we can query our data products efficiently and with a wide variety of qualifiers and filters. Databases, particularly relational databases, are used in many fields, including industry, to store information in a form that can be queried efficiently. We will introduce the relational database structure, how such databases can be queried, how they should be designed, and how they can be incorporated into the scientific workflow.
Topics Plan
- Bayesian inference
  - Detection problems
  - Matched filtering and localization
  - Modeling (linear, nonlinear)
  - Cost functions
  - Parameter estimation and errors
- Optimization methods
  - Hill climbing, annealing, genetic algorithms
  - MCMC variants (Gibbs, Hamiltonian)
- Generalized spectral analysis
  - Lomb-Scargle
  - Maximum entropy
  - High-resolution methods
  - Bayesian approaches
  - Wavelets
  - Principal components
  - Cholesky decomposition
- Large-scale surveys in astronomy
  - Time domain
  - Spectral line
  - Images and image cubes
  - Detection & characterization of events, sources, objects (known and unknown object types)
  - Current algorithms
  - Data mining tools
  - Databases
  - Distributed processing
Gibbs sampling links:
- fedc_homepage/xplore/ebooks/html/csa/node28.html
- tutorial/documents/gibbssampling.html pdf
Bayesian Inference

Probability = a measure of our state of knowledge before/after acquiring data, rather than a frequency of occurrence.

Let D = a vector of data points and θ = a vector of parameters for some model. The parameters might be those of a straight line or of a more complex model (some have hundreds of parameters or more).

The simplest form of Bayes' law for model fitting (parameter estimation) is

P(θ|D) = P(θ) P(D|θ) / P(D).

Before acquiring data, P(D|θ) is the sampling distribution: you can view the parameters as fixed and the data as variable. After getting data, the unknown parameter values are a function of the fixed data, and we rename P(D|θ) ≡ L(θ|D), the likelihood function.

Note that this form of Bayes' theorem follows from conditional probabilities for a pair of propositions:

P(AB) = P(A|B) P(B) = P(B|A) P(A)  ⟹  P(A|B) = P(B|A) P(A) / P(B).

Let A → θ and B → D.
We infer the posterior probability (or PDF) of parameter values as

P(θ|D) = P(θ) L(θ|D) / P(D) = Prior × Likelihood / Normalization.

The normalization is simply the integral of the numerator if we want the posterior PDF to be normalized (which we often do). In the simplest case, we have no prior information, so the posterior PDF is simply

P(θ|D) = L(θ|D) / ∫ dθ L(θ|D).

The normalization is sometimes referred to as the prior predictive probability or the global likelihood.
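As a concrete sketch (a toy example of mine, not from the lecture), the flat-prior posterior above can be evaluated numerically by normalizing the likelihood on a parameter grid, here for a Gaussian mean with known unit variance:

```python
import numpy as np

# Toy example (assumed): posterior for a Gaussian mean mu with sigma = 1 known
# and a flat prior, so P(mu|D) = L(mu|D) / integral dmu L(mu|D).
rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=50)

mu_grid = np.linspace(0.0, 4.0, 2001)
loglike = np.array([-0.5 * np.sum((data - mu) ** 2) for mu in mu_grid])
like = np.exp(loglike - loglike.max())        # subtract max to avoid underflow
posterior = like / np.trapz(like, mu_grid)    # flat prior cancels in the ratio

mu_map = mu_grid[np.argmax(posterior)]        # posterior peak = ML estimate here
```

With a flat prior the posterior peak coincides with the maximum-likelihood estimate, i.e. the sample mean.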
A Form for More Detailed Inference (model comparisons, hypothesis testing)

Use 3-proposition probabilities written in two ways:

P(ABC) = P(A|BC) P(BC) = P(A|BC) P(B|C) P(C)

and

P(ABC) = P(B|AC) P(AC) = P(B|AC) P(A|C) P(C).

Equating, we get

P(A|BC) P(B|C) P(C) = P(B|AC) P(A|C) P(C),

which gives

P(A|BC) = P(A|C) P(B|AC) / P(B|C).    (1)

Now let

A → θ, the parameters of a model
B → D, the data
C → I, background information (laws of physics, empirical results, wild guesses, ...)

so that

P(θ|DI) = P(θ|I) P(D|θI) / P(D|I).
What do we do with posterior probabilities or PDFs? Answer: the usual stuff: we characterize the quantity of interest according to what our goals are.
- Best value? Mean, mode, median.
- How well do we know it? Variance, confidence or credible region. The credible region for a parameter is the range of values that covers X% of the PDF (e.g. 68%, 95%). These regions may or may not correspond to 1σ or 3σ regions, depending on how Gaussian-like the PDF is.
- Is it consistent with being Gaussian distributed? Kurtosis, skewness.
- If there are multiple parameters: are they correlated or independent? There may be underlying physics or phenomena of interest.

Maybe only a subset of parameters is of interest. We then marginalize over the uninteresting or nuisance parameters. Let θ = (φ, ψ) with ψ = nuisance parameters. We integrate the total posterior PDF to get the PDF of the parameters of interest:

P(φ|DI) = ∫ dψ P(φ, ψ|DI).
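A numerical sketch of marginalization (an assumed two-parameter toy posterior, not from the lecture): integrate the grid posterior over the nuisance parameter ψ and read a 68% credible interval off the marginal:

```python
import numpy as np

# Assumed toy posterior: independent Gaussians, phi ~ N(0,1), psi ~ N(1,4).
phi = np.linspace(-6.0, 6.0, 481)
psi = np.linspace(-7.0, 9.0, 641)
PHI, PSI = np.meshgrid(phi, psi, indexing="ij")
post = np.exp(-0.5 * (PHI**2 + (PSI - 1.0) ** 2 / 4.0))
post /= np.trapz(np.trapz(post, psi, axis=1), phi)   # normalize on the grid

# Marginalize: P(phi|D) = integral dpsi P(phi, psi|D)
marg = np.trapz(post, psi, axis=1)

# 68% credible interval from the marginal CDF
cdf = np.cumsum(marg)
cdf /= cdf[-1]
lo, hi = np.interp([0.16, 0.84], cdf, phi)
```

Since the marginal here is N(0,1), the interval should come out close to ±1, illustrating that for a Gaussian marginal the 68% credible region matches the 1σ region.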
Sequential Learning

Start with a prior P(θ|I).

Acquire the first data point or set D_1: posterior_1 ∝ prior × L_1.

Acquire a second data point or set D_2: posterior_2 ∝ posterior_1 × L_2 ∝ prior × L_1 L_2.

After n data sets D_1, ..., D_n: posterior_n ∝ prior × ∏_{j=1}^{n} L_j.
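The product rule means sequential updating must reproduce the batch answer; a quick grid check with assumed Bernoulli data (my example, not the lecture's):

```python
import numpy as np

# Bernoulli success probability p with a flat prior (assumed toy example).
p = np.linspace(0.001, 0.999, 999)
data = [1, 0, 1, 1, 0, 1, 1, 1]

post_seq = np.ones_like(p)                 # start from the prior
for x in data:
    post_seq *= p if x == 1 else (1 - p)   # multiply in one likelihood term
    post_seq /= np.trapz(post_seq, p)      # renormalize at each step

k = sum(data)
post_batch = p**k * (1 - p) ** (len(data) - k)
post_batch /= np.trapz(post_batch, p)      # single batch update
```

Both routes give the same normalized posterior, since the likelihood factors commute.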
Examples:
- Poisson event rate (photon counting)
- Gaussian mean and standard deviation
Example

Data: {k_i}, i = 1, ..., n, i.i.d., drawn from a Poisson process. Poisson PDF:

P(k) = λ^k e^{−λ} / k!

Want: an estimate of the mean of the process.

FREQUENTIST APPROACH: We need an estimator for the mean; consider the likelihood

f(λ) = ∏_{i=1}^{n} P(k_i) = [1 / ∏_{i=1}^{n} k_i!] λ^{Σ_i k_i} e^{−nλ}.

Maximizing,

df/dλ = 0 = f(λ) [−n + λ^{−1} Σ_{i=1}^{n} k_i],

so the estimator for the mean is

λ̂ = k̄ = (1/n) Σ_{i=1}^{n} k_i.
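The maximum-likelihood result λ̂ = k̄ can be checked numerically by maximizing the log-likelihood on a grid (toy data; sample size and true rate are my assumptions):

```python
import numpy as np

# Assumed toy data: n = 200 Poisson counts with true rate 3.5.
rng = np.random.default_rng(1)
k = rng.poisson(3.5, size=200)

lam = np.linspace(0.5, 8.0, 7501)
# log-likelihood up to the lambda-independent sum of log(k_i!) terms
loglike = k.sum() * np.log(lam) - len(k) * lam
lam_hat = lam[np.argmax(loglike)]   # should land on the sample mean k-bar
```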
BAYESIAN APPROACH:

Likelihood (as before):

P(D|MI) = ∏_{i=1}^{n} P(k_i) = [1 / ∏_{i=1}^{n} k_i!] λ^{Σ_i k_i} e^{−nλ}.

Prior: assume a flat (uniform) prior, P(M|I) = P(λ|I) = U(λ).

Prior predictive:

P(D|I) = ∫ dλ U(λ) P(D|MI) ∝ Γ(n k̄ + 1) / (n^{n k̄ + 1} ∏_{i=1}^{n} k_i!),

where k̄ = (1/n) Σ_i k_i. Combining all the above, we find

P(λ|{k_i} I) = [n^{n k̄ + 1} / Γ(n k̄ + 1)] λ^{n k̄} e^{−nλ}.

Note that rather than getting a point estimate for the mean, we get a PDF for its value. For hypothesis testing, this is much more useful than a point estimate.
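With a flat prior the posterior above is λ^{n k̄} e^{−nλ} up to normalization, i.e. a Gamma density with shape n k̄ + 1 and rate n; as a sketch (toy data assumed), the grid-normalized posterior mean can be compared with the analytic value (n k̄ + 1)/n:

```python
import numpy as np

# Assumed toy data: 50 Poisson counts with true rate 4.
rng = np.random.default_rng(2)
k = rng.poisson(4.0, size=50)
n, S = len(k), int(k.sum())        # S = n * k-bar

lam = np.linspace(1e-6, 12.0, 20001)
logpost = S * np.log(lam) - n * lam
post = np.exp(logpost - logpost.max())
post /= np.trapz(post, lam)        # normalized posterior PDF on the grid

mean_grid = np.trapz(lam * post, lam)
mean_analytic = (S + 1) / n        # Gamma(S+1, rate n) mean
```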
Issues

Bayesian inference can look deceptively simple (especially for the examples given). Issues that arise:
- The underlying form of the likelihood function may not be known, so an analytical form is not available.
- The posterior PDF may not be easily integrated, especially if the dimensionality is high and its shape is not simple.
- Finding parameter values does not necessarily require normalization, but comparison of models does.
- A vast literature exists on how to sample and integrate the posterior PDF (e.g. MCMC and its variants).
Question: How do we calculate the likelihood function if we do not know the underlying PDF for the data errors and cannot argue from the CLT that it is Gaussian?
Bayesian Priors: Art or Science?

The prior PDF f(θ|I) for a parameter vector θ is used to impose a priori information about parameter values, when known. If the prior information is constraining (i.e. the prior PDF has a strong influence on the shape of the posterior PDF), it is said to be informative. When explicit constraints are not known, one often uses a non-informative prior.

For example, suppose we have a parameter which is largely unconstrained and for which we want to calculate the posterior PDF while allowing a wide range of possible values. We might then use a flat prior in the statistical inference. But is a flat prior really the best one for expressing ignorance of the actual value of a parameter? The answer is: not necessarily.
To illustrate the issues, we will consider two kinds of parameters: a location parameter and a scale parameter. For example, consider data that we assume are described by an N(µ, σ²) distribution whose parameters µ (the mean) and σ² (the variance) are not known and are not constrained a priori. What should we use as priors for these parameters?

We can write the likelihood function as

L = f(D|θI) = ∏_i (1/σ) f((d_i − µ)/σ),    (1)

where {d_i, i = 1, ..., N} are the data and f(x) = e^{−x²/2} (up to a normalization constant). Note that µ shifts the PDF while σ scales the PDF.
Choosing a prior for µ: we use translation invariance. Suppose we make a change of variable

d'_i = d_i + c.    (2)

Then, with µ' = µ + c,

(d'_i − µ')/σ = (d_i + c − (µ + c))/σ = (d_i − µ)/σ.    (3)

Since c is arbitrary, if we don't know µ (and hence do not know µ + c), it is plausible that we should search uniformly in µ, i.e. the prior for µ should be flat.

We can see this also by the following. Suppose the prior for µ is f_µ(µ). Then the prior for µ' = µ + c is

f_{µ'}(µ') = f_µ(µ' − c) / |dµ'/dµ| = f_µ(µ' − c).    (4)

We would like the inference to be independent of any such change of variable, so the form of the prior for µ should be translation invariant. In order for the left-hand and right-hand sides of Eq. (4) to be equal, the form of the prior needs to be independent of its argument, i.e. flat.
Thus an appropriate prior would be of the form

f_µ(µ) = 1/(µ_2 − µ_1) for µ_1 ≤ µ ≤ µ_2, and 0 otherwise,    (5)

where µ_1, µ_2 are chosen to encompass all plausible values of µ. Note that in calculating the posterior PDF, the 1/(µ_2 − µ_1) factor drops out if the range µ_2 − µ_1 is much wider than the likelihood function L(θ). An example of a noninformative prior is shown in Figure 1.
Figure 1: A noninformative prior for the mean, µ. In this case, a flat prior PDF, f_µ(µ), is shown along with a likelihood function, L(µ), that is much narrower than the prior. The peak of L is the maximum-likelihood estimate for µ and is the arithmetic mean of the data: µ̂ = N^{−1} Σ_i d_i. For a case like this, the actual interval for the prior, [µ_1, µ_2], will drop out of the posterior PDF because it appears in both the numerator and denominator.
Choosing a prior for σ: here we use scale invariance. Consider a change of variable

d'_i = c d_i.    (6)

Now, with µ' = cµ and σ' = cσ,

(d'_i − µ')/σ' = (c d_i − cµ)/(cσ) = (d_i − µ)/σ.    (7)

If the prior for σ is f_σ(σ), then the prior for σ' is

f_{σ'}(σ') = f_σ(σ'/c) / |dσ'/dσ| = (1/c) f_σ(σ'/c).    (8)

We would like f_σ and f_{σ'} to have the same shape. Consider a power-law form, f_σ ∝ σ^{−n}. Then Eq. (8) implies

(σ')^{−n} = (1/c) (σ'/c)^{−n} = c^{n−1} (σ')^{−n},    (9)

which can be satisfied for arbitrary c only for n = 1.
Thus the scale-invariant prior for σ is

f_σ(σ) ∝ σ^{−1} for σ_1 ≤ σ ≤ σ_2, and 0 otherwise,    (10)

where σ_1, σ_2 are chosen to encompass all plausible values of σ.
Reality check: we can show that the scale-invariant, non-informative prior for σ is reasonable by considering another change of variable. Suppose we want to use the reciprocal of σ as our parameter rather than σ:

s = σ^{−1}.    (11)

The prior for s is

f_s(s) = f_σ(s^{−1}) / |ds/dσ| = |dσ/ds| f_σ(s^{−1}) = s^{−2} f_σ(s^{−1}) = s^{−2} · (s^{−1})^{−1} = s^{−1}.    (12)

Thus the prior has the same form for σ and its reciprocal. This is desirable because it would not be reasonable for the parameter inference to depend on which variable we used. Thus we can use either σ or s and then derive one from the other.
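A Monte Carlo check of Eq. (12) (my own sketch; the range and sample size are assumed): draw σ uniformly in log σ, which is equivalent to the σ^{−1} prior, transform to s = 1/σ, and confirm that log s is again uniform:

```python
import numpy as np

# sigma^-1 prior on [0.1, 10] <=> uniform in log(sigma) on [log 0.1, log 10].
rng = np.random.default_rng(3)
log_sigma = rng.uniform(np.log(0.1), np.log(10.0), size=200_000)
s = 1.0 / np.exp(log_sigma)        # reciprocal parameter s = 1/sigma

# If the prior form is invariant, log(s) must be uniform on the same interval.
hist, _ = np.histogram(np.log(s), bins=10, range=(np.log(0.1), np.log(10.0)))
```

Each of the 10 bins should hold roughly 20,000 samples, confirming the flat-in-log (i.e. ∝ 1/s) form for the reciprocal variable.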
Some Stochastic Processes of Interest
Stochastic Processes II

Useful processes:

A. Gaussian noise: n(t) is a Gaussian random process if
1. f_n(x) is a 1D Gaussian PDF;
2. f_{n(t), n(t+τ)}(x, y) is a 2D joint Gaussian PDF;
3. all higher-order PDFs and moments can be written in terms of the first and second moments.

Note that Gaussian noise can be either stationary or nonstationary. For example, the mean ⟨X(t)⟩ and variance σ_X²(t) can both be time dependent.
B. White noise has a particular spectral shape (flat), but the 1D PDF is unspecified:

S_n(f) = constant.

The autocorrelation function is

R(τ) = σ_n² δ(τ) (continuous case), R(τ) = σ_n² δ_{τ0} (discrete case).

Thus, white noise need not be Gaussian noise and vice versa. However, white Gaussian noise is often used or assumed.

Example of white, non-Gaussian noise constructed from white, Gaussian noise: let X_k be white Gaussian noise, ⟨X_k X_{k'}⟩ = σ_x² δ_{kk'}, and let Y_k = sgn(X_k) = ±1. Then Y_k is white noise but it is not Gaussian. The PDF of Y_k is

f_Y(Y) = (1/2)[δ(Y + 1) + δ(Y − 1)].

It may be shown that the autocorrelation function of Y is a function of the ACF of X (for zero-mean Gaussian X, R_Y(τ) = (2/π) arcsin[R_X(τ)/R_X(0)]). This relation (the van Vleck relation) is the basis for autocorrelation spectrometers.
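The clipping example can be simulated directly (a quick sketch; the sample size and tolerances are my choices):

```python
import numpy as np

# White Gaussian noise X_k and its clipped version Y_k = sgn(X_k).
rng = np.random.default_rng(4)
x = rng.normal(size=100_000)
y = np.sign(x)

r1 = np.mean(y[:-1] * y[1:])   # lag-1 sample ACF: ~0 if Y is white
var_y = np.mean(y * y)         # exactly 1, since Y takes values +/-1
```

Y has a flat spectrum (delta-function ACF) like X, but its two-point PDF is discrete, so it is white without being Gaussian.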
C. Shot noise is associated with Poisson events, each having a shape h(t):

x(t) = Σ_i h(t − t_i),

where events occur at a rate λ. If h(t) decays to zero as t → ±∞, then x(t) has stationary statistics. If h(t) does not decay, x(t) has nonstationary statistics.

C1. White noise: as h(t) → δ(t), x(t) tends to white noise.

C2. Bandlimited white noise: if h(t) has a power spectrum |H(f)|² that is low-pass in form (it goes to zero above some cutoff frequency f_c), then x(t) will have a flat spectrum for f ≲ f_c. Similarly for bandpass noise, where the centroid frequency of the nonzero part of the spectrum is at some frequency f_0 ≠ 0.
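A sketch of shot noise with exponentially decaying pulses (the rate, pulse shape, and the check against Campbell's theorem, ⟨x⟩ = λ ∫ h dt, are my assumed choices, not from the lecture):

```python
import numpy as np

# Assumed pulse: h(t) = exp(-t/tau) for t >= 0, events at Poisson rate lam.
rng = np.random.default_rng(5)
lam, T, dt, tau = 20.0, 200.0, 0.01, 0.2
t = np.arange(0.0, T, dt)

n_events = rng.poisson(lam * T)                  # number of events in [0, T]
t_i = np.sort(rng.uniform(0.0, T, size=n_events))

x = np.zeros_like(t)
for ti in t_i:                                   # superpose one pulse per event
    m = t >= ti
    x[m] += np.exp(-(t[m] - ti) / tau)

# Campbell's theorem: <x> = lam * integral h dt = lam * tau = 4 here
mean_x = x[t > 5.0].mean()                       # skip the start-up transient
```

Because h(t) decays, x(t) settles into stationary statistics after a few pulse decay times.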
Figure 1: A single realization of Gaussian white noise and random walks derived from it. Since individual steps occur frequently, the random walks are termed dense.
Figure 2: A single realization of non-Gaussian white noise (shot noise) and sparse random walks derived from it.
D. Autoregressive (AR) process: depends on past values + white noise:

x_t = n_t − Σ_{j=1}^{M} α_j x_{t−j},

where n_t is discrete white noise, M is the order of the AR model, and α_j are the coefficients of the AR model. AR processes play a role in maximum-entropy spectral estimators. By taking the Fourier transform of the expression for x_t we can solve for

X_f = Ñ_f / (1 + Σ_j α_j e^{−2πijf}).
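A minimal simulation of the AR recursion above (the coefficients are my assumed example, chosen to be stable): with α = (−0.75, 0.5) the recursion is x_t = n_t + 0.75 x_{t−1} − 0.5 x_{t−2}, whose Yule-Walker variance is ≈ 1.78 and lag-1 autocorrelation is 0.5:

```python
import numpy as np

# AR(2): x_t = n_t - alpha_1 x_{t-1} - alpha_2 x_{t-2}; characteristic roots
# for alpha = (-0.75, 0.5) have modulus sqrt(0.5) < 1, so the process is stable.
rng = np.random.default_rng(6)
alpha = (-0.75, 0.5)
N = 50_000
n = rng.normal(size=N)
x = np.zeros(N)
for tt in range(2, N):
    x[tt] = n[tt] - alpha[0] * x[tt - 1] - alpha[1] * x[tt - 2]

var_x = x[1000:].var()                            # skip the start-up transient
r1 = np.mean(x[1000:-1] * x[1001:]) / var_x       # lag-1 autocorrelation
```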
E. Moving-average (MA) process: a moving average of white noise:

x_t = Σ_{j=0}^{N} β_j n_{t−j}.

F. ARMA process: AR and MA combined.

G. ARIMA process: an integrated ARMA process.
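An MA process is just a convolution of white noise with the weights β_j; a sketch with assumed weights, checking the variance Σ_j β_j² and the lag-1 autocovariance β_0β_1 + β_1β_2:

```python
import numpy as np

# MA(2) with assumed example weights; x_t = sum_j beta_j n_{t-j}.
rng = np.random.default_rng(7)
beta = np.array([0.5, 0.3, 0.2])
n = rng.normal(size=100_000)
x = np.convolve(n, beta, mode="valid")   # the MA sum is a convolution

var_x = x.var()                          # theory: sum(beta^2) = 0.38
c1 = np.mean(x[:-1] * x[1:])             # theory: 0.5*0.3 + 0.3*0.2 = 0.21
```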
H. Markov chain: a process whose present state depends probabilistically on some number p of previous values. A first-order Markov process has p = 1, etc.

For a chain with n states, e.g. S = {s_1, s_2, ..., s_n}, the probability of being in a given state at discrete time t is given by the state probability vector, the row vector

P_t = (p_1, p_2, ..., p_n),

and the probability vector for time t + 1 is

P_{t+1} = P_t Q,

where Q is the transition matrix whose elements are the probabilities q_{ij} of transitioning from the i-th state to the j-th state. The sum of the elements along a row of Q is unity because the chain has to be in some state at any time. A two-state chain, for example, has a transition matrix

Q = [ q_11      1 − q_11 ]
    [ 1 − q_22  q_22     ].
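Iterating P_{t+1} = P_t Q for a two-state chain converges to the stationary distribution π satisfying πQ = π (the q values below are my assumed example):

```python
import numpy as np

# Two-state chain with assumed self-transition probabilities.
q11, q22 = 0.9, 0.7
Q = np.array([[q11, 1 - q11],
              [1 - q22, q22]])

P = np.array([1.0, 0.0])           # start surely in state 1
for _ in range(200):
    P = P @ Q                      # row vector times transition matrix

# Closed-form stationary distribution for a two-state chain:
pi = np.array([1 - q22, 1 - q11]) / (2 - q11 - q22)   # here (0.75, 0.25)
```

Convergence is geometric at rate |q11 + q22 − 1| = 0.6, so 200 steps is far more than enough.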
I. Random walks: any integral of noise with stationary statistics leads to a process having nonstationary statistics, with random-walk-like behavior. E.g.,

x(t) = ∫_0^t dt' n(t'),

where n(t) is white noise.

J. Higher-order random walks: if white noise is integrated M times, the resultant process is an M-th-order random walk.
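A quick ensemble check that integrating white noise gives a variance growing linearly with time, i.e. nonstationary statistics (ensemble size and length are my assumed choices):

```python
import numpy as np

# 3000 independent random walks, each the running sum of unit-variance steps.
rng = np.random.default_rng(8)
steps = rng.normal(size=(3000, 1000))
walks = np.cumsum(steps, axis=1)        # x_t = integral (sum) of the noise

var_t = walks.var(axis=0)               # ensemble variance at each time step
# For unit-variance steps, var_t[k] should be ~ k + 1: linear growth in time.
```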
(Got to here, 2015.)
More informationDS-GA 1002 Lecture notes 11 Fall Bayesian statistics
DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian
More informationBayesian Phylogenetics:
Bayesian Phylogenetics: an introduction Marc A. Suchard msuchard@ucla.edu UCLA Who is this man? How sure are you? The one true tree? Methods we ve learned so far try to find a single tree that best describes
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationLong-Run Covariability
Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips
More information1 Probabilities. 1.1 Basics 1 PROBABILITIES
1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationStatistical techniques for data analysis in Cosmology
Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction
More informationPARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.
PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.. Beta Distribution We ll start by learning about the Beta distribution, since we end up using
More informationECE295, Data Assimila0on and Inverse Problems, Spring 2015
ECE295, Data Assimila0on and Inverse Problems, Spring 2015 1 April, Intro; Linear discrete Inverse problems (Aster Ch 1 and 2) Slides 8 April, SVD (Aster ch 2 and 3) Slides 15 April, RegularizaFon (ch
More informationStat 248 Lab 2: Stationarity, More EDA, Basic TS Models
Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models Tessa L. Childers-Day February 8, 2013 1 Introduction Today s section will deal with topics such as: the mean function, the auto- and cross-covariance
More informationStatistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009
Statistics for Particle Physics Kyle Cranmer New York University 91 Remaining Lectures Lecture 3:! Compound hypotheses, nuisance parameters, & similar tests! The Neyman-Construction (illustrated)! Inverted
More informationA6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 Reading: Chapter 10 = linear LSQ with Gaussian errors Chapter 11 = Nonlinear fitting Chapter 12 = Markov Chain Monte
More informationStatistics for Data Analysis a toolkit for the (astro)physicist
Statistics for Data Analysis a toolkit for the (astro)physicist Likelihood in Astrophysics Denis Bastieri Padova, February 21 st 2018 RECAP The main product of statistical inference is the pdf of the model
More informationCOPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition
Preface Preface to the First Edition xi xiii 1 Basic Probability Theory 1 1.1 Introduction 1 1.2 Sample Spaces and Events 3 1.3 The Axioms of Probability 7 1.4 Finite Sample Spaces and Combinatorics 15
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationA6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011 Reading Chapter 5 of Gregory (Frequentist Statistical Inference) Lecture 7 Examples of FT applications Simulating
More informationLecture 7 October 13
STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We
More information1 Linear Difference Equations
ARMA Handout Jialin Yu 1 Linear Difference Equations First order systems Let {ε t } t=1 denote an input sequence and {y t} t=1 sequence generated by denote an output y t = φy t 1 + ε t t = 1, 2,... with
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationMaking rating curves - the Bayesian approach
Making rating curves - the Bayesian approach Rating curves what is wanted? A best estimate of the relationship between stage and discharge at a given place in a river. The relationship should be on the
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationPrimer on statistics:
Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood
More informationMobile Robot Localization
Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood
More informationKalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein
Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time
More informationApproximate Bayesian Computation
Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate
More informationA6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2013
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2013 Lecture 26 Localization/Matched Filtering (continued) Prewhitening Lectures next week: Reading Bases, principal
More information