Introduction to Bayesian Inference Lecture 2: Key Examples
1 Introduction to Bayesian Inference Lecture 2: Key Examples Tom Loredo Dept. of Astronomy, Cornell University CASt Summer School 10 June /76
2 Lecture 2: Key Examples 1 Simple examples Binary Outcomes Normal Distribution Poisson Distribution 2 Multilevel models for measurement error 3 Bayesian calculation 4 Bayesian Software 5 Closing Reflections 2/76
3 Key Examples 1 Simple examples Binary Outcomes Normal Distribution Poisson Distribution 2 Multilevel models for measurement error 3 Bayesian calculation 4 Bayesian Software 5 Closing Reflections 3/76
4 Binary Outcomes: Parameter Estimation M = Existence of two outcomes, S and F; for each case or trial, the probability for S is α; for F it is (1−α) H_i = Statements about α, the probability for success on the next trial; seek p(α|D,M) D = Sequence of results from N observed trials: FFSSSSFSSSFS (n = 8 successes in N = 12 trials) Likelihood: p(D|α,M) = p(F|α,M) p(F|α,M) p(S|α,M) ··· = α^n (1−α)^(N−n) = L(α) 4/76
5 Prior Starting with no information about α beyond its definition, use as an uninformative prior p(α|M) = 1. Justifications: Intuition: Don't prefer any α interval to any other of the same size. Bayes's justification: Ignorance means that before doing the N trials, we have no preference for how many will be successes: P(n successes|M) = 1/(N+1) ⇒ p(α|M) = 1 Consider this a convention, an assumption added to M to make the problem well posed. 5/76
6 Prior Predictive p(D|M) = ∫dα α^n (1−α)^(N−n) = B(n+1, N−n+1) = n!(N−n)!/(N+1)! A Beta integral: B(a,b) ≡ ∫₀¹ dx x^(a−1) (1−x)^(b−1) = Γ(a)Γ(b)/Γ(a+b). 6/76
7 Posterior p(α|D,M) = [(N+1)!/(n!(N−n)!)] α^n (1−α)^(N−n) A Beta distribution. Summaries: Best fit: α̂ = n/N = 2/3; mean ⟨α⟩ = (n+1)/(N+2) ≈ 0.64 Uncertainty: σ_α = √[(n+1)(N−n+1)/((N+2)²(N+3))] ≈ 0.12 Find credible regions numerically, or with the incomplete beta function. Note that the posterior depends on the data only through n, not the N binary numbers describing the sequence. n is a (minimal) sufficient statistic. 7/76
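The Beta-posterior summaries on this slide follow from closed forms, so they are easy to check numerically. A minimal sketch (the function name is mine, not from the lecture):

```python
from math import sqrt

def beta_posterior_summaries(n, N):
    """Summaries of the Beta(n+1, N-n+1) posterior for a binomial
    probability alpha, given n successes in N trials and a flat prior."""
    mode = n / N                          # posterior mode (best fit)
    mean = (n + 1) / (N + 2)              # posterior mean
    sd = sqrt((n + 1) * (N - n + 1) / ((N + 2)**2 * (N + 3)))
    return mode, mean, sd

mode, mean, sd = beta_posterior_summaries(8, 12)   # the slide's n, N
```

For n = 8, N = 12 this reproduces the slide's mode of 2/3 and σ_α ≈ 0.12.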
9 Binary Outcomes: Model Comparison Equal Probabilities? M1: α = 1/2 M2: α ∈ [0,1] with flat prior. Maximum Likelihoods M1: p(D|M1) = 1/2^N = 2.44 × 10⁻⁴ M2: L(α̂) = (2/3)^n (1/3)^(N−n) = 4.82 × 10⁻⁴ p(D|M1)/p(D|α̂,M2) = 0.51 Maximum likelihoods favor M2 (failures more probable). 9/76
10 Bayes Factor (ratio of model likelihoods) p(D|M1) = 1/2^N; and p(D|M2) = n!(N−n)!/(N+1)! B12 ≡ p(D|M1)/p(D|M2) = (N+1)!/(n!(N−n)! 2^N) = 1.57 The Bayes factor (odds) favors M1 (equiprobable). Note that for n = 6, B12 = 2.93; for this small amount of data, we can never be very sure results are equiprobable. If n = 0, B12 ≈ 1/315; if n = 2, B12 ≈ 1/4.8; for extreme data, 12 flips can be enough to lead us to strongly suspect outcomes have different probabilities. (Frequentist significance tests can reject the null for any sample size.) 10/76
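The Bayes factor on this slide reduces to a one-line computation; a sketch (function name mine) that also reproduces the quoted values for n = 6 and n = 0:

```python
from math import comb

def bayes_factor_equal_vs_flat(n, N):
    """B12 = p(D|M1)/p(D|M2): M1 fixes alpha = 1/2, M2 has a flat prior.
    p(D|M1) = 2**-N and p(D|M2) = n!(N-n)!/(N+1)! = 1/((N+1)*C(N,n)),
    so B12 = (N+1)*C(N,n)/2**N."""
    return (N + 1) * comb(N, n) / 2.0**N

B12 = bayes_factor_equal_vs_flat(8, 12)   # ~1.57, mildly favoring M1
```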
11 Binary Outcomes: Binomial Distribution Suppose D = n (number of heads in N trials), rather than the actual sequence. What is p(α|n,M)? Likelihood Let S = a sequence of flips with n heads. p(n|α,M) = Σ_S p(S|α,M) p(n|S,α,M) = α^n (1−α)^(N−n) C_{n,N} C_{n,N} = # of sequences of length N with n heads. p(n|α,M) = [N!/(n!(N−n)!)] α^n (1−α)^(N−n) The binomial distribution for n given α, N. 11/76
12 Posterior p(α|n,M) = [N!/(n!(N−n)!)] α^n (1−α)^(N−n) / p(n|M) p(n|M) = [N!/(n!(N−n)!)] ∫dα α^n (1−α)^(N−n) = 1/(N+1) p(α|n,M) = [(N+1)!/(n!(N−n)!)] α^n (1−α)^(N−n) Same result as when the data specified the actual sequence. 12/76
13 Another Variation: Negative Binomial Suppose D = N, the number of trials it took to obtain a predefined number of successes, n = 8. What is p(α|N,M)? Likelihood p(N|α,M) is the probability for n−1 successes in N−1 trials, times the probability that the final trial is a success: p(N|α,M) = [(N−1)!/((n−1)!(N−n)!)] α^(n−1) (1−α)^(N−n) × α = [(N−1)!/((n−1)!(N−n)!)] α^n (1−α)^(N−n) The negative binomial distribution for N given α, n. 13/76
14 Posterior p(α|D,M) = C_{n,N} α^n (1−α)^(N−n) / p(D|M) p(D|M) = C_{n,N} ∫dα α^n (1−α)^(N−n) p(α|D,M) = [(N+1)!/(n!(N−n)!)] α^n (1−α)^(N−n) Same result as the other cases. 14/76
15 Final Variation: Meteorological Stopping Suppose D = (N,n), the number of samples and number of successes in an observing run whose total number was determined by the weather at the telescope. What is p(α|D,M′)? (M′ adds info about the weather to M.) Likelihood p(D|α,M′) is the binomial distribution times the probability that the weather allowed N samples, W(N): p(D|α,M′) = W(N) [N!/(n!(N−n)!)] α^n (1−α)^(N−n) Let C′_{n,N} = W(N) (N choose n). We get the same result as before! 15/76
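The stopping-rule insensitivity in these slides can be verified directly: the binomial and negative binomial likelihoods differ only by α-independent constants, so the normalized posteriors coincide. A numerical sketch (numpy assumed; the grid is mine):

```python
import numpy as np

alpha = np.linspace(0.001, 0.999, 999)
n, N = 8, 12

def posterior(log_like):
    """Normalize exp(log_like) on the alpha grid (simple Riemann sum)."""
    w = np.exp(log_like - log_like.max())
    return w / (w.sum() * (alpha[1] - alpha[0]))

core = n * np.log(alpha) + (N - n) * np.log(1 - alpha)
post_binomial = posterior(core + np.log(495.0))   # C(12,8) sequences
post_negbinom = posterior(core + np.log(330.0))   # C(11,7) sequences
# identical posteriors despite different sampling distributions
```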
16 Likelihood Principle To define L(H_i) ≡ p(D_obs|H_i,I), we must contemplate what other data we might have obtained. But the "real" sample space may be determined by many complicated, seemingly irrelevant factors; it may not be well specified at all. Should this concern us? Likelihood principle: The result of inferences depends only on how p(D_obs|H_i,I) varies w.r.t. hypotheses. We can ignore aspects of the observing/sampling procedure that do not affect this dependence. This happens because no sums of probabilities for hypothetical data appear in Bayesian results; Bayesian calculations condition on D_obs. This is a sensible property that frequentist methods do not share. Frequentist probabilities are "long run" rates of performance, and depend on details of the sample space that are irrelevant in a Bayesian calculation. 16/76
17 Goodness-of-fit Violates the Likelihood Principle Theory (H0): The number of A stars in a cluster should be 0.1 of the total. Observations: 5 A stars found out of 96 total stars observed. Theorist's analysis: Calculate χ² using the expected counts n_A = 9.6 and n_X = 86.4. The significance level is p(>χ²|H0) = 0.12 (or 0.07 using the more rigorous binomial tail area). Theory is accepted. 17/76
18 Observer's analysis The actual observing plan was to keep observing until 5 A stars were seen! The random quantity is N_tot, not n_A; it should follow the negative binomial distribution. Expect N_tot = 50 ± 21. The tail probability p(>χ²|H0) is now small: the theory is rejected. Telescope technician's analysis A storm was coming in, so the observations would have ended whether 5 A stars had been seen or not. The proper ensemble should take into account p(storm)... Bayesian analysis The Bayes factor is the same for binomial or negative binomial likelihoods, and slightly favors H0. Include p(storm) if you want; it will drop out! 18/76
19 Inference With Normals/Gaussians Gaussian PDF p(x|µ,σ) = [1/(σ√(2π))] e^(−(x−µ)²/2σ²) over [−∞,∞] Common abbreviated notation: x ∼ N(µ,σ²) Parameters µ = ⟨x⟩ ≡ ∫dx x p(x|µ,σ) σ² = ⟨(x−µ)²⟩ ≡ ∫dx (x−µ)² p(x|µ,σ) 19/76
20 Gauss's Observation: Sufficiency Suppose our data consist of N measurements, d_i = µ + ε_i. Suppose the noise contributions are independent, and ε_i ∼ N(0,σ²). p(D|µ,σ,M) = Π_i p(d_i|µ,σ,M) = Π_i p(ε_i = d_i−µ|µ,σ,M) = Π_i [1/(σ√(2π))] exp[−(d_i−µ)²/2σ²] = [1/(σ^N (2π)^(N/2))] e^(−Q(µ)/2σ²) 20/76
21 Find the dependence of Q on µ by completing the square: Q = Σ_i (d_i − µ)² [Note: Q/σ² = χ²(µ)] = Σ_i d_i² + Nµ² − 2Nµd̄, where d̄ ≡ (1/N) Σ_i d_i = N(µ − d̄)² + (Σ_i d_i² − Nd̄²) = N(µ − d̄)² + Nr², where r² ≡ (1/N) Σ_i (d_i − d̄)² 21/76
22 The likelihood depends on {d_i} only through d̄ and r: L(µ,σ) = [1/(σ^N (2π)^(N/2))] exp(−Nr²/2σ²) exp(−N(µ−d̄)²/2σ²) The sample mean and variance are sufficient statistics. This is a miraculous compression of information; the normal distribution is highly abnormal in this respect! 22/76
23 Estimating a Normal Mean Problem specification Model: d_i = µ + ε_i, ε_i ∼ N(0,σ²), σ is known ⇒ I = (σ,M). Parameter space: µ; seek p(µ|D,σ,M) Likelihood p(D|µ,σ,M) = [1/(σ^N (2π)^(N/2))] exp(−Nr²/2σ²) exp(−N(µ−d̄)²/2σ²) ∝ exp(−N(µ−d̄)²/2σ²) 23/76
24 Uninformative prior Translation invariance ⇒ p(µ) = C, a constant. This prior is improper unless bounded. Prior predictive/normalization p(D|σ,M) = ∫dµ C exp(−N(µ−d̄)²/2σ²) = C (σ/√N) √(2π) ... minus a tiny bit from the tails, using a proper prior. 24/76
25 Posterior p(µ|D,σ,M) = [1/((σ/√N)√(2π))] exp(−N(µ−d̄)²/2σ²) The posterior is N(d̄, w²), with standard deviation w = σ/√N. The 68.3% HPD credible region for µ is d̄ ± σ/√N. Note that C drops out; the limit of infinite prior range is well behaved. 25/76
26 Informative Conjugate Prior Use a normal prior, µ ∼ N(µ0, w0²). Conjugate because the posterior is also normal. Posterior Normal N(µ̃, w̃²), but the mean and std. deviation shrink towards the prior. Define B = w²/(w² + w0²), so B < 1 and B ≈ 0 when w0 is large. Then µ̃ = d̄ + B (µ0 − d̄) w̃ = w √(1 − B) Principle of stable estimation: The prior affects estimates only when the data are not informative relative to the prior. 26/76
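The conjugate update on this slide is two lines of arithmetic; a sketch (function name mine) using the slide's example numbers:

```python
from math import sqrt

def conjugate_normal_update(dbar, w, mu0, w0):
    """Posterior N(mu_t, w_t^2) for a normal mean: likelihood width
    w = sigma/sqrt(N), conjugate prior N(mu0, w0^2)."""
    B = w**2 / (w**2 + w0**2)          # shrinkage factor, 0 <= B < 1
    mu_t = dbar + B * (mu0 - dbar)     # estimate shrinks toward mu0
    w_t = w * sqrt(1 - B)
    return mu_t, w_t

# slide's example: dbar = 3, sigma/sqrt(N) = 1, prior at mu0 = 10, w0 = 5
mu_t, w_t = conjugate_normal_update(3.0, 1.0, 10.0, 5.0)
```

With the broad w0 = 5 prior, B = 1/26, so the posterior mean barely moves from the sample mean, illustrating stable estimation.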
27 Conjugate normal examples: Data have d̄ = 3, σ/√N = 1; priors at µ0 = 10, with w0 = {5, 2}. [Figure: prior, likelihood, and posterior curves for the two prior widths] 27/76
28 Estimating a Normal Mean: Unknown σ Problem specification Model: d_i = µ + ε_i, ε_i ∼ N(0,σ²), σ is unknown Parameter space: (µ,σ); seek p(µ|D,M) Likelihood p(D|µ,σ,M) = [1/(σ^N (2π)^(N/2))] exp(−Nr²/2σ²) exp(−N(µ−d̄)²/2σ²) ∝ (1/σ^N) e^(−Q/2σ²), where Q = N [r² + (µ−d̄)²] 28/76
29 Uninformative Priors Assume the priors for µ and σ are independent. Translation invariance ⇒ p(µ) = C, a constant. Scale invariance ⇒ p(σ) ∝ 1/σ (flat in log σ). Joint Posterior for µ, σ p(µ,σ|D,M) ∝ (1/σ^(N+1)) e^(−Q(µ)/2σ²) 29/76
30 Marginal Posterior p(µ|D,M) ∝ ∫dσ (1/σ^(N+1)) e^(−Q/2σ²) Let τ = Q/2σ², so σ = √(Q/2τ) and dσ = −(1/2)√(Q/2) τ^(−3/2) dτ: p(µ|D,M) ∝ 2^(N/2) Q^(−N/2) ∫dτ τ^(N/2 − 1) e^(−τ) ∝ Q^(−N/2) 30/76
31 Write Q = Nr² [1 + ((µ−d̄)/r)²] and normalize: p(µ|D,M) = [(N/2 − 1)! / ((N/2 − 3/2)! √(πN))] (1/(r/√N)) [1 + (1/N) ((µ−d̄)/(r/√N))²]^(−N/2) Student's t distribution, with t = (µ−d̄)/(r/√N) A bell curve, but with power-law tails Large N: p(µ|D,M) ∼ e^(−N(µ−d̄)²/2r²) This is the rigorous way to adjust σ so χ²/dof = 1. It doesn't just plug in a best σ; it slightly broadens the posterior to account for the σ uncertainty. 31/76
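The σ marginalization above can be checked numerically: integrating the joint posterior over σ on a grid should give a marginal for µ proportional to Q^(−N/2). A sketch (numpy assumed; data values N, d̄, r are illustrative, not from the lecture):

```python
import numpy as np

# Check that marginalizing sigma with a 1/sigma prior yields a marginal
# for mu proportional to Q**(-N/2) (the Student-t shape).
N, dbar, r = 10, 0.0, 1.0
mu = np.linspace(-3.0, 3.0, 61)
log_sigma = np.linspace(-5.0, 5.0, 2001)
sigma = np.exp(log_sigma)
Q = N * (r**2 + (mu[:, None] - dbar)**2)
# joint ~ sigma**-(N+1) * exp(-Q / 2 sigma^2); the extra factor of sigma
# is the Jacobian for integrating on a uniform log-sigma grid
integrand = sigma**(-(N + 1)) * np.exp(-Q / (2 * sigma**2)) * sigma
marginal = integrand.sum(axis=1)
ratio = marginal * Q[:, 0]**(N / 2.0)   # should be ~constant in mu
```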
32 Student t examples: p(x) ∝ [1 + x²/n]^(−(n+1)/2), location = 0, scale = 1, degrees of freedom n = {1, 2, 3, 5, 10, ∞ (normal)} [Figure: t densities approaching the normal as n grows] 32/76
33 Gaussian Background Subtraction Measure the background rate b = b̂ ± σ_b with the source off. Measure the total rate r = r̂ ± σ_r with the source on. Infer the signal source strength s, where r = s + b. With flat priors, p(s,b|D,M) ∝ exp[−(b−b̂)²/2σ_b²] exp[−(s+b−r̂)²/2σ_r²] 33/76
34 Marginalize b to summarize the results for s (complete the square to isolate the b dependence; then do a simple Gaussian integral over b): p(s|D,M) ∝ exp[−(s−ŝ)²/2σ_s²], with ŝ = r̂ − b̂ and σ_s² = σ_r² + σ_b² Background subtraction is a special case of background marginalization; i.e., marginalization told us to subtract a background estimate. Recall the standard derivation of background uncertainty via propagation of errors, based on a Taylor expansion (the statistician's delta method). Marginalization provides a generalization of error propagation, without approximation! 34/76
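The Gaussian marginalization result can be verified by brute force: integrate b out numerically and compare with N(r̂ − b̂, σ_r² + σ_b²). A sketch (numpy assumed; the rate values are mine, chosen for illustration):

```python
import numpy as np

# Numerically marginalize b from the joint posterior and compare with
# the analytic result N(s_hat, sigma_r^2 + sigma_b^2).
b_hat, sigma_b = 5.0, 1.0
r_hat, sigma_r = 9.0, 2.0
s = np.linspace(-4.0, 12.0, 161)
b = np.linspace(-5.0, 15.0, 4001)
joint = np.exp(-(b - b_hat)**2 / (2 * sigma_b**2)
               - (s[:, None] + b - r_hat)**2 / (2 * sigma_r**2))
marginal = joint.sum(axis=1)        # integrate over b (uniform grid)
marginal /= marginal.max()

s_hat = r_hat - b_hat               # marginalization "subtracts"
sigma_s = np.sqrt(sigma_r**2 + sigma_b**2)
analytic = np.exp(-(s - s_hat)**2 / (2 * sigma_s**2))
```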
35 Bayesian Curve Fitting & Least Squares Setup Data D = {d_i} are measurements of an underlying function f(x;θ) at N sample points {x_i}. Let f_i(θ) ≡ f(x_i;θ): d_i = f_i(θ) + ε_i, ε_i ∼ N(0,σ_i²) We seek to learn θ, or to compare different functional forms (model choice, M). Likelihood p(D|θ,M) = Π_{i=1}^N [1/(σ_i√(2π))] exp[−(1/2)((d_i − f_i(θ))/σ_i)²] ∝ exp[−(1/2) Σ_i ((d_i − f_i(θ))/σ_i)²] = exp[−χ²(θ)/2] 35/76
36 Bayesian Curve Fitting & Least Squares Posterior For prior density π(θ), p(θ|D,M) ∝ π(θ) exp[−χ²(θ)/2] If you have a least-squares or χ² code: Think of χ²(θ) as −2 log L(θ). Bayesian inference amounts to exploration and numerical integration of π(θ) e^(−χ²(θ)/2). 36/76
37 Important Case: Separable Nonlinear Models A (linearly) separable model has parameters θ = (A,ψ): Linear amplitudes A = {A_α} Nonlinear parameters ψ f(x;θ) is a linear superposition of M nonlinear components g_α(x;ψ): d_i = Σ_{α=1}^M A_α g_α(x_i;ψ) + ε_i, or d = Σ_α A_α g_α(ψ) + ε. Why this is important: You can marginalize over A analytically ⇒ the Bretthorst algorithm (Bayesian Spectrum Analysis & Parameter Estimation, 1988). The algorithm is closely related to linear least squares/diagonalization/SVD. 37/76
38 Poisson Distribution: Infer a Rate from Counts Problem: Observe n counts in time T; infer the rate, r. Likelihood L(r) ≡ p(n|r,M) = (rT)^n e^(−rT)/n! Prior Two simple standard choices (or a conjugate gamma distribution): r known to be nonzero; it is a scale parameter: p(r|M) = [1/ln(r_u/r_l)] (1/r) r may vanish; require p(n|M) ≈ const: p(r|M) = 1/r_u 38/76
39 Prior predictive p(n|M) = (1/r_u) ∫₀^{r_u} dr (rT)^n e^(−rT)/n! = [1/(r_u T)] (1/n!) ∫₀^{r_u T} d(rT) (rT)^n e^(−rT) ≈ 1/(r_u T) for r_u ≫ n/T Posterior A gamma distribution: p(r|n,M) = T (rT)^n e^(−rT)/n! 39/76
40 Gamma Distributions A 2-parameter family of distributions over nonnegative x, with shape parameter α and scale parameter s: p_Γ(x|α,s) = [1/(sΓ(α))] (x/s)^(α−1) e^(−x/s) Moments: E(x) = sα; Var(x) = s²α Our posterior corresponds to α = n+1, s = 1/T. Mode r̂ = n/T; mean ⟨r⟩ = (n+1)/T (the mean shifts down by 1/T with a 1/r prior) Std. deviation σ_r = √(n+1)/T; credible regions found by integrating (can use the incomplete gamma function) 40/76
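The gamma-posterior summaries above are closed-form; a minimal sketch (function name and example counts are mine):

```python
from math import sqrt

def poisson_rate_summaries(n, T):
    """Gamma(shape n+1, scale 1/T) posterior summaries for a Poisson
    rate r, given n counts in time T with a flat prior on r."""
    mode = n / T                 # posterior mode
    mean = (n + 1) / T           # posterior mean (shifted up from mode)
    sd = sqrt(n + 1) / T
    return mode, mean, sd

mode, mean, sd = poisson_rate_summaries(16, 2.0)
```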
42 The flat prior Bayes's justification: Not that ignorance of r requires p(r|I) = C; rather, require the (discrete) predictive distribution to be flat: p(n|I) = ∫dr p(r|I) p(n|r,I) = const ⇒ p(r|I) = C Useful conventions Use a flat prior for a rate that may be zero Use a log-flat prior (∝ 1/r) for a nonzero scale parameter Use proper (normalized, bounded) priors Plot the posterior with an abscissa that makes the prior flat 42/76
43 The On/Off Problem Basic problem Look off-source; unknown background rate b. Count N_off photons in interval T_off. Look on-source; the rate is r = s + b with unknown signal s. Count N_on photons in interval T_on. Infer s. Conventional solution b̂ = N_off/T_off; σ_b = √N_off/T_off r̂ = N_on/T_on; σ_r = √N_on/T_on ŝ = r̂ − b̂; σ_s = √(σ_r² + σ_b²) But ŝ can be negative! 43/76
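The conventional solution above, and its failure mode, take a few lines to demonstrate; a sketch (function name and the count values are mine):

```python
from math import sqrt

def onoff_conventional(N_on, T_on, N_off, T_off):
    """Conventional root-N background subtraction for the on/off problem."""
    b_hat, sigma_b = N_off / T_off, sqrt(N_off) / T_off
    r_hat, sigma_r = N_on / T_on, sqrt(N_on) / T_on
    s_hat = r_hat - b_hat
    sigma_s = sqrt(sigma_r**2 + sigma_b**2)
    return s_hat, sigma_s

# weak source, strong background: the point estimate goes negative
s_hat, sigma_s = onoff_conventional(N_on=3, T_on=1.0, N_off=10, T_off=2.0)
```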
44 Examples: Spectra of X-Ray Sources [Figures: X-ray spectra from Bassani et al. and Di Salvo et al.] 44/76
45 Spectrum of Ultrahigh-Energy Cosmic Rays [Figure: Flux × E³ vs. log₁₀(E) (eV) for HiRes-1 Monocular, HiRes-2 Monocular, and AGASA; Nagano & Watson 2000; HiRes Team 2007] 45/76
46 N is Never Large "Sample sizes are never large. If N is too small to get a sufficiently-precise estimate, you need to get more data (or make more assumptions). But once N is 'large enough,' you can start subdividing the data to learn more (for example, in a public opinion poll, once you have a good estimate for the entire country, you can estimate among men and women, northerners and southerners, different age groups, etc.). N is never enough because if it were 'enough' you'd already be on to the next problem for which you need more data. Similarly, you never have quite enough money. But that's another story." Andrew Gelman (blog entry, 31 July 2005) 46/76
48 Bayesian Solution to the On/Off Problem First consider the off-source data; use it to estimate b: p(b|N_off,I_off) = T_off (bT_off)^(N_off) e^(−bT_off) / N_off! Use this as a prior for b to analyze the on-source data. For the on-source analysis, I_all = (I_on, N_off, I_off): p(s,b|N_on,I_all) ∝ p(s) p(b) [(s+b)T_on]^(N_on) e^(−(s+b)T_on) p(s|I_all) is flat, but p(b|I_all) = p(b|N_off,I_off), so p(s,b|N_on,I_all) ∝ (s+b)^(N_on) b^(N_off) e^(−sT_on) e^(−b(T_on+T_off)) 47/76
49 Now marginalize over b: p(s|N_on,I_all) = ∫db p(s,b|N_on,I_all) ∝ ∫db (s+b)^(N_on) b^(N_off) e^(−sT_on) e^(−b(T_on+T_off)) Expand (s+b)^(N_on) and do the resulting Γ integrals: p(s|N_on,I_all) = Σ_{i=0}^{N_on} C_i [T_on (sT_on)^i e^(−sT_on) / i!], with C_i ∝ (1 + T_off/T_on)^i (N_on + N_off − i)! / (N_on − i)! The posterior is a weighted sum of Gamma distributions, each assigning a different number of on-source counts to the source. (Evaluate via a recursive algorithm or the confluent hypergeometric function.) 48/76
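The mixture-of-gammas posterior above is straightforward to evaluate directly if the weights are handled in log space. A sketch (numpy assumed; function name and example counts are mine), which can be sanity-checked by confirming the result integrates to 1:

```python
import numpy as np
from math import lgamma

def onoff_signal_posterior(s, N_on, N_off, T_on, T_off):
    """Mixture-of-gammas posterior for the signal rate s, with weights
    C_i computed in log space and normalized to sum to 1."""
    i = np.arange(N_on + 1)
    log_C = (i * np.log(1.0 + T_off / T_on)
             + np.array([lgamma(N_on + N_off - k + 1) - lgamma(N_on - k + 1)
                         for k in i]))
    C = np.exp(log_C - log_C.max())
    C /= C.sum()
    # component i: Gamma density with shape i+1 and rate T_on
    log_comp = (i[:, None] * np.log(s * T_on) - s * T_on
                - np.array([lgamma(k + 1) for k in i])[:, None])
    return C @ (T_on * np.exp(log_comp))

s = np.linspace(1e-6, 40.0, 40001)
post = onoff_signal_posterior(s, N_on=6, N_off=4, T_on=1.0, T_off=1.0)
```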
50 Example On/Off Posteriors Short Integrations T on = 1 49/76
51 Example On/Off Posteriors Long Background Integrations T on = 1 50/76
52 Recap of Key Findings From the Examples Sufficiency: model-dependent summary of the data Conjugate priors Likelihood principle Marginalization: generalizes background subtraction and propagation of errors Student's t for handling σ uncertainty Exact treatment of Poisson background uncertainty (don't subtract!) 51/76
53 Key Examples 1 Simple examples Binary Outcomes Normal Distribution Poisson Distribution 2 Multilevel models for measurement error 3 Bayesian calculation 4 Bayesian Software 5 Closing Reflections 52/76
54 Complications With Survey Data Selection effects (truncation, censoring): obvious (usually) Typically treated by correcting the data Most sophisticated: product-limit estimators Scatter effects (measurement error, etc.): insidious Typically ignored (average out???) 53/76
55 Many Guises of Measurement Error Auger data above GZK cutoff (PAO 2007; Soiaporn ) QSO hardness vs. luminosity (Kelly 2007, 2011) 54/76
56 Accounting For Measurement Error Introduce latent/hidden/incidental parameters. Suppose f(x|θ) is a distribution for an observable, x. From N precisely measured samples, {x_i}, we can infer θ from L(θ) ≡ p({x_i}|θ) = Π_i f(x_i|θ) 55/76
57 Graphical representation Nodes/vertices = uncertain quantities Edges specify conditional dependence Absence of an edge denotes conditional independence θ → x_1, x_2, ..., x_N L(θ) ≡ p({x_i}|θ) = Π_i f(x_i|θ) 56/76
58 But what if the x data are noisy, D_i = {x_i + ε_i}? We should somehow incorporate ℓ_i(x_i) = p(D_i|x_i): L(θ,{x_i}) ≡ p({D_i}|θ,{x_i}) = Π_i ℓ_i(x_i) f(x_i|θ) Marginalize over {x_i} to summarize inferences for θ. Marginalize over θ to summarize inferences for {x_i}. Key point: Maximizing over x_i and integrating over x_i can give very different results! 57/76
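A toy simulation makes the point concrete: with noisy measurements, the population scatter inferred naively is inflated by the measurement variance, and treating the true values as latent quantities (here via the Gaussian convolution identity Var(D) = τ² + σ²) corrects it. A sketch (numpy assumed; all parameter values are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_pop, tau, sigma = 2.0, 1.0, 0.8              # population & noise values
x = rng.normal(mu_pop, tau, size=5000)          # latent true values
D = x + rng.normal(0.0, sigma, size=x.size)     # noisy measurements

tau_naive = D.std()                             # ignores the noise
tau_deconv = np.sqrt(D.var() - sigma**2)        # "deconvolved" estimate
```

The naive estimate converges to √(τ² + σ²) ≈ 1.28 rather than τ = 1; subtracting the known noise variance recovers the population spread.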
59 To estimate x_1: L_m(x_1) = ∫dθ ℓ_1(x_1) f(x_1|θ) Π_{i=2}^N ∫dx_i ℓ_i(x_i) f(x_i|θ) = ∫dθ ℓ_1(x_1) f(x_1|θ) L_{−1}(θ) ≈ ℓ_1(x_1) f(x_1|θ̂), with θ̂ determined by the remaining data. f(x_1|θ̂) behaves like a prior that shifts the x_1 estimate away from the peak of ℓ_1(x_1). This generalizes the corrections derived by Malmquist and Lutz-Kelker. Landy & Szalay (1992) proposed adaptive Malmquist corrections that can be viewed as an approximation to this. 58/76
60 Graphical representation θ → x_1, x_2, ..., x_N → D_1, D_2, ..., D_N L(θ,{x_i}) ≡ p({D_i}|θ,{x_i}) = Π_i p(D_i|x_i) f(x_i|θ) = Π_i ℓ_i(x_i) f(x_i|θ) A two-level multilevel model (MLM). 59/76
61 Bayesian MLMs in Astronomy Surveys (number counts / log N–log S / Malmquist): GRB peak flux distribution (Loredo & Wasserman 1998) TNO/KBO magnitude distribution (Gladman et al.; Petit et al.) MLM tutorial; Malmquist-type biases in cosmology (Loredo & Hendry 2009 in the BMIC book) Extreme deconvolution for proper motion surveys (Bovy, Hogg, & Roweis 2011) Directional & spatio-temporal coincidences: GRB repetition (Luo et al.; Graziani et al.) GRB host ID (Band 1998; Graziani et al.) VO cross-matching (Budavári & Szalay 2008) 60/76
62 Time series: SN 1987A neutrinos, uncertain energy vs. time (Loredo & Lamb 2002) Multivariate Bayesian Blocks (Dobigeon, Tourneret & Scargle 2007) SN Ia multicolor light curve modeling (Mandel et al. 2011) Linear regression with measurement error: QSO hardness vs. luminosity (Kelly 2007, 2011) See last year's Surveys School notes for more! 61/76
63 Key Examples 1 Simple examples Binary Outcomes Normal Distribution Poisson Distribution 2 Multilevel models for measurement error 3 Bayesian calculation 4 Bayesian Software 5 Closing Reflections 62/76
64 Statistical Integrals Inference with independent data Consider N data, D = {x_i}, and a model M with m parameters. Suppose L(θ) = p(x_1|θ) p(x_2|θ) ··· p(x_N|θ). Frequentist integrals Find long-run properties of procedures via sample-space integrals: I(θ) = ∫dx_1 p(x_1|θ) ∫dx_2 p(x_2|θ) ··· ∫dx_N p(x_N|θ) f(D,θ) Rigorous analysis must explore the θ dependence; this is rarely done in practice. Plug-in approximation: Report properties of the procedure for θ = θ̂. Asymptotically accurate (for large N, expect θ̂ → θ). Plug-in results are easy via Monte Carlo (due to independence). 63/76
65 Bayesian integrals ∫d^m θ g(θ) p(θ|M) L(θ) = ∫d^m θ g(θ) q(θ), with q(θ) ≡ p(θ|M) L(θ) g(θ) = 1 → p(D|M) (norm. const., model likelihood) g(θ) = "box" → credible region g(θ) = θ → posterior mean for θ Such integrals are sometimes easy if analytic (especially in low dimensions), often easier than frequentist counterparts (e.g., normal credible regions, Student's t). Asymptotic approximations: Require ingredients familiar from frequentist calculations. Bayesian calculation is not significantly harder than frequentist calculation in this limit. Numerical calculation: For large m (> 4 is often enough!) the integrals are often very challenging because of structure (e.g., correlations) in parameter space. This is usually pursued without making any procedural approximations. 64/76
66 Bayesian Computation Large sample size: Laplace approximation Approximate the posterior as multivariate normal → det(covar) factors Uses ingredients available in χ²/ML fitting software (MLE, Hessian) Often accurate to O(1/N) Modest-dimensional models (d ≲ 10 to 20) Adaptive cubature Monte Carlo integration (importance & stratified sampling, adaptive importance sampling, quasirandom MC) High-dimensional models (d > 5) Posterior sampling: create an RNG that samples the posterior MCMC is the most general framework → Murali's lab 65/76
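The Laplace approximation can be illustrated on this lecture's own binomial example, where the exact evidence is the beta integral n!(N−n)!/(N+1)!. A sketch (function names mine; valid for 0 < n < N):

```python
from math import comb, exp, log, pi, sqrt

def laplace_evidence_binomial(n, N):
    """Laplace approximation to Z = int_0^1 alpha^n (1-alpha)^(N-n) d(alpha):
    L(a_hat) * sqrt(2 pi / H), with H = -d^2 log L / d alpha^2 at the mode."""
    a = n / N                                  # MLE / posterior mode
    logL = n * log(a) + (N - n) * log(1 - a)
    H = n / a**2 + (N - n) / (1 - a)**2        # observed information
    return exp(logL) * sqrt(2 * pi / H)

def exact_evidence_binomial(n, N):
    """Exact beta integral: n!(N-n)!/(N+1)! = 1/((N+1)*C(N,n))."""
    return 1.0 / ((N + 1) * comb(N, n))

Z_lap = laplace_evidence_binomial(8, 12)
Z_exact = exact_evidence_binomial(8, 12)
```

At N = 12 the approximation is off by a few percent; scaling the data up by a factor of 10 shrinks the error roughly tenfold, consistent with the O(1/N) accuracy quoted above.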
67 See SCMA 5 Bayesian Computation tutorial notes for more! 66/76
68 Key Examples 1 Simple examples Binary Outcomes Normal Distribution Poisson Distribution 2 Multilevel models for measurement error 3 Bayesian calculation 4 Bayesian Software 5 Closing Reflections 67/76
69 Tools for Computational Bayes Astronomer/Physicist Tools BIE Bayesian Inference Engine: General framework for Bayesian inference, tailored to astronomical and earth-science survey data. Built-in database capability to support analysis of terabyte-scale data sets. Inference is by Bayes via MCMC. CIAO/Sherpa On/off marginal likelihood support, and Bayesian Low-Count X-ray Spectral (BLoCXS) analysis via MCMC via the pyblocxs extension XSpec Includes some basic MCMC capability CosmoMC Parameter estimation for cosmological models using CMB, etc., via MCMC MultiNest Bayesian inference via an approximate implementation of the nested sampling algorithm ExoFit Adaptive MCMC for fitting exoplanet RV data extreme-deconvolution Multivariate density estimation with measurement error, via a multivariate normal finite mixture model; partly Bayesian; Python & IDL wrappers 68/76
70 Astronomer/Physicist Tools, cont'd root/RooStats Statistical tools for particle physicists; Bayesian support being incorporated CDF Bayesian Limit Software Limits for Poisson counting processes, with background & efficiency uncertainties SuperBayeS Bayesian exploration of supersymmetric theories in particle physics using the MultiNest algorithm; includes a MATLAB GUI for plotting CUBA Multidimensional integration via adaptive cubature, adaptive importance sampling & stratification, and QMC (C/C++, Fortran, and Mathematica; R interface also via 3rd-party R2Cuba) Cubature Subregion-adaptive cubature in C, with a 3rd-party R interface; intended for low dimensions (< 7) APEMoST Automated Parameter Estimation and Model Selection Toolkit in C, a general-purpose MCMC environment that includes parallel computing support via MPI; motivated by asteroseismology problems Inference Forthcoming; several self-contained Bayesian modules; Parametric Inference Engine 69/76
71 Python PyMC A framework for MCMC via Metropolis-Hastings; also implements Kalman filters and Gaussian processes. Targets biometrics, but is general. SimPy SimPy (rhymes with "Blimpie") is a process-oriented public-domain package for discrete-event simulation. RSPython Bi-directional communication between Python and R MDP Modular toolkit for Data Processing: Current emphasis is on machine learning (PCA, ICA...). Modularity allows combination of algorithms and other data processing elements into flows. Orange Component-based data mining, with preprocessing, modeling, and exploration components. Python/GUI interfaces to C++ implementations. Some Bayesian components. ELEFANT Machine learning library and platform providing Python interfaces to efficient, lower-level implementations. Some Bayesian components (Gaussian processes; Bayesian ICA/PCA). 70/76
72 R and S CRAN Bayesian task view Overview of many R packages implementing various Bayesian models and methods; pedagogical packages; packages linking R to other Bayesian software (BUGS, JAGS) Omega-hat RPython, RMatlab, R-Xlisp BOA Bayesian Output Analysis: Convergence diagnostics and statistical and graphical analysis of MCMC output; can read BUGS output files. CODA Convergence Diagnosis and Output Analysis: Menu-driven R/S plugins for analyzing BUGS output R2Cuba R interface to Thomas Hahn s Cuba library (see above) for deterministic and Monte Carlo cubature 71/76
73 Java Hydra HYDRA provides methods for implementing MCMC samplers using Metropolis, Metropolis-Hastings, Gibbs methods. In addition, it provides classes implementing several unique adaptive and multiple chain/parallel MCMC methods. YADAS Software system for statistical analysis using MCMC, based on the multi-parameter Metropolis-Hastings algorithm (rather than parameter-at-a-time Gibbs sampling) Omega-hat Java environment for statistical computing, being developed by XLisp-stat and R developers 72/76
74 C/C++/Fortran BayeSys 3 Sophisticated suite of MCMC samplers including transdimensional capability, by the author of MemSys fbm Flexible Bayesian Modeling: MCMC for simple Bayes, nonparametric Bayesian regression and classification models based on neural networks and Gaussian processes, and Bayesian density estimation and clustering using mixture models and Dirichlet diffusion trees BayesPack, DCUHRE Adaptive quadrature, randomized quadrature, Monte Carlo integration BIE, CDF Bayesian limits, CUBA (see above) 73/76
75 Other Statisticians' & Engineers' Tools BUGS/WinBUGS Bayesian Inference Using Gibbs Sampling: Flexible software for the Bayesian analysis of complex statistical models using MCMC OpenBUGS BUGS on Windows and Linux, and from inside R JAGS Just Another Gibbs Sampler; MCMC for Bayesian hierarchical models XLisp-stat Lisp-based data analysis environment, with an emphasis on providing a framework for exploring the use of dynamic graphical methods ReBEL Library supporting recursive Bayesian estimation in Matlab (Kalman filter, particle filters, sequential Monte Carlo). 74/76
76 Key Examples 1 Simple examples Binary Outcomes Normal Distribution Poisson Distribution 2 Multilevel models for measurement error 3 Bayesian calculation 4 Bayesian Software 5 Closing Reflections 75/76
77 Closing Reflections Philip Dawid (2000): "What is the principal distinction between Bayesian and classical statistics? It is that Bayesian statistics is fundamentally boring. There is so little to do: just specify the model and the prior, and turn the Bayesian handle. There is no room for clever tricks or an alphabetic cornucopia of definitions and optimality criteria. I have heard people use this dullness as an argument against Bayesianism. One might as well complain that Newton's dynamics, being based on three simple laws of motion and one of gravitation, is a poor substitute for the richness of Ptolemy's epicyclic system. All my experience teaches me that it is invariably more fruitful, and leads to deeper insights and better data analyses, to explore the consequences of being a thoroughly boring Bayesian." Dennis Lindley (2000): "The philosophy places more emphasis on model construction than on formal inference... I do agree with Dawid that Bayesian statistics is fundamentally boring... my only qualification would be that the theory may be boring but the applications are exciting." 76/76
Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses
More informationBayesian Models in Machine Learning
Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of
More informationRemarks on Improper Ignorance Priors
As a limit of proper priors Remarks on Improper Ignorance Priors Two caveats relating to computations with improper priors, based on their relationship with finitely-additive, but not countably-additive
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationEstimation of Quantiles
9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles
More informationStatistics for Data Analysis a toolkit for the (astro)physicist
Statistics for Data Analysis a toolkit for the (astro)physicist Likelihood in Astrophysics Denis Bastieri Padova, February 21 st 2018 RECAP The main product of statistical inference is the pdf of the model
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationBayesian Inference. STA 121: Regression Analysis Artin Armagan
Bayesian Inference STA 121: Regression Analysis Artin Armagan Bayes Rule...s! Reverend Thomas Bayes Posterior Prior p(θ y) = p(y θ)p(θ)/p(y) Likelihood - Sampling Distribution Normalizing Constant: p(y
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationHierarchical Bayesian Modeling of Planet Populations
Hierarchical Bayesian Modeling of Planet Populations You ve found planets in your data (or not)...... now what?! Angie Wolfgang NSF Postdoctoral Fellow, Penn State Why Astrostats? This week: a sample of
More informationSTAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01
STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationNew Bayesian methods for model comparison
Back to the future New Bayesian methods for model comparison Murray Aitkin murray.aitkin@unimelb.edu.au Department of Mathematics and Statistics The University of Melbourne Australia Bayesian Model Comparison
More informationBayesian Inference. Chapter 2: Conjugate models
Bayesian Inference Chapter 2: Conjugate models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationMeasurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007
Measurement Error and Linear Regression of Astronomical Data Brandon Kelly Penn State Summer School in Astrostatistics, June 2007 Classical Regression Model Collect n data points, denote i th pair as (η
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationStatistical Methods in Particle Physics Lecture 1: Bayesian methods
Statistical Methods in Particle Physics Lecture 1: Bayesian methods SUSSP65 St Andrews 16 29 August 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan
More informationParameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1
Parameter Estimation William H. Jefferys University of Texas at Austin bill@bayesrules.net Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 CS students: don t forget to re-register in CS-535D. Even if you just audit this course, please do register.
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 1) Fall 2017 1 / 10 Lecture 7: Prior Types Subjective
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationHypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33
Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett
More informationIntroduction to Bayesian Methods. Introduction to Bayesian Methods p.1/??
to Bayesian Methods Introduction to Bayesian Methods p.1/?? We develop the Bayesian paradigm for parametric inference. To this end, suppose we conduct (or wish to design) a study, in which the parameter
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables
More informationShould all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?
Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/
More informationThe Jeffreys Prior. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) The Jeffreys Prior MATH / 13
The Jeffreys Prior Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) The Jeffreys Prior MATH 9810 1 / 13 Sir Harold Jeffreys English mathematician, statistician, geophysicist, and astronomer His
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationCLASS NOTES Models, Algorithms and Data: Introduction to computing 2018
CLASS NOTES Models, Algorithms and Data: Introduction to computing 208 Petros Koumoutsakos, Jens Honore Walther (Last update: June, 208) IMPORTANT DISCLAIMERS. REFERENCES: Much of the material (ideas,
More informationIntroduction to Bayes
Introduction to Bayes Alan Heavens September 3, 2018 ICIC Data Analysis Workshop Alan Heavens Introduction to Bayes September 3, 2018 1 / 35 Overview 1 Inverse Problems 2 The meaning of probability Probability
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationF denotes cumulative density. denotes probability density function; (.)
BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models
More informationLecture 2: Priors and Conjugacy
Lecture 2: Priors and Conjugacy Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 6, 2014 Some nice courses Fred A. Hamprecht (Heidelberg U.) https://www.youtube.com/watch?v=j66rrnzzkow Michael I.
More informationStat 451 Lecture Notes Numerical Integration
Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationProbability Theory for Machine Learning. Chris Cremer September 2015
Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares
More informationHypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006
Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)
More information1 Hypothesis Testing and Model Selection
A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection
More informationAdvanced Statistical Methods. Lecture 6
Advanced Statistical Methods Lecture 6 Convergence distribution of M.-H. MCMC We denote the PDF estimated by the MCMC as. It has the property Convergence distribution After some time, the distribution
More informationHierarchical Models & Bayesian Model Selection
Hierarchical Models & Bayesian Model Selection Geoffrey Roeder Departments of Computer Science and Statistics University of British Columbia Jan. 20, 2016 Contact information Please report any typos or
More informationParameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn
Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Cosmological parameters A branch of modern cosmological research focuses on measuring
More informationDS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling
DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including
More informationMachine Learning using Bayesian Approaches
Machine Learning using Bayesian Approaches Sargur N. Srihari University at Buffalo, State University of New York 1 Outline 1. Progress in ML and PR 2. Fully Bayesian Approach 1. Probability theory Bayes
More informationLecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1
Lecture 2 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationPart III. A Decision-Theoretic Approach and Bayesian testing
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationStatistical Tools and Techniques for Solar Astronomers
Statistical Tools and Techniques for Solar Astronomers Alexander W Blocker Nathan Stein SolarStat 2012 Outline Outline 1 Introduction & Objectives 2 Statistical issues with astronomical data 3 Example:
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationMotivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University
Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined
More informationan introduction to bayesian inference
with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationPARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.
PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.. Beta Distribution We ll start by learning about the Beta distribution, since we end up using
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationCS540 Machine learning L9 Bayesian statistics
CS540 Machine learning L9 Bayesian statistics 1 Last time Naïve Bayes Beta-Bernoulli 2 Outline Bayesian concept learning Beta-Bernoulli model (review) Dirichlet-multinomial model Credible intervals 3 Bayesian
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationPhysics 403. Segev BenZvi. Choosing Priors and the Principle of Maximum Entropy. Department of Physics and Astronomy University of Rochester
Physics 403 Choosing Priors and the Principle of Maximum Entropy Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Odds Ratio Occam Factors
More informationBayesian inference: what it means and why we care
Bayesian inference: what it means and why we care Robin J. Ryder Centre de Recherche en Mathématiques de la Décision Université Paris-Dauphine 6 November 2017 Mathematical Coffees Robin Ryder (Dauphine)
More informationBayesian Inference: Posterior Intervals
Bayesian Inference: Posterior Intervals Simple values like the posterior mean E[θ X] and posterior variance var[θ X] can be useful in learning about θ. Quantiles of π(θ X) (especially the posterior median)
More information2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling
2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling Jon Wakefield Departments of Statistics and Biostatistics University of Washington Outline Introduction and Motivating
More informationBrandon C. Kelly (Harvard Smithsonian Center for Astrophysics)
Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming
More informationPART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics
Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More informationInference Control and Driving of Natural Systems
Inference Control and Driving of Natural Systems MSci/MSc/MRes nick.jones@imperial.ac.uk The fields at play We will be drawing on ideas from Bayesian Cognitive Science (psychology and neuroscience), Biological
More information1 Statistics Aneta Siemiginowska a chapter for X-ray Astronomy Handbook October 2008
1 Statistics Aneta Siemiginowska a chapter for X-ray Astronomy Handbook October 2008 1.1 Introduction Why do we need statistic? Wall and Jenkins (2003) give a good description of the scientific analysis
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationBayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007
Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.
More informationIntroduction to Bayesian Statistics. James Swain University of Alabama in Huntsville ISEEM Department
Introduction to Bayesian Statistics James Swain University of Alabama in Huntsville ISEEM Department Author Introduction James J. Swain is Professor of Industrial and Systems Engineering Management at
More informationSTA 294: Stochastic Processes & Bayesian Nonparametrics
MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a
More informationBayesian Econometrics
Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationBayesian Methods in Multilevel Regression
Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design
More information