Introduction to Bayesian Inference Lecture 2: Key Examples


Tom Loredo, Dept. of Astronomy, Cornell University
CASt Summer School, 10 June

Lecture 2: Key Examples

1. Simple examples: Binary Outcomes; Normal Distribution; Poisson Distribution
2. Multilevel models for measurement error
3. Bayesian calculation
4. Bayesian software
5. Closing reflections

Key Examples, part 1: Simple examples (Binary Outcomes, Normal Distribution, Poisson Distribution)

Binary Outcomes: Parameter Estimation

M = existence of two outcomes, S and F; for each case or trial, the probability for S is α; for F it is (1 − α)
H_i = statements about α, the probability for success on the next trial; seek p(α|D,M)
D = sequence of results from N observed trials, e.g. FFSSSSFSSSFS (n = 8 successes in N = 12 trials)

Likelihood:
p(D|α,M) = p(F|α,M) p(F|α,M) p(S|α,M) · · · = α^n (1 − α)^(N−n) ≡ L(α)

Prior

Starting with no information about α beyond its definition, use as an uninformative prior p(α|M) = 1. Justifications:

Intuition: Don't prefer any α interval to any other of the same size.

Bayes's justification: Ignorance means that before doing the N trials, we have no preference for how many will be successes:
P(n successes|M) = 1/(N+1)  ⇒  p(α|M) = 1

Consider this a convention, an assumption added to M to make the problem well posed.

Prior Predictive

p(D|M) = ∫ dα α^n (1 − α)^(N−n) = B(n+1, N−n+1) = n!(N−n)! / (N+1)!

A Beta integral: B(a,b) ≡ ∫₀¹ dx x^(a−1) (1 − x)^(b−1) = Γ(a)Γ(b)/Γ(a+b).

Posterior

p(α|D,M) = (N+1)!/[n!(N−n)!] α^n (1 − α)^(N−n)

A Beta distribution. Summaries:
Best fit: α̂ = n/N = 2/3; posterior mean ⟨α⟩ = (n+1)/(N+2) ≈ 0.64
Uncertainty: σ_α = √[(n+1)(N−n+1)/((N+2)²(N+3))] ≈ 0.12
Find credible regions numerically, or with the incomplete beta function.

Note that the posterior depends on the data only through n, not the N binary numbers describing the sequence. n is a (minimal) sufficient statistic.
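A quick numerical check of these summaries (not from the slides; it assumes SciPy and uses the n = 8, N = 12 data above; the credible region shown is the central one rather than the HPD region):

    # Beta posterior for alpha with a flat prior: Beta(n+1, N-n+1)
    from scipy import stats

    n, N = 8, 12
    post = stats.beta(n + 1, N - n + 1)

    mode = n / N                      # best fit (posterior mode) = 2/3
    mean = post.mean()                # (n+1)/(N+2) ~ 0.64
    sd = post.std()                   # ~ 0.12
    lo, hi = post.ppf(0.5 - 0.683 / 2), post.ppf(0.5 + 0.683 / 2)
    print(f"mode={mode:.3f} mean={mean:.3f} sd={sd:.3f} 68.3% CR=({lo:.3f}, {hi:.3f})")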


Binary Outcomes: Model Comparison. Equal Probabilities?

M1: α = 1/2
M2: α ∈ [0,1] with flat prior.

Maximum likelihoods:
M1: p(D|M1) = 1/2^N ≈ 2.4 × 10⁻⁴
M2: L(α̂) = (2/3)^n (1/3)^(N−n) ≈ 4.8 × 10⁻⁴

p(D|M1) / p(D|α̂,M2) = 0.51

Maximum likelihoods favor M2 (the data are more probable under M2's best fit).

Bayes Factor (ratio of model likelihoods)

p(D|M1) = 1/2^N;  p(D|M2) = n!(N−n)!/(N+1)!

B_12 ≡ p(D|M1)/p(D|M2) = (N+1)!/[n!(N−n)! 2^N] = 1.57

The Bayes factor (odds) favors M1 (equiprobable).

Note that for n = 6, B_12 = 2.93; for this small amount of data, we can never be very sure the results are equiprobable.

If n = 0, B_12 ≈ 1/315; if n = 2, B_12 ≈ 1/4.8; for extreme data, 12 flips can be enough to lead us to strongly suspect the outcomes have different probabilities.

(Frequentist significance tests can reject the null for any sample size.)
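A minimal sketch (not from the slides) of the Bayes-factor arithmetic above, reproducing the quoted values for n = 8, 6, 2, 0 out of N = 12 trials:

    from math import factorial

    def bayes_factor(n, N):
        """B_12 = p(D|M1)/p(D|M2) for M1: alpha = 1/2 vs M2: flat prior."""
        pD_M1 = 0.5 ** N
        pD_M2 = factorial(n) * factorial(N - n) / factorial(N + 1)
        return pD_M1 / pD_M2

    for n in (8, 6, 2, 0):
        print(n, round(bayes_factor(n, 12), 3))
    # 8 -> 1.571, 6 -> 2.933, 2 -> 0.209 (~1/4.8), 0 -> 0.003 (~1/315)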

Binary Outcomes: Binomial Distribution

Suppose D = n (the number of heads in N trials), rather than the actual sequence. What is p(α|n,M)?

Likelihood
Let S = a sequence of flips with n heads.

p(n|α,M) = Σ_S p(S|α,M) p(n|S,α,M) = α^n (1 − α)^(N−n) C_{n,N}
(the sum runs over sequences with # successes = n)

C_{n,N} = # of sequences of length N with n heads, so

p(n|α,M) = N!/[n!(N−n)!] α^n (1 − α)^(N−n)

The binomial distribution for n given α, N.

12 Posterior p(α n,m) = N! n!(n n)! αn (1 α) N n p(n M) p(n M) = = N! n!(n n)! 1 N +1 dα α n (1 α) N n p(α n,m) = (N +1)! n!(n n)! αn (1 α) N n Same result as when data specified the actual sequence. 12/76

Another Variation: Negative Binomial

Suppose D = N, the number of trials it took to obtain a predefined number of successes, n = 8. What is p(α|N,M)?

Likelihood
p(N|α,M) is the probability for n − 1 successes in N − 1 trials, times the probability that the final trial is a success:

p(N|α,M) = (N−1)!/[(n−1)!(N−n)!] α^(n−1) (1 − α)^(N−n) · α
         = (N−1)!/[(n−1)!(N−n)!] α^n (1 − α)^(N−n)

The negative binomial distribution for N given α, n.

Posterior

p(α|D,M) = C_{n,N} α^n (1 − α)^(N−n) / p(D|M)

p(D|M) = C_{n,N} ∫ dα α^n (1 − α)^(N−n)

⇒ p(α|D,M) = (N+1)!/[n!(N−n)!] α^n (1 − α)^(N−n)

Same result as the other cases.

Final Variation: Meteorological Stopping

Suppose D = (N, n), the number of samples and number of successes in an observing run whose total number was determined by the weather at the telescope. What is p(α|D,M′)? (M′ adds info about the weather to M.)

Likelihood
p(D|α,M′) is the binomial distribution times the probability that the weather allowed N samples, W(N):

p(D|α,M′) = W(N) N!/[n!(N−n)!] α^n (1 − α)^(N−n)

Let C_{n,N} = W(N) (N choose n). We get the same result as before!

Likelihood Principle

To define L(H_i) = p(D_obs|H_i,I), we must contemplate what other data we might have obtained. But the real sample space may be determined by many complicated, seemingly irrelevant factors; it may not be well specified at all. Should this concern us?

Likelihood principle: The results of inferences depend only on how p(D_obs|H_i,I) varies with respect to the hypotheses. We can ignore aspects of the observing/sampling procedure that do not affect this dependence.

This happens because no sums of probabilities for hypothetical data appear in Bayesian results; Bayesian calculations condition on D_obs.

This is a sensible property that frequentist methods do not share. Frequentist probabilities are long-run rates of performance, and depend on details of the sample space that are irrelevant in a Bayesian calculation.

Goodness-of-fit Violates the Likelihood Principle

Theory (H_0): The number of A stars in a cluster should be 0.1 of the total.

Observations: 5 A stars found out of 96 total stars observed.

Theorist's analysis: Calculate χ² using the expected counts n_A = 9.6 and n_X = 86.4. The significance level is p(>χ² | H_0) = 0.12 (or 0.07 using the more rigorous binomial tail area). The theory is accepted.

Observer's analysis: The actual observing plan was to keep observing until 5 A stars were seen! The random quantity is N_tot, not n_A; it should follow the negative binomial dist'n. Expect N_tot = 50 ± 21. Now p(>χ² | H_0) = …; the theory is rejected.

Telescope technician's analysis: A storm was coming in, so the observations would have ended whether 5 A stars had been seen or not. The proper ensemble should take into account p(storm)...

Bayesian analysis: The Bayes factor is the same for binomial or negative binomial likelihoods, and slightly favors H_0. Include p(storm) if you want; it will drop out!

Inference With Normals/Gaussians

Gaussian PDF:
p(x|µ,σ) = 1/(σ√(2π)) exp[−(x − µ)²/(2σ²)]   over (−∞, ∞)

Common abbreviated notation: x ~ N(µ, σ²)

Parameters:
µ = ⟨x⟩ ≡ ∫ dx x p(x|µ,σ)
σ² = ⟨(x − µ)²⟩ ≡ ∫ dx (x − µ)² p(x|µ,σ)

Gauss's Observation: Sufficiency

Suppose our data consist of N measurements, d_i = µ + ε_i. Suppose the noise contributions are independent, and ε_i ~ N(0, σ²).

p(D|µ,σ,M) = ∏_i p(d_i|µ,σ,M)
           = ∏_i p(ε_i = d_i − µ | µ,σ,M)
           = ∏_i 1/(σ√(2π)) exp[−(d_i − µ)²/(2σ²)]
           = 1/[σ^N (2π)^(N/2)] e^(−Q(µ)/2σ²)

Find the dependence of Q on µ by completing the square:

Q = Σ_i (d_i − µ)²    [Note: Q/σ² = χ²(µ)]
  = Σ_i d_i² − 2µ Σ_i d_i + Nµ²
  = Σ_i d_i² − 2Nµd̄ + Nµ²             where d̄ ≡ (1/N) Σ_i d_i
  = N(µ − d̄)² + (Σ_i d_i² − Nd̄²)
  = N(µ − d̄)² + Nr²                   where r² ≡ (1/N) Σ_i (d_i − d̄)²
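A numerical check of the completed-square identity (not from the slides; the data values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    d = rng.normal(5.0, 2.0, size=20)        # any fake data will do
    N, dbar, r2 = d.size, d.mean(), d.var()  # r^2 = (1/N) sum (d_i - dbar)^2

    for mu in (0.0, 3.7, 5.0, 8.2):
        Q_direct = np.sum((d - mu) ** 2)
        Q_square = N * (mu - dbar) ** 2 + N * r2
        assert np.isclose(Q_direct, Q_square)
    print("Q(mu) = N(mu - dbar)^2 + N r^2 holds for all test values")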

The likelihood depends on {d_i} only through d̄ and r:

L(µ,σ) = 1/[σ^N (2π)^(N/2)] exp(−Nr²/2σ²) exp(−N(µ − d̄)²/2σ²)

The sample mean and variance are sufficient statistics.

This is a miraculous compression of information; the normal dist'n is highly abnormal in this respect!

Estimating a Normal Mean

Problem specification
Model: d_i = µ + ε_i, ε_i ~ N(0, σ²), σ is known → I = (σ, M).
Parameter space: µ; seek p(µ|D,σ,M)

Likelihood
p(D|µ,σ,M) = 1/[σ^N (2π)^(N/2)] exp(−Nr²/2σ²) exp(−N(µ − d̄)²/2σ²)
           ∝ exp(−N(µ − d̄)²/2σ²)

Uninformative prior
Translation invariance ⇒ p(µ) ∝ C, a constant. This prior is improper unless bounded.

Prior predictive/normalization
p(D|σ,M) = ∫ dµ C exp(−N(µ − d̄)²/2σ²) = C (σ/√N) √(2π)
... minus a tiny bit from the tails, using a proper prior.

Posterior

p(µ|D,σ,M) = 1/[(σ/√N) √(2π)] exp(−N(µ − d̄)²/2σ²)

The posterior is N(d̄, w²), with standard deviation w = σ/√N.
The 68.3% HPD credible region for µ is d̄ ± σ/√N.
Note that C drops out; the limit of infinite prior range is well behaved.
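A minimal sketch of this result (not from the slides; the measurements and the known σ are hypothetical):

    import numpy as np
    from scipy import stats

    sigma = 2.0                                  # assumed known noise level
    d = np.array([4.1, 5.3, 4.8, 6.0, 5.1])      # hypothetical data
    dbar, N = d.mean(), d.size

    post = stats.norm(loc=dbar, scale=sigma / np.sqrt(N))  # N(dbar, sigma^2/N)
    lo, hi = post.interval(0.683)                # 68.3% region = dbar +/- sigma/sqrt(N)
    print(f"posterior mean {dbar:.2f}, 68.3% credible region ({lo:.2f}, {hi:.2f})")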

Informative Conjugate Prior

Use a normal prior, µ ~ N(µ₀, w₀²). Conjugate because the posterior is also normal.

Posterior
Normal, N(µ̃, w̃²), but the mean and std. deviation shrink towards the prior.
Define B = w²/(w² + w₀²), so B < 1 and B → 0 when w₀ is large. Then

µ̃ = d̄ + B (µ₀ − d̄)
w̃ = w √(1 − B)

Principle of stable estimation: The prior affects estimates only when the data are not informative relative to the prior.

Conjugate normal examples: Data have d̄ = 3, σ/√N = 1; priors at µ₀ = 10, with prior widths {5, 2}.
[Figure: prior, likelihood, and posterior densities for the two prior widths]
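A small sketch of the shrinkage formulas for this example (not from the slides), with d̄ = 3, w = σ/√N = 1, and the two prior widths:

    import numpy as np

    def shrink(dbar, w, mu0, w0):
        B = w**2 / (w**2 + w0**2)            # shrinkage factor toward the prior
        return dbar + B * (mu0 - dbar), w * np.sqrt(1 - B)

    for w0 in (5.0, 2.0):
        mu_t, w_t = shrink(3.0, 1.0, 10.0, w0)
        print(f"w0={w0}: posterior mean {mu_t:.2f}, std dev {w_t:.2f}")
    # Wide prior (w0=5): mean ~3.27, barely shifted; narrower prior (w0=2): ~4.40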

Estimating a Normal Mean: Unknown σ

Problem specification
Model: d_i = µ + ε_i, ε_i ~ N(0, σ²), σ is unknown
Parameter space: (µ, σ); seek p(µ|D,M)

Likelihood
p(D|µ,σ,M) = 1/[σ^N (2π)^(N/2)] exp(−Nr²/2σ²) exp(−N(µ − d̄)²/2σ²)
           ∝ (1/σ^N) e^(−Q/2σ²)    where Q = N[r² + (µ − d̄)²]

Uninformative Priors
Assume the priors for µ and σ are independent.
Translation invariance ⇒ p(µ) ∝ C, a constant.
Scale invariance ⇒ p(σ) ∝ 1/σ (flat in log σ).

Joint Posterior for µ, σ
p(µ,σ|D,M) ∝ (1/σ^(N+1)) e^(−Q(µ)/2σ²)

Marginal Posterior

p(µ|D,M) ∝ ∫ dσ (1/σ^(N+1)) e^(−Q/2σ²)

Let τ = Q/(2σ²), so σ = √(Q/2τ) and dσ = −(1/2)√(Q/2) τ^(−3/2) dτ:

p(µ|D,M) ∝ 2^(N/2) Q^(−N/2) ∫ dτ τ^(N/2 − 1) e^(−τ) ∝ Q^(−N/2)

Write Q = Nr²[1 + ((µ − d̄)/r)²] and normalize:

p(µ|D,M) = [(N/2 − 1)! / ((N/2 − 3/2)! √(Nπ))] × (1/(r/√N)) × [1 + (1/N)((µ − d̄)/(r/√N))²]^(−N/2)

Student's t distribution, with t = (µ − d̄)/(r/√N). A bell curve, but with power-law tails.

Large N: p(µ|D,M) ∝ e^(−N(µ−d̄)²/2r²)

This is the rigorous way to adjust σ so that χ²/dof = 1. It doesn't just plug in a best σ; it slightly broadens the posterior to account for σ uncertainty.
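A minimal sketch (not from the slides): the marginal posterior above is a Student's t with N − 1 degrees of freedom, location d̄, and scale S/√N, where S is the ddof=1 sample standard deviation (equivalently r/√(N−1)). The data values are hypothetical:

    import numpy as np
    from scipy import stats

    d = np.array([4.1, 5.3, 4.8, 6.0, 5.1])     # hypothetical measurements
    N, dbar = d.size, d.mean()

    post = stats.t(df=N - 1, loc=dbar, scale=d.std(ddof=1) / np.sqrt(N))
    lo, hi = post.interval(0.683)
    print(f"68.3% credible region for mu: ({lo:.2f}, {hi:.2f})")
    # Slightly wider than the known-sigma result, reflecting sigma uncertainty.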

Student's t examples: p(x) ∝ [1 + x²/n]^(−(n+1)/2), location 0, scale 1.
[Figure: densities for degrees of freedom n = {1, 2, 3, 5, 10, ∞}; the n → ∞ curve is the normal]

Gaussian Background Subtraction

Measure the background rate b = b̂ ± σ_b with the source off.
Measure the total rate r = r̂ ± σ_r with the source on.
Infer the signal source strength s, where r = s + b. With flat priors,

p(s,b|D,M) ∝ exp[−(b − b̂)²/2σ_b²] exp[−(s + b − r̂)²/2σ_r²]

Marginalize over b to summarize the results for s (complete the square to isolate the b dependence; then do a simple Gaussian integral over b):

p(s|D,M) ∝ exp[−(s − ŝ)²/2σ_s²]     with  ŝ = r̂ − b̂,  σ_s² = σ_r² + σ_b²

Background subtraction is a special case of background marginalization; i.e., marginalization told us to subtract a background estimate.

Recall the standard derivation of background uncertainty via propagation of errors, based on a Taylor expansion (the statistician's delta method). Marginalization provides a generalization of error propagation, without approximation!
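A numerical check of this marginalization (not from the slides; the measured rates are hypothetical):

    import numpy as np

    b_hat, sigma_b = 10.0, 1.5        # background measurement
    r_hat, sigma_r = 14.0, 2.0        # on-source (total) measurement

    s = np.linspace(-8.0, 16.0, 1201)
    b = np.linspace(0.0, 25.0, 1201)
    S, B = np.meshgrid(s, b, indexing="ij")

    joint = np.exp(-(B - b_hat)**2 / (2 * sigma_b**2)
                   - (S + B - r_hat)**2 / (2 * sigma_r**2))
    ds = s[1] - s[0]
    marg = joint.sum(axis=1)          # integrate out b on the grid
    marg /= marg.sum() * ds

    mean = (s * marg).sum() * ds
    sd = np.sqrt(((s - mean)**2 * marg).sum() * ds)
    print(mean, sd)   # ~4.0 and ~2.5 = sqrt(sigma_r^2 + sigma_b^2)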

Bayesian Curve Fitting & Least Squares

Setup
Data D = {d_i} are measurements of an underlying function f(x;θ) at N sample points {x_i}. Let f_i(θ) ≡ f(x_i;θ):

d_i = f_i(θ) + ε_i,   ε_i ~ N(0, σ_i²)

We seek to learn θ, or to compare different functional forms (model choice, M).

Likelihood
p(D|θ,M) = ∏_{i=1..N} 1/(σ_i√(2π)) exp[−(1/2)((d_i − f_i(θ))/σ_i)²]
         ∝ exp[−(1/2) Σ_i ((d_i − f_i(θ))/σ_i)²]
         = exp[−χ²(θ)/2]

Bayesian Curve Fitting & Least Squares

Posterior
For prior density π(θ),

p(θ|D,M) ∝ π(θ) exp[−χ²(θ)/2]

If you have a least-squares or χ² code: think of χ²(θ) as −2 log L(θ). Bayesian inference amounts to exploration and numerical integration of π(θ) e^(−χ²(θ)/2).
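A minimal sketch of this recipe (not from the slides): a straight-line model with fabricated data, a flat prior, and a brute-force grid over the single parameter:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    d = np.array([2.1, 3.9, 6.2, 7.8, 10.1])    # fake data, roughly d = 2x
    sig = 0.3 * np.ones_like(d)

    def chi2(theta):                             # model: f(x; theta) = theta * x
        return np.sum(((d - theta * x) / sig) ** 2)

    theta = np.linspace(1.8, 2.2, 4001)
    logpost = np.array([-0.5 * chi2(t) for t in theta])   # flat prior pi(theta)
    post = np.exp(logpost - logpost.max())
    dtheta = theta[1] - theta[0]
    post /= post.sum() * dtheta

    print("posterior mean slope:", (theta * post).sum() * dtheta)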

Important Case: Separable Nonlinear Models

A (linearly) separable model has parameters θ = (A, ψ):
Linear amplitudes A = {A_α}
Nonlinear parameters ψ

f(x;θ) is a linear superposition of M nonlinear components g_α(x;ψ):

d_i = Σ_{α=1..M} A_α g_α(x_i;ψ) + ε_i,   or   d = Σ_α A_α g_α(ψ) + ε (as vectors over samples).

Why this is important: you can marginalize over A analytically (the Bretthorst algorithm, "Bayesian Spectrum Analysis & Parameter Estimation," 1988). The algorithm is closely related to linear least squares/diagonalization/SVD.

Poisson Dist'n: Infer a Rate from Counts

Problem: Observe n counts in T; infer the rate, r.

Likelihood
L(r) ≡ p(n|r,M) = (rT)^n e^(−rT) / n!

Prior
Two simple standard choices (or the conjugate gamma dist'n):

r known to be nonzero; it is a scale parameter:
p(r|M) = 1/[ln(r_u/r_l)] · 1/r

r may vanish; require p(n|M) ≈ const:
p(r|M) = 1/r_u

Prior predictive

p(n|M) = (1/r_u)(1/n!) ∫₀^{r_u} dr (rT)^n e^(−rT)
       = (1/(r_u T))(1/n!) ∫₀^{r_u T} d(rT) (rT)^n e^(−rT)
       ≈ 1/(r_u T)   for r_u ≫ n/T

Posterior
A gamma distribution:

p(r|n,M) = T (rT)^n e^(−rT) / n!

Gamma Distributions

A 2-parameter family of distributions over nonnegative x, with shape parameter α and scale parameter s:

p_Γ(x|α,s) = 1/[sΓ(α)] (x/s)^(α−1) e^(−x/s)

Moments: E(x) = sα;  Var(x) = s²α

Our posterior corresponds to α = n+1, s = 1/T.
Mode r̂ = n/T; mean ⟨r⟩ = (n+1)/T (shift down 1 with the 1/r prior)
Std. dev'n σ_r = √(n+1)/T; credible regions are found by integrating (can use the incomplete gamma function)
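A short sketch using SciPy's gamma distribution (not from the slides; the counts and interval are hypothetical):

    from scipy import stats

    n, T = 16, 2.0                               # 16 counts in 2 time units
    post = stats.gamma(a=n + 1, scale=1.0 / T)   # flat-prior posterior for r

    mode = n / T
    mean, sd = post.mean(), post.std()    # (n+1)/T and sqrt(n+1)/T
    lo, hi = post.interval(0.683)         # central 68.3% credible region
    print(f"mode={mode:.2f} mean={mean:.2f} sd={sd:.2f} CR=({lo:.2f}, {hi:.2f})")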


The flat prior

Bayes's justification: Not that ignorance of r ⇒ p(r|I) = C; rather, require the (discrete) predictive distribution to be flat:

p(n|I) = ∫ dr p(r|I) p(n|r,I) = C   ⇒   p(r|I) = C

Useful conventions
Use a flat prior for a rate that may be zero
Use a log-flat prior (∝ 1/r) for a nonzero scale parameter
Use proper (normalized, bounded) priors
Plot the posterior with an abscissa that makes the prior flat

The On/Off Problem

Basic problem
Look off-source; unknown background rate b. Count N_off photons in interval T_off.
Look on-source; the rate is r = s + b with unknown signal s. Count N_on photons in interval T_on.
Infer s.

Conventional solution
b̂ = N_off/T_off;  σ_b = √N_off / T_off
r̂ = N_on/T_on;   σ_r = √N_on / T_on
ŝ = r̂ − b̂;       σ_s = √(σ_r² + σ_b²)

But ŝ can be negative!

Examples: Spectra of X-Ray Sources
[Figures: spectra from Bassani et al. and Di Salvo et al.]

Spectrum of Ultrahigh-Energy Cosmic Rays (Nagano & Watson 2000; HiRes Team 2007)
[Figure: Flux × E³/10²⁴ (eV² m⁻² s⁻¹ sr⁻¹) vs. log₁₀(E) (eV), showing HiRes-1/HiRes-2 monocular and AGASA data]


N is Never Large

"Sample sizes are never large. If N is too small to get a sufficiently-precise estimate, you need to get more data (or make more assumptions). But once N is large enough, you can start subdividing the data to learn more (for example, in a public opinion poll, once you have a good estimate for the entire country, you can estimate among men and women, northerners and southerners, different age groups, etc., etc.). N is never enough because if it were enough you'd already be on to the next problem for which you need more data.

Similarly, you never have quite enough money. But that's another story."

Andrew Gelman (blog entry, 31 July 2005)

Bayesian Solution to the On/Off Problem

First consider the off-source data; use it to estimate b:

p(b|N_off, I_off) = T_off (bT_off)^N_off e^(−bT_off) / N_off!

Use this as a prior for b to analyze the on-source data. For the on-source analysis, I_all = (I_on, N_off, I_off):

p(s,b|N_on, I_all) ∝ p(s) p(b) [(s+b)T_on]^N_on e^(−(s+b)T_on)

p(s|I_all) is flat, but p(b|I_all) = p(b|N_off, I_off), so

p(s,b|N_on, I_all) ∝ (s+b)^N_on b^N_off e^(−sT_on) e^(−b(T_on+T_off))

Now marginalize over b:

p(s|N_on, I_all) = ∫ db p(s,b|N_on, I_all)
                 ∝ ∫ db (s+b)^N_on b^N_off e^(−sT_on) e^(−b(T_on+T_off))

Expand (s+b)^N_on and do the resulting Γ integrals:

p(s|N_on, I_all) = Σ_{i=0..N_on} C_i [T_on (sT_on)^i e^(−sT_on) / i!]

C_i ∝ (1 + T_off/T_on)^i (N_on + N_off − i)! / (N_on − i)!   (normalized so Σ_i C_i = 1)

The posterior is a weighted sum of Gamma distributions, each assigning a different number of on-source counts to the source. (Evaluate via a recursive algorithm or the confluent hypergeometric function.)
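A minimal sketch (not from the slides) that obtains the same posterior by direct numerical marginalization over b instead of the analytic expansion; the counts and exposures are hypothetical:

    import numpy as np

    N_on, T_on = 16, 1.0
    N_off, T_off = 9, 1.0

    s = np.linspace(0.0, 40.0, 801)
    b = np.linspace(1e-6, 40.0, 801)
    S, B = np.meshgrid(s, b, indexing="ij")

    # Work in logs to avoid overflow, then exponentiate relative to the max.
    logjoint = (N_on * np.log(S + B) + N_off * np.log(B)
                - S * T_on - B * (T_on + T_off))
    post = np.exp(logjoint - logjoint.max()).sum(axis=1)   # marginalize over b
    post /= post.sum() * (s[1] - s[0])

    print("posterior mode for s:", s[np.argmax(post)])     # near N_on/T_on - N_off/T_off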

Example On/Off Posteriors: Short Integrations (T_on = 1)
[Figure: posterior PDFs for s]

Example On/Off Posteriors: Long Background Integrations (T_on = 1)
[Figure: posterior PDFs for s]

Recap of Key Findings From Examples

Sufficiency: model-dependent summary of the data
Conjugate priors
Likelihood principle
Marginalization: generalizes background subtraction and propagation of errors
Student's t for handling σ uncertainty
Exact treatment of Poisson background uncertainty (don't subtract!)

Key Examples, part 2: Multilevel models for measurement error

Complications With Survey Data

Selection effects (truncation, censoring): obvious (usually)
  Typically treated by correcting the data
  Most sophisticated: product-limit estimators

Scatter effects (measurement error, etc.): insidious
  Typically ignored (average out???)

Many Guises of Measurement Error

Auger data above the GZK cutoff (PAO 2007; Soiaporn et al.)
QSO hardness vs. luminosity (Kelly 2007, 2011)

Accounting For Measurement Error

Introduce latent/hidden/incidental parameters.
Suppose f(x|θ) is a distribution for an observable, x. From N precisely measured samples, {x_i}, we can infer θ from

L(θ) ≡ p({x_i}|θ) = ∏_i f(x_i|θ)

Graphical representation

Nodes/vertices = uncertain quantities
Edges specify conditional dependence
Absence of an edge denotes conditional independence

[Graph: θ at the top, with edges to x_1, x_2, ..., x_N]

L(θ) ≡ p({x_i}|θ) = ∏_i f(x_i|θ)

But what if the x data are noisy, D_i = {x_i + ε_i}?

We should somehow incorporate ℓ_i(x_i) = p(D_i|x_i):

L(θ,{x_i}) ≡ p({D_i}|θ,{x_i}) = ∏_i ℓ_i(x_i) f(x_i|θ)

Marginalize over {x_i} to summarize inferences for θ.
Marginalize over θ to summarize inferences for {x_i}.

Key point: Maximizing over x_i and integrating over x_i can give very different results!

To estimate x_1:

L_m(x_1) = ∫ dθ ℓ_1(x_1) f(x_1|θ) ∏_{i=2..N} ∫ dx_i ℓ_i(x_i) f(x_i|θ)
         ≈ ℓ_1(x_1) f(x_1|θ̂)   with θ̂ determined by the remaining data.

f(x_1|θ̂) behaves like a prior that shifts the x_1 estimate away from the peak of ℓ_1(x_1).

This generalizes the corrections derived by Malmquist and Lutz-Kelker. Landy & Szalay (1992) proposed adaptive Malmquist corrections that can be viewed as an approximation to this.
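A toy illustration of this shift (not from the slides): one noisy datum, a Gaussian population f(x|θ̂), and a comparison of the raw maximum-likelihood estimate with the estimate after weighting by the population term; all numbers are invented:

    import numpy as np

    theta_hat, tau = 0.0, 1.0      # population mean/width from the other objects
    D1, sigma = 2.5, 1.0           # noisy measurement of x_1 and its error

    x = np.linspace(-5.0, 8.0, 5001)
    dx = x[1] - x[0]
    like = np.exp(-0.5 * ((D1 - x) / sigma) ** 2)        # l_1(x_1)
    pop = np.exp(-0.5 * ((x - theta_hat) / tau) ** 2)    # f(x_1|theta_hat)
    post = like * pop
    post /= post.sum() * dx

    print("max-likelihood x_1:", x[np.argmax(like)])     # the raw datum, 2.5
    print("posterior mean x_1:", (x * post).sum() * dx)  # shrunk toward theta_hat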

Graphical representation

[Graph: θ at the top; edges to x_1, x_2, ..., x_N; each x_i has an edge to its datum D_i]

L(θ,{x_i}) ≡ p({D_i}|θ,{x_i}) = ∏_i p(D_i|x_i) f(x_i|θ) = ∏_i ℓ_i(x_i) f(x_i|θ)

A two-level multilevel model (MLM).

Bayesian MLMs in Astronomy

Surveys (number counts / log N–log S / Malmquist):
  GRB peak flux dist'n (Loredo & Wasserman 1998)
  TNO/KBO magnitude distribution (Gladman et al.; Petit et al.)
  MLM tutorial; Malmquist-type biases in cosmology (Loredo & Hendry 2009, in the BMIC book)
  Extreme deconvolution for proper motion surveys (Bovy, Hogg & Roweis 2011)

Directional & spatio-temporal coincidences:
  GRB repetition (Luo et al.; Graziani et al.)
  GRB host ID (Band 1998; Graziani et al.)
  VO cross-matching (Budavári & Szalay 2008)

Time series:
  SN 1987A neutrinos, uncertain energy vs. time (Loredo & Lamb 2002)
  Multivariate Bayesian Blocks (Dobigeon, Tourneret & Scargle 2007)
  SN Ia multicolor light curve modeling (Mandel et al. 2011)

Linear regression with measurement error:
  QSO hardness vs. luminosity (Kelly 2007, 2011)

See last year's Surveys School notes for more!

Key Examples, part 3: Bayesian calculation

Statistical Integrals

Inference with independent data
Consider N data, D = {x_i}, and a model M with m parameters. Suppose L(θ) = p(x_1|θ) p(x_2|θ) · · · p(x_N|θ).

Frequentist integrals
Find long-run properties of procedures via sample-space integrals:

I(θ) = ∫ dx_1 p(x_1|θ) ∫ dx_2 p(x_2|θ) · · · ∫ dx_N p(x_N|θ) f(D,θ)

Rigorous analysis must explore the θ dependence; this is rarely done in practice.
Plug-in approximation: report properties of the procedure for θ = θ̂. Asymptotically accurate (for large N, expect θ̂ → θ).
Plug-in results are easy via Monte Carlo (due to independence).

Bayesian integrals

∫ d^m θ g(θ) p(θ|M) L(θ) = ∫ d^m θ g(θ) q(θ),   with q(θ) ≡ p(θ|M) L(θ)

g(θ) = 1      →  p(D|M) (norm. const., model likelihood)
g(θ) = "box"  →  credible region
g(θ) = θ      →  posterior mean for θ

Such integrals are sometimes easy if analytic (especially in low dimensions), and often easier than their frequentist counterparts (e.g., normal credible regions, Student's t).

Asymptotic approximations: require ingredients familiar from frequentist calculations. Bayesian calculation is not significantly harder than frequentist calculation in this limit.

Numerical calculation: for large m (> 4 is often enough!) the integrals are often very challenging because of structure (e.g., correlations) in parameter space. This is usually pursued without making any procedural approximations.

Bayesian Computation

Large sample size: Laplace approximation
  Approximate the posterior as multivariate normal → det(covar) factors
  Uses ingredients available in χ²/ML fitting software (MLE, Hessian)
  Often accurate to O(1/N)

Modest-dimensional models (d ≈ 10 to 20)
  Adaptive cubature
  Monte Carlo integration (importance & stratified sampling, adaptive importance sampling, quasi-random MC)

High-dimensional models (d > 5)
  Posterior sampling: create an RNG that samples the posterior
  MCMC is the most general framework → Murali's lab
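A small sketch of the Laplace approximation (not from the slides), applied to the model likelihood p(D|M2) from the binary-outcome example and compared with the exact Beta integral:

    import numpy as np
    from math import factorial

    n, N = 8, 12
    a_hat = n / N                                      # MLE of alpha
    L_hat = a_hat**n * (1 - a_hat)**(N - n)            # peak likelihood
    d2lnL = -n / a_hat**2 - (N - n) / (1 - a_hat)**2   # Hessian of ln L at the peak
    sigma = np.sqrt(-1.0 / d2lnL)

    laplace = L_hat * np.sqrt(2 * np.pi) * sigma       # flat prior density = 1 on [0,1]
    exact = factorial(n) * factorial(N - n) / factorial(N + 1)
    print(f"Laplace {laplace:.3e}  exact {exact:.3e}  ratio {laplace/exact:.3f}")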

See the SCMA 5 Bayesian Computation tutorial notes for more!

Key Examples, part 4: Bayesian software

Tools for Computational Bayes

Astronomer/Physicist Tools

BIE (Bayesian Inference Engine): General framework for Bayesian inference, tailored to astronomical and earth-science survey data. Built-in database capability to support analysis of terabyte-scale data sets. Inference is by Bayes via MCMC.
CIAO/Sherpa: On/off marginal likelihood support, and Bayesian Low-Count X-ray Spectral (BLoCXS) analysis via MCMC (the pyblocxs extension).
XSpec: Includes some basic MCMC capability.
CosmoMC: Parameter estimation for cosmological models using the CMB, etc., via MCMC.
MultiNest: Bayesian inference via an approximate implementation of the nested sampling algorithm.
ExoFit: Adaptive MCMC for fitting exoplanet RV data.
extreme-deconvolution: Multivariate density estimation with measurement error, via a multivariate normal finite mixture model; partly Bayesian; Python & IDL wrappers.

Astronomer/Physicist Tools, cont'd

ROOT/RooStats: Statistical tools for particle physicists; Bayesian support being incorporated.
CDF Bayesian Limit Software: Limits for Poisson counting processes, with background & efficiency uncertainties.
SuperBayeS: Bayesian exploration of supersymmetric theories in particle physics using the MultiNest algorithm; includes a MATLAB GUI for plotting.
CUBA: Multidimensional integration via adaptive cubature, adaptive importance sampling & stratification, and QMC (C/C++, Fortran, and Mathematica; R interface also available via the 3rd-party R2Cuba).
Cubature: Subregion-adaptive cubature in C, with a 3rd-party R interface; intended for low dimensions (< 7).
APEMoST: Automated Parameter Estimation and Model Selection Toolkit in C, a general-purpose MCMC environment that includes parallel computing support via MPI; motivated by asteroseismology problems.
Inference: Forthcoming; several self-contained Bayesian modules; Parametric Inference Engine.

Python

PyMC: A framework for MCMC via Metropolis-Hastings; also implements Kalman filters and Gaussian processes. Targets biometrics, but is general.
SimPy: SimPy (rhymes with "Blimpie") is a process-oriented public-domain package for discrete-event simulation.
RSPython: Bi-directional communication between Python and R.
MDP: Modular toolkit for Data Processing; current emphasis is on machine learning (PCA, ICA, ...). Modularity allows combination of algorithms and other data processing elements into flows.
Orange: Component-based data mining, with preprocessing, modeling, and exploration components. Python/GUI interfaces to C++ implementations. Some Bayesian components.
ELEFANT: Machine learning library and platform providing Python interfaces to efficient, lower-level implementations. Some Bayesian components (Gaussian processes; Bayesian ICA/PCA).

R and S

CRAN Bayesian task view: Overview of many R packages implementing various Bayesian models and methods; pedagogical packages; packages linking R to other Bayesian software (BUGS, JAGS).
Omega-hat: RPython, RMatlab, R-XLisp.
BOA (Bayesian Output Analysis): Convergence diagnostics and statistical and graphical analysis of MCMC output; can read BUGS output files.
CODA (Convergence Diagnosis and Output Analysis): Menu-driven R/S plugins for analyzing BUGS output.
R2Cuba: R interface to Thomas Hahn's CUBA library (see above) for deterministic and Monte Carlo cubature.

Java

Hydra: Provides methods for implementing MCMC samplers using Metropolis, Metropolis-Hastings, and Gibbs methods. In addition, it provides classes implementing several unique adaptive and multiple-chain/parallel MCMC methods.
YADAS: Software system for statistical analysis using MCMC, based on the multi-parameter Metropolis-Hastings algorithm (rather than parameter-at-a-time Gibbs sampling).
Omega-hat: Java environment for statistical computing, being developed by the XLisp-stat and R developers.

C/C++/Fortran

BayeSys 3: Sophisticated suite of MCMC samplers, including transdimensional capability, by the author of MemSys.
fbm (Flexible Bayesian Modeling): MCMC for simple Bayes, nonparametric Bayesian regression and classification models based on neural networks and Gaussian processes, and Bayesian density estimation and clustering using mixture models and Dirichlet diffusion trees.
BayesPack, DCUHRE: Adaptive quadrature, randomized quadrature, Monte Carlo integration.
BIE, CDF Bayesian limits, CUBA (see above).

Other Statisticians' & Engineers' Tools

BUGS/WinBUGS (Bayesian inference Using Gibbs Sampling): Flexible software for the Bayesian analysis of complex statistical models using MCMC.
OpenBUGS: BUGS on Windows and Linux, and from inside R.
JAGS (Just Another Gibbs Sampler): MCMC for Bayesian hierarchical models.
XLisp-stat: Lisp-based data analysis environment, with an emphasis on providing a framework for exploring the use of dynamic graphical methods.
ReBEL: Library supporting recursive Bayesian estimation in MATLAB (Kalman filter, particle filters, sequential Monte Carlo).

Key Examples, part 5: Closing reflections

Closing Reflections

Philip Dawid (2000):
"What is the principal distinction between Bayesian and classical statistics? It is that Bayesian statistics is fundamentally boring. There is so little to do: just specify the model and the prior, and turn the Bayesian handle. There is no room for clever tricks or an alphabetic cornucopia of definitions and optimality criteria. I have heard people use this dullness as an argument against Bayesianism. One might as well complain that Newton's dynamics, being based on three simple laws of motion and one of gravitation, is a poor substitute for the richness of Ptolemy's epicyclic system. All my experience teaches me that it is invariably more fruitful, and leads to deeper insights and better data analyses, to explore the consequences of being a thoroughly boring Bayesian."

Dennis Lindley (2000):
"The philosophy places more emphasis on model construction than on formal inference... I do agree with Dawid that Bayesian statistics is fundamentally boring... my only qualification would be that the theory may be boring but the applications are exciting."
