Generalized Fiducial Inference


Jan Hannig, University of North Carolina at Chapel Hill. BFF 2018. Parts of this short course are joint work with T. C. M. Lee (UC Davis), H. Iyer (NIST), Randy Lai (U of Maine), J. Williams (UNC) and Y. Cui (UNC). NSF support acknowledged.

Outline
Introduction; Definition; Theoretical Results; Applications (Distributed Data, Right Censored Data, High-D Regression); Conclusions

Fiducial?
Oxford English Dictionary: adjective, technical; (of a point or line) used as a fixed basis of comparison. Origin: from Latin fiducia, "trust, confidence".
Merriam-Webster dictionary: 1. taken as a standard of reference (a fiducial mark); 2. founded on faith or trust; 3. having the nature of a trust: fiduciary.

Long, long, long time ago
Bayes Theorem:
$P(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int_\Theta f(x \mid \theta')\,\pi(\theta')\,d\theta'}.$
Bayes-Laplace postulate: when nothing is known about the parameter in advance, let the prior be such that all values of the parameter are equally likely.

Long, long time ago
"Not knowing the chance of mutually exclusive events and knowing the chance to be equal are two quite different states of knowledge." R. A. Fisher (1930)

Brief history of fiducial inference
- Fisher (1922, 1930, 1935): no formal definition
- Lindley (1958): fiducial vs Bayes
- Fraser (1966): structural inference
- Dempster (1967): upper and lower probabilities
- Dawid and Stone (1982): theoretical results for simple cases
- Barnard (1995): pivotal based methods
- Weerahandi (1989, 1993): generalized inference

Fiducial Inspired Inference since 2000
- Dempster-Shafer calculus: Dempster (2008), Edlefsen, Liu & Dempster (2009)
- Inferential Models: Liu & Martin (2015)
- Confidence Distributions: Xie, Singh & Strawderman (2011), Schweder & Hjort (2016)
- Higher order likelihood, tangent exponential family, r*: Reid & Fraser (2010)
- Objective Bayesian inference, e.g., reference priors: Berger, Bernardo & Sun (2009, 2012)
- Fiducial Inference: H, Iyer & Patterson (2006), H (2009, 2013), H & Lee (2009), Taraldsen & Lindqvist (2013), Veronese & Melilli (2015), H, Iyer, Lai & Lee (2016)

Aims
- Explain the definition of the generalized fiducial distribution
- Discuss theoretical results
- Show successful applications
My point of view is frequentist, justified using asymptotic theorems and simulations. GFI shows very good repeated sampling performance in applications.

Definition

Comparison to likelihood
- Density is the function $f(x, \xi)$, where $\xi$ is fixed and $x$ is variable.
- Likelihood is the function $f(x, \xi)$, where $\xi$ is variable and $x$ is fixed.
- Likelihood as a distribution?

Data generating equation
Data generating equation (DGE): $X = G(U, \xi)$, where $U$ is random with a known distribution (e.g., i.i.d. U(0,1)) and the parameter $\xi$ is fixed.
Generate $X$s by generating $U$s and applying the DGE. This determines the sampling distribution.

Data generating equation
DGE: $x = G(U^*, \xi)$, where $U^*$ is random with a known distribution (i.i.d. U(0,1)) and the data $x$ are fixed.
Generate $\xi$ by generating $U^*$s and inverting the DGE. This determines the fiducial distribution. Denote the inverse $Q_x(u^*)$.
Issues: multiple solutions and no solutions.

Example -- Bernoulli trials
Data generating equation: $X_i = 1\{U_i \le p\}$, $U_i$ i.i.d. Uniform(0,1); generating the $U_i$ samples Bernoulli(p).
Inverting, estimate $U_i$ by $U_i^*$; this defines the fiducial distribution:
- If $x_i = 1$, then $p \in [U_i^*, 1]$.
- If $x_i = 0$, then $p \in [0, U_i^*]$.

Example -- Binomial
Data generating equation: $X_1 = 1\{U_1 \le p\}$, $X_2 = 1\{U_2 \le p\}$, $U_1, U_2$ i.i.d. Uniform(0,1).
- If $X_1 = 0$, $X_2 = 1$ and $U_1^* < U_2^*$: no solution! Remove $(U_1^*, U_2^*)$ inconsistent with the data.
- If $X_1 = 0$, $X_2 = 1$ and $U_1^* > U_2^*$: $p \in [U_2^*, U_1^*]$, and $(U_1^*, U_2^*) \mid \{U_1^* > U_2^*\}$ estimates $(u_1, u_2)$.

Example -- Binomial
$(X_1, \ldots, X_n)$ i.i.d. Bernoulli(p), $S = \sum_{i=1}^n X_i \sim$ Binomial(n, p).
Condition $U^*$ on having a solution for $p$: $Q_x(U^*)$ is the interval $[U^*_{s:n}, U^*_{(s+1):n}]$ between consecutive order statistics of $U^*$.
Select a point in the interval. A particular choice results in Beta(s + 1/2, n - s + 1/2).
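A minimal numerical sketch of this construction (the endpoint rule below is an illustrative assumption, since the slide does not spell out which "particular choice" is meant): given $S = s$, the interval endpoints are order statistics with marginals $U^*_{s:n} \sim$ Beta(s, n-s+1) and $U^*_{(s+1):n} \sim$ Beta(s+1, n-s), and mixing the two endpoints with probability 1/2 each matches the mean of the quoted Beta(s+1/2, n-s+1/2).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_true, m = 20, 0.3, 100_000
s = rng.binomial(n, p_true)  # observed sufficient statistic

# Fiducial interval for p given S = s is [U*_{s:n}, U*_{(s+1):n}];
# marginally U*_{s:n} ~ Beta(s, n-s+1) and U*_{(s+1):n} ~ Beta(s+1, n-s).
lower = rng.beta(s, n - s + 1, size=m) if s > 0 else np.zeros(m)
upper = rng.beta(s + 1, n - s, size=m) if s < n else np.ones(m)

# Hypothetical endpoint rule (an assumption, not the course's exact choice):
# pick either endpoint of the interval with probability 1/2.
pick = rng.integers(0, 2, size=m).astype(bool)
fid = np.where(pick, lower, upper)

# Compare with the Beta(s+1/2, n-s+1/2) distribution quoted on the slide.
ref = rng.beta(s + 0.5, n - s + 0.5, size=m)
print(np.quantile(fid, [0.025, 0.5, 0.975]))
print(np.quantile(ref, [0.025, 0.5, 0.975]))
```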

Example -- Location Cauchy
Consider $X_i = \mu + U_i$ where the $U_i$ are i.i.d. standard Cauchy.
Solve: $Q_x(u) = x_1 - u_1$ if $x_2 - x_1 = u_2 - u_1, \ldots, x_n - x_1 = u_n - u_1$; otherwise there is no solution.
Estimate $u$ by the conditional distribution $U^* \mid \{x_2 - x_1 = U_2^* - U_1^*, \ldots, x_n - x_1 = U_n^* - U_1^*\}$.
Fiducial density: $r(\mu \mid x) \propto \prod_{i=1}^n \big(1 + (\mu - x_i)^2\big)^{-1}$.
For location problems this is the same as the posterior computed using the Jeffreys prior.
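The fiducial density above is known only up to a normalizing constant, but in one dimension a grid-based inverse-CDF sampler suffices. A minimal sketch (data simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(25) + 3.0  # data with true mu = 3

# Fiducial density r(mu | x) ∝ prod_i (1 + (mu - x_i)^2)^(-1), evaluated on
# a grid and sampled by inverse CDF; an MCMC sampler would replace this for
# harder problems.
grid = np.linspace(np.median(x) - 10, np.median(x) + 10, 20_001)
logdens = -np.log1p((grid[None, :] - x[:, None]) ** 2).sum(axis=0)
dens = np.exp(logdens - logdens.max())
cdf = np.cumsum(dens)
cdf /= cdf[-1]
draws = np.interp(rng.uniform(size=10_000), cdf, grid)
print("95% fiducial interval for mu:", np.quantile(draws, [0.025, 0.975]))
```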

General Definition
Data generating equation $X = G(U, \xi)$, e.g., $X_i = \mu + \sigma U_i$.
A distribution on the parameter space is a Generalized Fiducial Distribution (GFD) if it can be obtained as a limit (as $\varepsilon \to 0$) of
$\arg\min_{\xi} \|x - G(U^*, \xi)\| \ \Big|\ \big\{\min_{\xi} \|x - G(U^*, \xi)\| \le \varepsilon\big\}. \quad (1)$
Similar to ABC, with generating from the prior replaced by the minimization.
Is this practical? Can we compute it?

Explicit limit of (1)
Assume $X \in \mathbb{R}^n$ is continuous and the parameter $\xi \in \mathbb{R}^p$. The limit in (1) has density (H, Iyer, Lai & Lee, 2016)
$r(\xi \mid x) = \frac{f_X(x \mid \xi)\, J(x, \xi)}{\int_\Xi f_X(x \mid \xi')\, J(x, \xi')\, d\xi'}, \quad (*)$
where $J(x, \xi) = D\big(\nabla_\xi G(u, \xi)\big)\big|_{u = G^{-1}(x, \xi)}$.
- $n = p$ gives $D(A) = |\det A|$;
- the $L^2$ norm gives $D(A) = (\det A^\top A)^{1/2}$;
- the $L^\infty$ norm gives $D(A) = \sum_{i = (i_1, \ldots, i_p)} |\det(A)_i|$;
- the $L^1$ norm gives $D(A) = \sum_{i = (i_1, \ldots, i_p)} w_i\, |\det(A)_i|$.

Example -- Uniform(θ, θ²)
$X_i$ i.i.d. U($\theta$, $\theta^2$), $\theta > 1$.
Data generating equation: $X_i = \theta + (\theta^2 - \theta) U_i$, $U_i \sim$ U(0,1).
$\frac{d}{d\theta}\big[\theta + (\theta^2 - \theta) U_i\big] = 1 + (2\theta - 1) U_i$, with $U_i = \frac{X_i - \theta}{\theta^2 - \theta}$.
Jacobian:
$J(x, \theta) = D\begin{pmatrix} 1 + \frac{(2\theta - 1)(x_1 - \theta)}{\theta^2 - \theta} \\ \vdots \\ 1 + \frac{(2\theta - 1)(x_n - \theta)}{\theta^2 - \theta} \end{pmatrix} = \frac{1}{\theta^2 - \theta}\, D\begin{pmatrix} x_1(2\theta - 1) - \theta^2 \\ \vdots \\ x_n(2\theta - 1) - \theta^2 \end{pmatrix} = \frac{n\big(\bar x(2\theta - 1) - \theta^2\big)}{\theta^2 - \theta}$ for $L^1$.
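A quick finite-difference check of this derivative and Jacobian (a sketch; the data and parameter values are arbitrary, and $D(\cdot)$ is taken as the plain sum of absolute values that the displayed formula reduces to here):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 1.7
x = theta + (theta**2 - theta) * rng.uniform(size=5)  # data from U(theta, theta^2)

def G(u, th):
    return th + (th**2 - th) * u      # data generating equation

u = (x - theta) / (theta**2 - theta)  # u = G^{-1}(x, theta)
h = 1e-6
num = (G(u, theta + h) - G(u, theta - h)) / (2 * h)   # d/dtheta G(u, theta)
ana = (x * (2 * theta - 1) - theta**2) / (theta**2 - theta)
print(np.allclose(num, ana, atol=1e-5))               # True

# D(.) reduces to the sum of absolute values here (all entries positive
# on the support), recovering n*(xbar*(2*theta-1) - theta^2)/(theta^2-theta).
J = np.abs(ana).sum()
```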

Example -- Uniform(θ, θ²)
Reference prior (Berger, Bernardo & Sun, 2009):
$\pi(\theta) = \frac{e^{\psi\left(\frac{2\theta}{2\theta - 1}\right)}(2\theta - 1)}{\theta^2 - \theta}.$
Reference prior vs fiducial Jacobian: in simulations the fiducial solution was marginally better than the reference prior, which in turn was much better than the flat prior.

Example -- Linear Regression
Data generating equation: $Y = X\beta + \sigma Z$.
Gradient: $\nabla_{(\beta, \sigma)} Y = (X, Z)$ with $Z = (Y - X\beta)/\sigma$.
Jacobian: $J(y, \beta, \sigma) = D\left(X, \frac{y - X\beta}{\sigma}\right) = \sigma^{-1} D(X, y) = \sigma^{-1} \det(X^\top X)^{1/2} (\mathrm{RSS})^{1/2}$ for $L^2$.
Same as the independence Jeffreys posterior, with an explicit normalizing constant.
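Since the GFD coincides with the independence Jeffreys posterior, it can be sampled in closed form: $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$ and $\beta \mid \sigma \sim N(\hat\beta, \sigma^2 (X^\top X)^{-1})$. A minimal sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.7 * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
rss = np.sum((y - X @ beta_hat) ** 2)

m = 10_000
sigma2 = rss / rng.chisquare(n - p, size=m)   # fiducial draws of sigma^2
L = np.linalg.cholesky(XtX_inv)               # beta | sigma ~ N(beta_hat, sigma^2 (X'X)^-1)
beta = beta_hat + np.sqrt(sigma2)[:, None] * (rng.normal(size=(m, p)) @ L.T)
print("95% fiducial CI for beta_1:", np.quantile(beta[:, 1], [0.025, 0.975]))
```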

Example -- Generalized Pareto
$X_i = G(U_i, \gamma, \sigma) = \sigma\, \frac{U_i^{-\gamma} - 1}{\gamma}$. Models exceedances over a large threshold.
Likelihood: $f(x \mid \gamma, \sigma) = \prod_{i=1}^n \frac{1}{\sigma\left(1 + \frac{\gamma x_i}{\sigma}\right)^{1 + 1/\gamma}}$.
Jacobian evaluated at $u_i = \left(1 + \frac{\gamma x_i}{\sigma}\right)^{-1/\gamma}$:
$\frac{d}{d\sigma} G(u_i, \gamma, \sigma) = \frac{x_i}{\sigma}$, and $\frac{d}{d\gamma} G(u_i, \gamma, \sigma) = -\frac{x_i}{\gamma} + \frac{\sigma\left(1 + \frac{\gamma x_i}{\sigma}\right)\log\left(1 + \frac{\gamma x_i}{\sigma}\right)}{\gamma^2}$.
$J(x, \gamma, \sigma) = \sum_{i < j} \frac{\left| x_j\, \sigma\left(1 + \frac{\gamma x_i}{\sigma}\right)\log\left(1 + \frac{\gamma x_i}{\sigma}\right) - x_i\, \sigma\left(1 + \frac{\gamma x_j}{\sigma}\right)\log\left(1 + \frac{\gamma x_j}{\sigma}\right)\right|}{\gamma^2 \sigma}$ for $L^\infty$.
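A finite-difference sanity check of the $\partial G/\partial\gamma$ formula and of the pairwise-determinant Jacobian (a sketch with arbitrary values; any KKT weights are omitted, i.e., this computes the unweighted sum over pairs):

```python
import numpy as np

rng = np.random.default_rng(4)
gamma, sigma = 0.4, 2.0

def G(u, g, s):
    return s * (u ** (-g) - 1.0) / g   # GPD data generating equation

x = G(rng.uniform(size=6), gamma, sigma)
u = (1.0 + gamma * x / sigma) ** (-1.0 / gamma)   # u = G^{-1}(x, gamma, sigma)

h = 1e-6
dG_dgamma = (G(u, gamma + h, sigma) - G(u, gamma - h, sigma)) / (2 * h)
ana = -x / gamma + sigma * (1 + gamma * x / sigma) * np.log1p(gamma * x / sigma) / gamma**2
print(np.allclose(dG_dgamma, ana, atol=1e-4))      # True

# Jacobian as the sum over 2x2 determinants of the columns (dG/dsigma, dG/dgamma):
A = np.column_stack([x / sigma, ana])
i, j = np.triu_indices(len(x), k=1)
J = np.abs(A[i, 0] * A[j, 1] - A[j, 0] * A[i, 1]).sum()
```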

Exercise
Derive a GFD for:
- the Weibull distribution;
- the Negative Binomial distribution (compare to Binomial);
- the T distribution (might not have a pretty form);
- your favorite model.

Jacobian Formula
Short course special: the $L^1$ norm!
Recall: $\arg\min_\xi \|x - G(U^*, \xi)\| \mid \{\min_\xi \|x - G(U^*, \xi)\| \le \varepsilon\}$.
Optimization problem: $\min_\xi \sum_i |x_i - G_i(U^*, \xi)|$.
$G(U^*, \xi)$ is nearly linear in $\xi$ near $x$.

$L^1$ minimum
Objective function: locally linear, locally convex. At the minimum, $p$ coordinates are fit exactly ($G_i(U^*, \xi) = x_i$).
KKT condition: $0 \in \partial_\xi \|x - G(U^*, \xi)\|_1$, i.e., $0 = \sum_i \lambda_i \nabla_\xi G_i(U^*, \xi)$, where
$\lambda_i \in \begin{cases} \{1\} & x_i - G_i(U^*, \xi) > 0, \\ \{-1\} & x_i - G_i(U^*, \xi) < 0, \\ [-1, 1] & x_i - G_i(U^*, \xi) = 0. \end{cases}$
Example: $G = (.1, .25, .2)\,\xi + (-.2, .1, .3)$, $x = (0, 0, 0)$: minimum at $\xi = -0.4$, where $G = (-0.24, 0, 0.22)$.

$L^1$ Jacobian
Select $p$ equations $i = (i_1, \ldots, i_p)$ and solve $x_i = G_i(U^*, \xi)$.
Implicit function theorem: $\det\big(\nabla_\xi G_i(u, \xi)\big)^{-1}\big|_{u = G_i^{-1}(x_i, \xi)}\, f(x_i \mid \xi)$.
Condition on the remaining equations and the KKT condition.
Final formula: $r(\xi \mid x) \propto J(x, \xi)\, f(x \mid \xi)$, where
$J(x, \xi) = \sum_{i = (i_1, \ldots, i_p)} w_i \left|\det\big(\nabla_\xi G_i(u, \xi)\big)\right|_{u = G^{-1}(x, \xi)}.$
The KKT factor: $w_i = P\big(\exists\, \lambda \in [-1, 1]^p : \lambda^\top \nabla_\xi G_i(u, \xi) + R^\top \nabla_\xi G_{-i}(u, \xi) = 0\big)$, where $u = G^{-1}(x, \xi)$ and $R = (R_1, \ldots, R_{n-p})$ are i.i.d. Rademacher.
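For $p = 1$ the KKT factor reduces to $w_i = P\big(|\sum_{j \ne i} R_j g_j| \le |g_i|\big)$, which is easy to estimate by Monte Carlo. A sketch (the gradient values are hypothetical numbers):

```python
import numpy as np

rng = np.random.default_rng(5)

def kkt_weight(grad, i, n_mc=100_000):
    """Monte Carlo estimate of the KKT factor w_i for p = 1.

    grad: array of d/dxi G_j(u, xi) over all n observations. For p = 1 the
    condition 'there is lambda in [-1, 1] with lambda*g_i + sum_j R_j g_j = 0'
    reduces to |sum_{j != i} R_j g_j| <= |g_i|.
    """
    g_i = grad[i]
    g_rest = np.delete(grad, i)
    R = rng.choice([-1.0, 1.0], size=(n_mc, g_rest.size))
    return np.mean(np.abs(R @ g_rest) <= np.abs(g_i))

grad = rng.uniform(0.5, 2.0, size=8)      # hypothetical gradients
w = np.array([kkt_weight(grad, i) for i in range(8)])
print(w)   # larger |g_i| tends to receive a larger weight
```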

Theoretical Results

Important Observations (Bayesian)
- GFD is always proper.
- GFD is invariant to re-parametrizations (same as Jeffreys).
- GFD is not invariant to smooth transformations of the data if $n > p$. Consequently, GFD does not satisfy the likelihood principle.
- Adding a multiple of a column to another column does not alter $D(A)$. Row operations are not allowed!

Classical Result (n = 1, p = 1; Frequentist)
Data generating equation $S = G_S(U, \xi)$ (one-dimensional statistic). Assume:
1. $G_S(u, \xi)$ is non-decreasing in $\xi$ for all $u$;
2. for all $u$ and $s$ the inverse $Q_s(u) = \{\xi : s = G_S(u, \xi)\} \ne \emptyset$;
3. for all $\xi$ the cdf $F_S(s, \xi)$ is continuous.
Then one has a unique fiducial distribution and exact coverage of one-sided confidence intervals, i.e., $P(Q_s(U^*) \le \xi) = 1 - F_S(s, \xi)$.
If $S \sim \xi_0$ then $1 - F_S(S, \xi_0) \sim U(0,1)$: a fiducial p-value.
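A small simulation of the exactness claim for the normal location model $S = \xi + U$, $U \sim N(0,1)$, where the fiducial p-value is $1 - \Phi(s - \xi_0)$ (a sketch):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
xi0 = 2.0
s = xi0 + rng.normal(size=100_000)     # S = G_S(U, xi) = xi + U

# Fiducial p-value 1 - F_S(S, xi0) = 1 - Phi(s - xi0); under the truth it
# is exactly U(0,1), which is what gives one-sided CIs exact coverage.
pvals = 1 - stats.norm.cdf(s - xi0)
print(stats.kstest(pvals, "uniform").statistic)   # tiny: consistent with U(0,1)
print(np.mean(pvals <= 0.05))                     # approximately 0.05
```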

Exact frequentist coverage
Set $P(Q_s(U^*) \in C_\alpha(s)) = 1 - \alpha$.
Coverage of the upper confidence limit:
$P_\xi(\xi \in C_\alpha(S)) = P_\xi\big(P(Q_S(U^*) \le \xi \mid S) \le 1 - \alpha\big) = P(U(0,1) \le 1 - \alpha) = 1 - \alpha.$
This is general: simulate $m$ fiducial p-values. (Figure: histograms of simulated fiducial p-values illustrating exact, conservative, and liberal coverage.)

Exact frequentist coverage (p > 1)
Which set of fiducial probability $1 - \alpha$ will be a confidence set?
Follow pivots and Inferential Models (Liu & Martin, 2015): select a set $\mathcal{U}$ with $P(U^* \in \mathcal{U}) = 1 - \alpha$; the set $Q_S(\mathcal{U})$ then has both fiducial probability and confidence $1 - \alpha$.
For $p = 1$ above: $\xi \in (-\infty, C(s))$ corresponds to $u \in (\alpha, 1)$.
Comments: This links sets of $1 - \alpha$ fiducial probability across different $X$. Reverse: map $C(S)$ of fiducial probability $1 - \alpha$ back to $\mathcal{U}$; if it is invariant in $X$, then coverage is exact.

Example -- Fieller's Problem
$X \sim N(\mu_x, 1)$, $Y \sim N(\mu_y, 1)$; parameter of interest $\eta = \mu_x / \mu_y$.
DGE 1: $X = \mu_x + U_x$, $Y = \mu_y + U_y$. Marginal fiducial RV: $Q_{x,y}(U_x^*, U_y^*) = \frac{x - U_x^*}{y - U_y^*}$.
Consider equal-tailed regions of fiducial probability 95%: when $\mu_y \gg 0$, good frequentist performance; when $\mu_y \approx 0$, poor performance.
Visualize $\mathcal{U} = \left\{(u_x, u_y) : c_1 \le \frac{x - u_x}{y - u_y} \le c_2\right\}$.
(Figures: the set $\mathcal{U}$ for $\mu_{x0} = 10$, $\mu_{y0} = 10$, coverage 95%; and for $\mu_{x0} = 0.5$, $\mu_{y0} = 0.5$, coverage 98%.)
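A Monte Carlo sketch of this behavior under DGE 1 (the function name and replication counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)

def fieller_coverage(mu_x, mu_y, n_rep=2000, n_fid=4000):
    """Coverage of the equal-tailed 95% fiducial interval for eta = mu_x/mu_y
    based on the marginal fiducial RV (x - Ux*)/(y - Uy*) from DGE 1."""
    eta, hits = mu_x / mu_y, 0
    for _ in range(n_rep):
        x, y = mu_x + rng.normal(), mu_y + rng.normal()
        q = (x - rng.normal(size=n_fid)) / (y - rng.normal(size=n_fid))
        lo, hi = np.quantile(q, [0.025, 0.975])
        hits += lo <= eta <= hi
    return hits / n_rep

print(fieller_coverage(10.0, 10.0))  # close to the nominal 0.95
print(fieller_coverage(0.5, 0.5))    # off nominal (about 0.98 on the slide)
```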

Example -- Fieller's Solution
DGE 2: $X = \eta\mu_y + \frac{U_1 + \eta U_2}{\sqrt{1 + \eta^2}}$, $Y = \mu_y + \frac{U_2 - \eta U_1}{\sqrt{1 + \eta^2}}$ (so that $U_1 = \frac{X - \eta Y}{\sqrt{1 + \eta^2}}$ is a pivot).
Fiducial density: $r(\eta \mid x, y) = \frac{e^{-\frac{(x - \eta y)^2}{2(\eta^2 + 1)}}\, |y + x\eta|}{\sqrt{2\pi}\,(\eta^2 + 1)^{3/2}}$; Jacobian $J(x, y, \eta, \mu_y) = \frac{|y + x\eta|}{1 + \eta^2}$.
Fieller picked the set of fiducial probability 95% corresponding to $\mathcal{U} = \{|u_1| < 1.96\}$.
Pros: the fiducial sets are linked and exact coverage is guaranteed. Cons: the shape of the sets is strange (an interval, the complement of an interval, or the whole real line).
GFD1 $\approx$ GFD2 if $y \gg 0$.

Ancillary Representation (n > 1, p = 1)
(4) Let $(S(X), A(X))$ be a smooth one-to-one transformation of $X = G(U, \xi)$, where $S(X)$ is one-dimensional satisfying conditions 1-3, and $A(X)$ is a vector of functional ancillary statistics ($\nabla_\xi A(G(U, \xi)) = 0$).
Theorem (Majumder, 2015): If (4) is satisfied, GFI derived from $(S, A)$ is exact.
Idea: the GFD is the same as the FD based on $S = G_{S|a}(U_a, \xi)$, where $U_a \sim U^* \mid \{a = A(G(U^*, \xi))\}$ and $G_{S|a}$ is the restriction of $G$ to the domain of $U_a$.
The same argument works for $p > 1$.

Various Asymptotic Results
$r(\xi \mid x) \propto f_X(x \mid \xi)\, J(x, \xi)$ where $J(x, \xi) = D\big(\frac{d}{d\xi} G(u, \xi)\big)\big|_{u = G^{-1}(x, \xi)}$. Most results start by establishing $C_n^{-1} J(x, \xi) \to \bar J(\xi_0, \xi)$.
- A Bernstein-von Mises theorem for fiducial distributions provides asymptotic correctness of fiducial CIs: H (2009, 2013), Sonderegger & H (2013).
- Consistency of model selection: H & Lee (2009), Lai, H & Lee (2015), H, Iyer, Lai & Lee (2016).
- Regular higher order asymptotics: Pal Majumdar & H (2016+).

Another look
Do fiducial probabilities correspond to (asymptotic) coverage? What is needed?
- Convergence of the posteriors to something nice.
- Linkage of credible sets across all potential data.

LARGE
Data $X_n$ generated using $\xi_{n,0}$; the GFD gives measures $R_{n, X_n}$. Assume:
1. $t_n(X_n) \xrightarrow{D} T = (T_1, T_2)$.
2. $T_1 = H_1(V_1, \zeta)$ and $T_2 = H_2(V_2)$, where $H_1$ and $H_2$ are one-to-one; $Q_{t_1}(v_1) = \zeta$ solves $t_1 = H_1(v_1, \zeta)$; the limiting GFD is $R_t \sim Q_{t_1}(V_1) \mid \{H_2(V_2) = t_2\}$, with $R_t(\partial C) = 0$ for open $C$. (Bernstein-von Mises case: centered and scaled MLE, $T = \zeta + V$ with $V \sim N(0, I_{\xi_0}^{-1})$, so $R_t = N(t, I_{\xi_0}^{-1})$.)
3. Homeomorphic injective maps $\Xi_n$ with $\Xi_n(\zeta_{n,0}) = 0$ such that $t_n(x_n) \to t$ implies $R_{n, x_n} \circ \Xi_n^{-1} \xrightarrow{W} R_t$; e.g., $\Xi_n(\xi) = \sqrt{n}(\xi - \xi_0)$.

LARGE theorem
Theorem (H, Lai & Lee, 2016): Assume LARGE, fix $0 < \alpha < 1$, and select open $C_n(x_n)$ such that
1. $R_{n, x_n}(C_n(x_n)) = \alpha$;
2. $t_n(x_n) \to t$ implies $\Xi_n(C_n(x_n)) \to C(t)$;
3. the set $V_{t_2} = \{v : t = H(v, \xi) \text{ for some } \xi \in C(t)\}$ depends on $t = (t_1, t_2)$ only through the ancillary $t_2$.
Then the sets $C_n(x_n)$ are $\alpha$ asymptotic confidence sets.
If $T_1 = H_1(V_1, \xi)$ has group structure, condition 3 means $C(t)$ is invariant in $t_1$.

Example -- Uniform(θ, θ²)
The conditions are satisfied.
Fiducial density: $r(\theta \mid x) \propto \frac{\bar x(2\theta - 1) - \theta^2}{(\theta^2 - \theta)^{n+1}}\, I_{(\sqrt{x_{(n)}},\ x_{(1)})}(\theta)$.
Transformations: $t_n(y) = n \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix} \begin{pmatrix} x_{(1)} - \theta_0 \\ \frac{\theta_0^2 - x_{(n)}}{2\theta_0} \end{pmatrix}$, $\Xi_n(\theta) = n(\theta - \theta_0)$.
Limit: $T_1 = \zeta + V_1$, $T_2 = V_2$, where $V_1, V_2$ are dependent. Limiting fiducial distribution: $r(\zeta \mid t) \propto \exp\left(-\zeta\, \frac{2\theta_0 - 1}{\theta_0^2 - \theta_0}\right) I_{(T_1 - T_2,\ T_1 + T_2)}(\zeta)$.
One-sided credible sets have correct asymptotic coverage!
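The one-dimensional fiducial density can again be handled by a grid sampler. A sketch (simulated data; the constant factor $n$ is dropped):

```python
import numpy as np

rng = np.random.default_rng(8)
theta0, n = 1.8, 50
x = theta0 + (theta0**2 - theta0) * rng.uniform(size=n)

# Fiducial density r(theta | x) on its support (sqrt(x_(n)), x_(1)),
# evaluated on a grid, normalized, and sampled by inverse CDF.
lo, hi = np.sqrt(x.max()), x.min()
grid = np.linspace(lo, hi, 10_001)[1:-1]          # stay inside the support
logdens = (np.log(x.mean() * (2 * grid - 1) - grid**2)
           - (n + 1) * np.log(grid**2 - grid))
dens = np.exp(logdens - logdens.max())
cdf = np.cumsum(dens)
cdf /= cdf[-1]
draws = np.interp(rng.uniform(size=10_000), cdf, grid)
print("95% fiducial interval for theta:", np.quantile(draws, [0.025, 0.975]))
```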

Applications

Computational Issues
- When lucky: direct Monte Carlo inversion of the DGE.
- Typically one needs the normalizing constant of $J(x, \xi)\, f(x \mid \xi)$. Ride the Bayesian wave: MCMC (Gibbs, MH), SMC (the trick is to resample right).
- For small $p$: numerical integration.
- Fiducial quirk: we are interested in frequentist properties, so balance $m$, the number of MC samples, against $k$, the number of simulation replications: $k \approx C m^2$.

Distributed Data

Distributed Data Motivation
- $n$ is so big that the $X$s cannot be loaded onto one computer;
- the data are at different sites;
- the data cannot be released off site for privacy concerns.
Partition $x$ into $K$ subsets, where each subset can be analyzed; e.g., use a computer cluster for parallel processing, where $K$ is the number of nodes (or workers). Then merge the results from the different nodes.

Importance sampling for Massive Data
$x = (x_1, x_2, \ldots, x_K)$.
$r(\theta \mid x)$ is the generalized fiducial density of $x$; $r_k(\theta \mid x_k)$ is the generalized fiducial density of $x_k$.
On each worker, sample from a proposal $q_k(\theta)$ and attach the importance weight $\frac{r(\theta \mid x)}{q_k(\theta)}$.
Depending on the problem, $r(\theta \mid x)$ and $r_k(\theta \mid x_k)$ may be known only up to normalizing constants.

Naive scheme
- Generate a fiducial sample for the data on each node.
- Compute the weight $w_k(\theta) = \frac{r(\theta \mid x)}{r_k(\theta \mid x_k)}$.
Not feasible and very inefficient! The target has spread of order $n^{-1/2}$, while the fiducial sample on each worker has spread of order $n_k^{-1/2}$; most realizations get extremely small weights.

Improved scheme
- Each worker computes the MLE $\hat\theta_k$ and empirical Fisher information $\hat I_k$ and passes them to the other workers.
- Each worker simulates a sample from $q_k(\theta) \propto r_k(\theta \mid x_k) \prod_{j \ne k} g(\theta \mid \hat\theta_j, \hat I_j)$.
- Practical choice: $g \sim \mathrm{Normal}(\hat\theta_j, \gamma \hat I_j^{-1})$.
- Weight $w_k(\theta) \propto \frac{f(x_k, \theta)}{\prod_{j \ne k} g(\theta \mid \hat\theta_j, \hat I_j)}$ (computed in parallel).
- We proved consistency and asymptotic normality of the approximation error.
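A single-process toy sketch of the improved scheme for the normal location model, where every factor is available in closed form. The weight is written here as the mathematically equivalent product of per-worker factors $r_j/g_j$, each computable in parallel from worker $j$'s broadcast summaries; all names and the choice $\gamma = 2$ are assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
K, n_k, theta_true, gamma = 4, 500, 1.5, 2.0
data = [theta_true + rng.normal(size=n_k) for _ in range(K)]  # K workers

# Broadcast step: worker k shares its MLE and Fisher information.
mle = np.array([d.mean() for d in data])
info = np.full(K, float(n_k))          # N(theta, 1) model: I_k = n_k

def propose(k, m):
    """Sample q_k ∝ r_k(theta|x_k) * prod_{j != k} g(theta|mle_j, gamma/info_j);
    all factors are normal here, so q_k is normal with pooled precision."""
    others = np.arange(K) != k
    prec = info[k] + (info[others] / gamma).sum()
    mean = (info[k] * mle[k] + (info[others] / gamma * mle[others]).sum()) / prec
    return mean + rng.normal(size=m) / np.sqrt(prec)

def log_weight(theta, k):
    """log of r(theta|x)/q_k(theta) = sum over j != k of log r_j - log g_j;
    each factor uses only worker j's summaries (parallelizable)."""
    lw = np.zeros_like(theta)
    for j in range(K):
        if j == k:
            continue
        lw += stats.norm.logpdf(theta, mle[j], 1 / np.sqrt(info[j]))      # r_j
        lw -= stats.norm.logpdf(theta, mle[j], np.sqrt(gamma / info[j]))  # g_j
    return lw

th_all, lw_all = [], []
for k in range(K):
    th = propose(k, 2500)
    th_all.append(th)
    lw_all.append(log_weight(th, k))
th, lw = np.concatenate(th_all), np.concatenate(lw_all)
w = np.exp(lw - lw.max())
w /= w.sum()
print(np.sum(w * th), np.concatenate(data).mean())   # both near theta_true
```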

Experiments
All show good performance ($K = 1, 2, 4, 8, 16, 32$):
- Gaussian mixture $0.6\,N(-1, 1) + 0.4\,N(1, 1)$;
- linear regression with Cauchy errors ($n = 10^5$, $p_T = 4$, $p = 6, 8, 11$).

Computational time
For the Cauchy regression, speed improves until $K = 16$ and then deteriorates (cf. Cheng & Shang, 2015).

Sun Spots
(Figure: solar images during low and high activity.) The bright flare on the right has value 253. Is this high?

Data
Solar Dynamics Observatory (SDO), launched in 2010. One instrument is the Atmospheric Imaging Assembly (AIA) (Schuh et al. 2013): it photographs the sun in 8 wavelengths every 12 s, producing compressed data per day equivalent to 3 TB of raw (i.e., uncompressed) data per day.
Ultimate goal: detect and predict solar flares.
Tool: GFD for the Generalized Pareto distribution (Wandler & H, 2012).

GFD for extreme quantiles
(Figure: fiducial distributions for large quantiles and the fiducial exceedance probability.)

Right Censored Data

Right Censored Data
Data generating equation: $T_i = F^{-1}(U_i)$, $C_i = G_i^{-1}(W_i \mid F^{-1}(U_i))$, with $U_i, W_i$ i.i.d. Uniform(0,1); observe only $X_i = \min(T_i, C_i)$ and $\delta_i = I_{\{T_i \le C_i\}}$.
Solve for $F$ and $G$:
- $\delta_i = 1$: $F(x_i - \epsilon) < U_i^* \le F(x_i)$ and $G_i(x_i - \epsilon \mid x_i) \le W_i^*$;
- $\delta_i = 0$: $F(x_i) < U_i^*$ and $G_i(x_i - \epsilon \mid F^{-1}(U_i^*)) < W_i^* \le G_i(x_i \mid F^{-1}(U_i^*))$.
Generate $U^*$ conditional on the existence of an $F$ solving the equations.

Visual Demonstration
(Figure: the $U_i^*$ at failure and censored times $x_i$; the $U_i^*$ are generated so that there is a solution.)
(Figure: lower and upper bounds at the same times; $F$ is any cdf between the bounds.)
Fact: For any failure time, $E[F_L(s)] < \hat F(s) < E[F_U(s)]$.

Theoretical Results
Known: for the Kaplan-Meier estimator, $\sqrt{n}\,[\hat F(t) - F(t)] \xrightarrow{D} W_t$ in $D[0, T]$ ($W$ a mean zero Gaussian process with known covariance depending on the censoring).
Theorem (Cui, Hannig): Under similar assumptions, $\sqrt{n}\,[F^*(t) - \hat F(t)] \mid X \xrightarrow{D} W_t$ almost surely.

Implementation
- Fast MC algorithm for generating samples from the fiducial distribution.
- Quantiles of the fiducial distribution at a given $x$ approximate pointwise CIs.
- Simultaneous CI: center on the mean (or the Kaplan-Meier estimate); find the $L^\infty$ ball containing $1 - \alpha$ of the fiducial sample curves.
- Fiducial p-value for testing $H_0: F = F_0$: find the largest $\alpha$ such that $F_0$ is in the $1 - \alpha$ CI.
- For the difference of two populations, use the difference of the fiducial distributions.
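A sketch of the pointwise and simultaneous constructions, assuming a precomputed array of fiducial sample curves (faked here with arbitrary noise; in practice the curves come from the fast MC algorithm):

```python
import numpy as np

rng = np.random.default_rng(10)
t = np.linspace(0, 3, 200)
F_hat = 1 - np.exp(-t)                        # stand-in for a Kaplan-Meier curve

# Hypothetical fiducial sample curves (one per row), faked as noisy
# versions of F_hat purely for illustration.
m = 2000
curves = np.clip(F_hat
                 + 0.03 * rng.normal(size=(m, 1)) * np.sin(np.pi * t)
                 + 0.01 * rng.normal(size=(m, t.size)), 0, 1)

# Pointwise 95% CI: quantiles of the fiducial curves at each t.
point_lo, point_hi = np.quantile(curves, [0.025, 0.975], axis=0)

# Simultaneous 95% band: center on the fiducial mean and find the radius
# of the L-infinity ball containing 95% of the sample curves.
center = curves.mean(axis=0)
dist = np.abs(curves - center).max(axis=1)    # sup-norm distance of each curve
r = np.quantile(dist, 0.95)
band_lo, band_hi = np.clip(center - r, 0, 1), np.clip(center + r, 0, 1)
```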

Lessons from Simulation
We simulated several one and two sample settings.
- Pointwise simulations show good coverage.
- Simultaneous CIs also show good (if conservative) coverage.
- Two sample testing: good type 1 error; competitive when compared to log-rank tests and the sup-log-rank test.
- Room for improvement: use something other than the $L^\infty$ ball; half-region depth?

Gastric Tumor Study Group (1982)
Clinical trial of chemotherapy vs. chemotherapy combined with radiotherapy.
The tests differ: the log-rank statistic has p-value 0.63 (the hazards cross), while the fiducial p-value is much smaller.
A simulation with the estimated hazards shows the fiducial test is more powerful than its competitors.

High-D Regression

Model Selection
$X = G(M, \xi_M, U)$, $M \in \mathcal{M}$, $\xi_M \in \Xi_M$.
Theorem (H, Iyer, Lai & Lee, 2016): Under assumptions,
$r(M \mid y) \propto q^{|M|} \int_{\Xi_M} f_M(y, \xi_M)\, J_M(y, \xi_M)\, d\xi_M.$
The need for a penalty is expressed in the fiducial framework through additional equations $0 = P_k$, $k = 1, \ldots, \min(|M|, n)$.
Default value $q = n^{-1/2}$ (motivated by MDL).

Alternative to penalty (see poster!)
A penalty is used to discourage models with many parameters. The real issue is not too many parameters per se, but that a smaller model can do almost the same job.
$r(M \mid y) \propto \int_{\Xi_M} f_M(y, \xi_M)\, J_M(y, \xi_M)\, h_M(\xi_M)\, d\xi_M$, where
$h_M(\xi_M) = \begin{cases} 0 & \text{a smaller model predicts nearly as well,} \\ 1 & \text{otherwise.} \end{cases}$
Motivated by the non-local priors of Johnson & Rossell (2009).

Regression $Y = X\beta + \sigma Z$
First idea: $h_M(\beta_M) = I_{\{|\beta_i| > \epsilon,\ i \in M\}}$; issue: collinearity.
Better: $h_M(\beta_M) := I_{\left\{\frac{1}{2}\|X^\top(X_M \beta_M - X b_{\min})\|_2^2 \,\ge\, \varepsilon_M\right\}}$, where $b_{\min}$ solves
$\min_{b \in \mathbb{R}^p} \frac{1}{2}\|X^\top(X_M \beta_M - X b)\|_2^2$ subject to $\|b\|_0 \le |M| - 1$
(algorithm of Bertsimas et al., 2016).
Similar to the Dantzig selector of Candes & Tao (2007), with a different norm and target.
Call this an ε-admissible subset.
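A brute-force sketch of the ε-admissibility indicator $h_M$ for small $p$, exhaustively searching supports of size $|M| - 1$ as a stand-in for the Bertsimas et al. (2016) algorithm (function and variable names are hypothetical, and the threshold $\varepsilon \cdot |M|$ is one assumed form of $\varepsilon_M$):

```python
import numpy as np
from itertools import combinations

def eps_admissible(X, M, beta_M, eps):
    """Sketch of the h_M indicator: is the model M epsilon-admissible?

    Exhaustively solves min_b 0.5*||X^T (X_M beta_M - X b)||^2 over supports
    of size |M| - 1. Returns 1 if no smaller support predicts nearly as
    well, else 0.
    """
    p = X.shape[1]
    target = X.T @ (X[:, M] @ beta_M)          # X^T X_M beta_M
    best = np.inf
    for S in combinations(range(p), len(M) - 1):
        A = X.T @ X[:, list(S)]                # minimize ||target - A b_S||^2
        b_S, *_ = np.linalg.lstsq(A, target, rcond=None)
        best = min(best, 0.5 * np.sum((target - A @ b_S) ** 2))
    return int(best >= eps * len(M))           # threshold eps*|M| (assumed eps_M)

rng = np.random.default_rng(11)
X = rng.normal(size=(100, 6))
print(eps_admissible(X, [0, 2], np.array([1.0, -1.0]), eps=1.0))
```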

GFD
Observations: $r(M \mid y) \propto \pi^{-\frac{n - |M|}{2}}\, \Gamma\!\left(\frac{n - |M| - 1}{2}\right) \mathrm{RSS}_M^{-\frac{n - |M| - 1}{2}}\, E\big[h^{\varepsilon}_M(\beta^*_M)\big]$, where the expectation is taken with respect to the within-model GFD (the usual T).


More information

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist

More information

Monte Carlo conditioning on a sufficient statistic

Monte Carlo conditioning on a sufficient statistic Seminar, UC Davis, 24 April 2008 p. 1/22 Monte Carlo conditioning on a sufficient statistic Bo Henry Lindqvist Norwegian University of Science and Technology, Trondheim Joint work with Gunnar Taraldsen,

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Probability and Statistics qualifying exam, May 2015

Probability and Statistics qualifying exam, May 2015 Probability and Statistics qualifying exam, May 2015 Name: Instructions: 1. The exam is divided into 3 sections: Linear Models, Mathematical Statistics and Probability. You must pass each section to pass

More information

Confidence distributions in statistical inference

Confidence distributions in statistical inference Confidence distributions in statistical inference Sergei I. Bityukov Institute for High Energy Physics, Protvino, Russia Nikolai V. Krasnikov Institute for Nuclear Research RAS, Moscow, Russia Motivation

More information

Bayesian Sparse Linear Regression with Unknown Symmetric Error

Bayesian Sparse Linear Regression with Unknown Symmetric Error Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of

More information

ON GENERALIZED FIDUCIAL INFERENCE

ON GENERALIZED FIDUCIAL INFERENCE Statistica Sinica 9 (2009), 49-544 ON GENERALIZED FIDUCIAL INFERENCE Jan Hannig The University of North Carolina at Chapel Hill Abstract: In this paper we extend Fisher s fiducial argument and obtain a

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Estimation of Quantiles

Estimation of Quantiles 9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Exact Statistical Inference in. Parametric Models

Exact Statistical Inference in. Parametric Models Exact Statistical Inference in Parametric Models Audun Sektnan December 2016 Specialization Project Department of Mathematical Sciences Norwegian University of Science and Technology Supervisor: Professor

More information

A Very Brief Summary of Bayesian Inference, and Examples

A Very Brief Summary of Bayesian Inference, and Examples A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X

More information

The Jeffreys Prior. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) The Jeffreys Prior MATH / 13

The Jeffreys Prior. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) The Jeffreys Prior MATH / 13 The Jeffreys Prior Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) The Jeffreys Prior MATH 9810 1 / 13 Sir Harold Jeffreys English mathematician, statistician, geophysicist, and astronomer His

More information

Introduction to Bayesian Methods

Introduction to Bayesian Methods Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative

More information

Hybrid Censoring; An Introduction 2

Hybrid Censoring; An Introduction 2 Hybrid Censoring; An Introduction 2 Debasis Kundu Department of Mathematics & Statistics Indian Institute of Technology Kanpur 23-rd November, 2010 2 This is a joint work with N. Balakrishnan Debasis Kundu

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

Part 2: One-parameter models

Part 2: One-parameter models Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference

More information

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain A BAYESIAN MATHEMATICAL STATISTICS PRIMER José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

General Bayesian Inference I

General Bayesian Inference I General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for

More information

STAT 830 Bayesian Estimation

STAT 830 Bayesian Estimation STAT 830 Bayesian Estimation Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Bayesian Estimation STAT 830 Fall 2011 1 / 23 Purposes of These

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Bayesian and frequentist inference

Bayesian and frequentist inference Bayesian and frequentist inference Nancy Reid March 26, 2007 Don Fraser, Ana-Maria Staicu Overview Methods of inference Asymptotic theory Approximate posteriors matching priors Examples Logistic regression

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 1) Fall 2017 1 / 10 Lecture 7: Prior Types Subjective

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Approximating models. Nancy Reid, University of Toronto. Oxford, February 6.

Approximating models. Nancy Reid, University of Toronto. Oxford, February 6. Approximating models Nancy Reid, University of Toronto Oxford, February 6 www.utstat.utoronto.reid/research 1 1. Context Likelihood based inference model f(y; θ), log likelihood function l(θ; y) y = (y

More information

Harvard University. Harvard University Biostatistics Working Paper Series

Harvard University. Harvard University Biostatistics Working Paper Series Harvard University Harvard University Biostatistics Working Paper Series Year 2008 Paper 94 The Highest Confidence Density Region and Its Usage for Inferences about the Survival Function with Censored

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

Fiducial and Confidence Distributions for Real Exponential Families

Fiducial and Confidence Distributions for Real Exponential Families Scandinavian Journal of Statistics doi: 10.1111/sjos.12117 Published by Wiley Publishing Ltd. Fiducial and Confidence Distributions for Real Exponential Families PIERO VERONESE and EUGENIO MELILLI Department

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Bootstrap and Parametric Inference: Successes and Challenges

Bootstrap and Parametric Inference: Successes and Challenges Bootstrap and Parametric Inference: Successes and Challenges G. Alastair Young Department of Mathematics Imperial College London Newton Institute, January 2008 Overview Overview Review key aspects of frequentist

More information

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014 Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,

More information

Time Series and Dynamic Models

Time Series and Dynamic Models Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

Construction of an Informative Hierarchical Prior Distribution: Application to Electricity Load Forecasting

Construction of an Informative Hierarchical Prior Distribution: Application to Electricity Load Forecasting Construction of an Informative Hierarchical Prior Distribution: Application to Electricity Load Forecasting Anne Philippe Laboratoire de Mathématiques Jean Leray Université de Nantes Workshop EDF-INRIA,

More information

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems John Bardsley, University of Montana Collaborators: H. Haario, J. Kaipio, M. Laine, Y. Marzouk, A. Seppänen, A. Solonen, Z.

More information

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

More information

Objective Bayesian and fiducial inference: some results and comparisons. Piero Veronese and Eugenio Melilli Bocconi University, Milano, Italy

Objective Bayesian and fiducial inference: some results and comparisons. Piero Veronese and Eugenio Melilli Bocconi University, Milano, Italy Objective Bayesian and fiducial inference: some results and comparisons Piero Veronese and Eugenio Melilli Bocconi University, Milano, Italy Abstract Objective Bayesian analysis and fiducial inference

More information

Nancy Reid SS 6002A Office Hours by appointment

Nancy Reid SS 6002A Office Hours by appointment Nancy Reid SS 6002A reid@utstat.utoronto.ca Office Hours by appointment Problems assigned weekly, due the following week http://www.utstat.toronto.edu/reid/4508s16.html Various types of likelihood 1. likelihood,

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

POSTERIOR PROPRIETY IN SOME HIERARCHICAL EXPONENTIAL FAMILY MODELS

POSTERIOR PROPRIETY IN SOME HIERARCHICAL EXPONENTIAL FAMILY MODELS POSTERIOR PROPRIETY IN SOME HIERARCHICAL EXPONENTIAL FAMILY MODELS EDWARD I. GEORGE and ZUOSHUN ZHANG The University of Texas at Austin and Quintiles Inc. June 2 SUMMARY For Bayesian analysis of hierarchical

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

A simple analysis of the exact probability matching prior in the location-scale model

A simple analysis of the exact probability matching prior in the location-scale model A simple analysis of the exact probability matching prior in the location-scale model Thomas J. DiCiccio Department of Social Statistics, Cornell University Todd A. Kuffner Department of Mathematics, Washington

More information

An Overview of Objective Bayesian Analysis

An Overview of Objective Bayesian Analysis An Overview of Objective Bayesian Analysis James O. Berger Duke University visiting the University of Chicago Department of Statistics Spring Quarter, 2011 1 Lectures Lecture 1. Objective Bayesian Analysis:

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Analysis of Middle Censored Data with Exponential Lifetime Distributions

Analysis of Middle Censored Data with Exponential Lifetime Distributions Analysis of Middle Censored Data with Exponential Lifetime Distributions Srikanth K. Iyer S. Rao Jammalamadaka Debasis Kundu Abstract Recently Jammalamadaka and Mangalam (2003) introduced a general censoring

More information

Frequentist-Bayesian Model Comparisons: A Simple Example

Frequentist-Bayesian Model Comparisons: A Simple Example Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal

More information

ST745: Survival Analysis: Nonparametric methods

ST745: Survival Analysis: Nonparametric methods ST745: Survival Analysis: Nonparametric methods Eric B. Laber Department of Statistics, North Carolina State University February 5, 2015 The KM estimator is used ubiquitously in medical studies to estimate

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Checking for Prior-Data Conflict

Checking for Prior-Data Conflict Bayesian Analysis (2006) 1, Number 4, pp. 893 914 Checking for Prior-Data Conflict Michael Evans and Hadas Moshonov Abstract. Inference proceeds from ingredients chosen by the analyst and data. To validate

More information

Remarks on Improper Ignorance Priors

Remarks on Improper Ignorance Priors As a limit of proper priors Remarks on Improper Ignorance Priors Two caveats relating to computations with improper priors, based on their relationship with finitely-additive, but not countably-additive

More information

STAT 830 Non-parametric Inference Basics

STAT 830 Non-parametric Inference Basics STAT 830 Non-parametric Inference Basics Richard Lockhart Simon Fraser University STAT 801=830 Fall 2012 Richard Lockhart (Simon Fraser University)STAT 830 Non-parametric Inference Basics STAT 801=830

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

Simulating Random Variables

Simulating Random Variables Simulating Random Variables Timothy Hanson Department of Statistics, University of South Carolina Stat 740: Statistical Computing 1 / 23 R has many built-in random number generators... Beta, gamma (also

More information

Change Of Variable Theorem: Multiple Dimensions

Change Of Variable Theorem: Multiple Dimensions Change Of Variable Theorem: Multiple Dimensions Moulinath Banerjee University of Michigan August 30, 01 Let (X, Y ) be a two-dimensional continuous random vector. Thus P (X = x, Y = y) = 0 for all (x,

More information

Chapter 3 : Likelihood function and inference

Chapter 3 : Likelihood function and inference Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM

More information

Bayesian Inference and the Parametric Bootstrap. Bradley Efron Stanford University

Bayesian Inference and the Parametric Bootstrap. Bradley Efron Stanford University Bayesian Inference and the Parametric Bootstrap Bradley Efron Stanford University Importance Sampling for Bayes Posterior Distribution Newton and Raftery (1994 JRSS-B) Nonparametric Bootstrap: good choice

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information