Generalized Fiducial Inference


Jan Hannig, University of North Carolina at Chapel Hill. BFF 2018. Parts of this short course are joint work with T. C. M. Lee (UC Davis), H. Iyer (NIST), Randy Lai (U of Maine), J. Williams (UNC) and Y. Cui (UNC). NSF support acknowledged.

Outline
Introduction; Definition; Theoretical Results; Applications (Distributed Data, Right Censored Data, High-D Regression); Conclusions

Fiducial?
Oxford English Dictionary: adjective, technical; (of a point or line) used as a fixed basis of comparison. Origin: from Latin fiducia, "trust, confidence".
Merriam-Webster dictionary: 1. taken as a standard of reference (a fiducial mark); 2. founded on faith or trust; 3. having the nature of a trust: fiduciary.

Long, long, long time ago
Bayes Theorem:
$P(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int_\Theta f(x \mid \theta')\,\pi(\theta')\,d\theta'}.$
Bayes-Laplace postulate: when nothing is known about the parameter in advance, let the prior be such that all values of the parameter are equally likely.

Long, long time ago
"Not knowing the chance of mutually exclusive events and knowing the chance to be equal are two quite different states of knowledge." R. A. Fisher (1930)

Brief history of fiducial inference
- Fisher (1922, 1930, 1935): no formal definition
- Lindley (1958): fiducial vs Bayes
- Fraser (1966): structural inference
- Dempster (1967): upper and lower probabilities
- Dawid and Stone (1982): theoretical results for simple cases
- Barnard (1995): pivotal based methods
- Weerahandi (1989, 1993): generalized inference

Fiducial Inspired Inference since 2000
- Dempster-Shafer calculus: Dempster (2008), Edlefsen, Liu & Dempster (2009)
- Inferential Models: Liu & Martin (2015)
- Confidence Distributions: Xie, Singh & Strawderman (2011), Schweder & Hjort (2016)
- Higher order likelihood, tangent exponential family, r*: Reid & Fraser (2010)
- Objective Bayesian inference, e.g., reference priors: Berger, Bernardo & Sun (2009, 2012)
- Fiducial Inference: H, Iyer & Patterson (2006), H (2009, 2013), H & Lee (2009), Taraldsen & Lindqvist (2013), Veronese & Melilli (2015), H, Iyer, Lai & Lee (2016)

Aims
- Explain the definition of the generalized fiducial distribution
- Discuss theoretical results
- Show successful applications
My point of view is frequentist, justified using asymptotic theorems and simulations. GFI shows very good repeated sampling performance in applications.

Definition

Comparison to likelihood
- Density is the function $f(x, \xi)$, where $\xi$ is fixed and $x$ is variable.
- Likelihood is the function $f(x, \xi)$, where $\xi$ is variable and $x$ is fixed.
- Likelihood as a distribution?

Data generating equation
Data generating equation (DGE): $X = G(U, \xi)$, where $U$ is random with a known distribution (e.g., i.i.d. U(0,1)) and the parameter $\xi$ is fixed.
Generate $X$s by generating $U$s and applying the DGE. This determines the sampling distribution.

Data generating equation
DGE: $x = G(U^*, \xi)$, where $U^*$ is random with a known distribution (i.i.d. U(0,1)) and the data $x$ are fixed.
Generate $\xi$ by generating $U^*$s and inverting the DGE. This determines the fiducial distribution. Denote the inverse $Q_x(u^*)$.
Issues: multiple solutions and no solutions.

Example -- Bernoulli trials
Data generating equation: $X_i = 1\{U_i \le p\}$, $U_i$ i.i.d. Uniform(0,1); generating the $U_i$ samples Bernoulli(p).
Inverting, estimate $U_i$ by $U_i^*$; this defines the fiducial distribution:
- If $x_i = 1$, then $p \in [U_i^*, 1]$.
- If $x_i = 0$, then $p \in [0, U_i^*]$.

Example -- Binomial
Data generating equation: $X_1 = 1\{U_1 \le p\}$, $X_2 = 1\{U_2 \le p\}$, $U_1, U_2$ i.i.d. Uniform(0,1).
- If $X_1 = 0$, $X_2 = 1$ and $U_1^* < U_2^*$: no solution! Remove $(U_1^*, U_2^*)$ inconsistent with the data.
- If $X_1 = 0$, $X_2 = 1$ and $U_1^* > U_2^*$: $p \in [U_2^*, U_1^*]$, and $(U_1^*, U_2^*) \mid \{U_1^* > U_2^*\}$ estimates $(u_1, u_2)$.

Example -- Binomial
$(X_1, \ldots, X_n)$ i.i.d. Bernoulli(p), $S = \sum_{i=1}^n X_i \sim$ Binomial(n, p).
Condition $U^*$ on having a solution for $p$: $Q_x(U^*)$ is the interval $[U^*_{s:n}, U^*_{(s+1):n}]$ between consecutive order statistics of $U^*$.
Select a point in the interval. A particular choice results in Beta(s + 1/2, n - s + 1/2).
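A minimal numerical sketch of this construction (the endpoint rule below is an illustrative assumption, since the slide does not spell out which "particular choice" is meant): given $S = s$, the interval endpoints are order statistics with marginals $U^*_{s:n} \sim$ Beta(s, n-s+1) and $U^*_{(s+1):n} \sim$ Beta(s+1, n-s), and mixing the two endpoints with probability 1/2 each matches the mean of the quoted Beta(s+1/2, n-s+1/2).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_true, m = 20, 0.3, 100_000
s = rng.binomial(n, p_true)  # observed sufficient statistic

# Fiducial interval for p given S = s is [U*_{s:n}, U*_{(s+1):n}];
# marginally U*_{s:n} ~ Beta(s, n-s+1) and U*_{(s+1):n} ~ Beta(s+1, n-s).
lower = rng.beta(s, n - s + 1, size=m) if s > 0 else np.zeros(m)
upper = rng.beta(s + 1, n - s, size=m) if s < n else np.ones(m)

# Hypothetical endpoint rule (an assumption, not the course's exact choice):
# pick either endpoint of the interval with probability 1/2.
pick = rng.integers(0, 2, size=m).astype(bool)
fid = np.where(pick, lower, upper)

# Compare with the Beta(s+1/2, n-s+1/2) distribution quoted on the slide.
ref = rng.beta(s + 0.5, n - s + 0.5, size=m)
print(np.quantile(fid, [0.025, 0.5, 0.975]))
print(np.quantile(ref, [0.025, 0.5, 0.975]))
```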

Example -- Location Cauchy
Consider $X_i = \mu + U_i$ where the $U_i$ are i.i.d. standard Cauchy.
Solve: $Q_x(u) = x_1 - u_1$ if $x_2 - x_1 = u_2 - u_1, \ldots, x_n - x_1 = u_n - u_1$; otherwise there is no solution.
Estimate $u$ by the conditional distribution $U^* \mid \{x_2 - x_1 = U_2^* - U_1^*, \ldots, x_n - x_1 = U_n^* - U_1^*\}$.
Fiducial density: $r(\mu \mid x) \propto \prod_{i=1}^n \big(1 + (\mu - x_i)^2\big)^{-1}$.
For location problems this is the same as the posterior computed using the Jeffreys prior.
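The fiducial density above is known only up to a normalizing constant, but in one dimension a grid-based inverse-CDF sampler suffices. A minimal sketch (data simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(25) + 3.0  # data with true mu = 3

# Fiducial density r(mu | x) ∝ prod_i (1 + (mu - x_i)^2)^(-1), evaluated on
# a grid and sampled by inverse CDF; an MCMC sampler would replace this for
# harder problems.
grid = np.linspace(np.median(x) - 10, np.median(x) + 10, 20_001)
logdens = -np.log1p((grid[None, :] - x[:, None]) ** 2).sum(axis=0)
dens = np.exp(logdens - logdens.max())
cdf = np.cumsum(dens)
cdf /= cdf[-1]
draws = np.interp(rng.uniform(size=10_000), cdf, grid)
print("95% fiducial interval for mu:", np.quantile(draws, [0.025, 0.975]))
```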

General Definition
Data generating equation $X = G(U, \xi)$, e.g., $X_i = \mu + \sigma U_i$.
A distribution on the parameter space is a Generalized Fiducial Distribution (GFD) if it can be obtained as a limit (as $\varepsilon \to 0$) of
$\arg\min_{\xi} \|x - G(U^*, \xi)\| \ \Big|\ \big\{\min_{\xi} \|x - G(U^*, \xi)\| \le \varepsilon\big\}. \quad (1)$
Similar to ABC, with generating from the prior replaced by the minimization.
Is this practical? Can we compute it?

Explicit limit of (1)
Assume $X \in \mathbb{R}^n$ is continuous and the parameter $\xi \in \mathbb{R}^p$. The limit in (1) has density (H, Iyer, Lai & Lee, 2016)
$r(\xi \mid x) = \frac{f_X(x \mid \xi)\, J(x, \xi)}{\int_\Xi f_X(x \mid \xi')\, J(x, \xi')\, d\xi'}, \quad (*)$
where $J(x, \xi) = D\big(\nabla_\xi G(u, \xi)\big)\big|_{u = G^{-1}(x, \xi)}$.
- $n = p$ gives $D(A) = |\det A|$;
- the $L^2$ norm gives $D(A) = (\det A^\top A)^{1/2}$;
- the $L^\infty$ norm gives $D(A) = \sum_{i = (i_1, \ldots, i_p)} |\det(A)_i|$;
- the $L^1$ norm gives $D(A) = \sum_{i = (i_1, \ldots, i_p)} w_i\, |\det(A)_i|$.

Example -- Uniform(θ, θ²)
$X_i$ i.i.d. U($\theta$, $\theta^2$), $\theta > 1$.
Data generating equation: $X_i = \theta + (\theta^2 - \theta) U_i$, $U_i \sim$ U(0,1).
$\frac{d}{d\theta}\big[\theta + (\theta^2 - \theta) U_i\big] = 1 + (2\theta - 1) U_i$, with $U_i = \frac{X_i - \theta}{\theta^2 - \theta}$.
Jacobian:
$J(x, \theta) = D\begin{pmatrix} 1 + \frac{(2\theta - 1)(x_1 - \theta)}{\theta^2 - \theta} \\ \vdots \\ 1 + \frac{(2\theta - 1)(x_n - \theta)}{\theta^2 - \theta} \end{pmatrix} = \frac{1}{\theta^2 - \theta}\, D\begin{pmatrix} x_1(2\theta - 1) - \theta^2 \\ \vdots \\ x_n(2\theta - 1) - \theta^2 \end{pmatrix} = \frac{n\big(\bar x(2\theta - 1) - \theta^2\big)}{\theta^2 - \theta}$ for $L^1$.
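A quick finite-difference check of this derivative and Jacobian (a sketch; the data and parameter values are arbitrary, and $D(\cdot)$ is taken as the plain sum of absolute values that the displayed formula reduces to here):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 1.7
x = theta + (theta**2 - theta) * rng.uniform(size=5)  # data from U(theta, theta^2)

def G(u, th):
    return th + (th**2 - th) * u      # data generating equation

u = (x - theta) / (theta**2 - theta)  # u = G^{-1}(x, theta)
h = 1e-6
num = (G(u, theta + h) - G(u, theta - h)) / (2 * h)   # d/dtheta G(u, theta)
ana = (x * (2 * theta - 1) - theta**2) / (theta**2 - theta)
print(np.allclose(num, ana, atol=1e-5))               # True

# D(.) reduces to the sum of absolute values here (all entries positive
# on the support), recovering n*(xbar*(2*theta-1) - theta^2)/(theta^2-theta).
J = np.abs(ana).sum()
```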

Example -- Uniform(θ, θ²)
Reference prior (Berger, Bernardo & Sun, 2009):
$\pi(\theta) = \frac{e^{\psi\left(\frac{2\theta}{2\theta - 1}\right)}(2\theta - 1)}{\theta^2 - \theta}.$
Reference prior vs fiducial Jacobian: in simulations the fiducial solution was marginally better than the reference prior, which in turn was much better than the flat prior.

Example -- Linear Regression
Data generating equation: $Y = X\beta + \sigma Z$.
Gradient: $\nabla_{(\beta, \sigma)} Y = (X, Z)$ with $Z = (Y - X\beta)/\sigma$.
Jacobian: $J(y, \beta, \sigma) = D\left(X, \frac{y - X\beta}{\sigma}\right) = \sigma^{-1} D(X, y) = \sigma^{-1} \det(X^\top X)^{1/2} (\mathrm{RSS})^{1/2}$ for $L^2$.
Same as the independence Jeffreys posterior, with an explicit normalizing constant.
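Since the GFD coincides with the independence Jeffreys posterior, it can be sampled in closed form: $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$ and $\beta \mid \sigma \sim N(\hat\beta, \sigma^2 (X^\top X)^{-1})$. A minimal sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.7 * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
rss = np.sum((y - X @ beta_hat) ** 2)

m = 10_000
sigma2 = rss / rng.chisquare(n - p, size=m)   # fiducial draws of sigma^2
L = np.linalg.cholesky(XtX_inv)               # beta | sigma ~ N(beta_hat, sigma^2 (X'X)^-1)
beta = beta_hat + np.sqrt(sigma2)[:, None] * (rng.normal(size=(m, p)) @ L.T)
print("95% fiducial CI for beta_1:", np.quantile(beta[:, 1], [0.025, 0.975]))
```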

Example -- Generalized Pareto
$X_i = G(U_i, \gamma, \sigma) = \sigma\, \frac{U_i^{-\gamma} - 1}{\gamma}$. Models exceedances over a large threshold.
Likelihood: $f(x \mid \gamma, \sigma) = \prod_{i=1}^n \frac{1}{\sigma\left(1 + \frac{\gamma x_i}{\sigma}\right)^{1 + 1/\gamma}}$.
Jacobian evaluated at $u_i = \left(1 + \frac{\gamma x_i}{\sigma}\right)^{-1/\gamma}$:
$\frac{d}{d\sigma} G(u_i, \gamma, \sigma) = \frac{x_i}{\sigma}$, and $\frac{d}{d\gamma} G(u_i, \gamma, \sigma) = -\frac{x_i}{\gamma} + \frac{\sigma\left(1 + \frac{\gamma x_i}{\sigma}\right)\log\left(1 + \frac{\gamma x_i}{\sigma}\right)}{\gamma^2}$.
$J(x, \gamma, \sigma) = \sum_{i < j} \frac{\left| x_j\, \sigma\left(1 + \frac{\gamma x_i}{\sigma}\right)\log\left(1 + \frac{\gamma x_i}{\sigma}\right) - x_i\, \sigma\left(1 + \frac{\gamma x_j}{\sigma}\right)\log\left(1 + \frac{\gamma x_j}{\sigma}\right)\right|}{\gamma^2 \sigma}$ for $L^\infty$.
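A finite-difference sanity check of the $\partial G/\partial\gamma$ formula and of the pairwise-determinant Jacobian (a sketch with arbitrary values; any KKT weights are omitted, i.e., this computes the unweighted sum over pairs):

```python
import numpy as np

rng = np.random.default_rng(4)
gamma, sigma = 0.4, 2.0

def G(u, g, s):
    return s * (u ** (-g) - 1.0) / g   # GPD data generating equation

x = G(rng.uniform(size=6), gamma, sigma)
u = (1.0 + gamma * x / sigma) ** (-1.0 / gamma)   # u = G^{-1}(x, gamma, sigma)

h = 1e-6
dG_dgamma = (G(u, gamma + h, sigma) - G(u, gamma - h, sigma)) / (2 * h)
ana = -x / gamma + sigma * (1 + gamma * x / sigma) * np.log1p(gamma * x / sigma) / gamma**2
print(np.allclose(dG_dgamma, ana, atol=1e-4))      # True

# Jacobian as the sum over 2x2 determinants of the columns (dG/dsigma, dG/dgamma):
A = np.column_stack([x / sigma, ana])
i, j = np.triu_indices(len(x), k=1)
J = np.abs(A[i, 0] * A[j, 1] - A[j, 0] * A[i, 1]).sum()
```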

Exercise
Derive a GFD for:
- the Weibull distribution;
- the Negative Binomial distribution (compare to Binomial);
- the T distribution (might not have a pretty form);
- your favorite model.

Jacobian Formula
Short course special: the $L^1$ norm!
Recall: $\arg\min_\xi \|x - G(U^*, \xi)\| \mid \{\min_\xi \|x - G(U^*, \xi)\| \le \varepsilon\}$.
Optimization problem: $\min_\xi \sum_i |x_i - G_i(U^*, \xi)|$.
$G(U^*, \xi)$ is nearly linear in $\xi$ near $x$.

$L^1$ minimum
Objective function: locally linear, locally convex. At the minimum, $p$ coordinates are fit exactly ($G_i(U^*, \xi) = x_i$).
KKT condition: $0 \in \partial_\xi \|x - G(U^*, \xi)\|_1$, i.e., $0 = \sum_i \lambda_i \nabla_\xi G_i(U^*, \xi)$, where
$\lambda_i \in \begin{cases} \{1\} & x_i - G_i(U^*, \xi) > 0, \\ \{-1\} & x_i - G_i(U^*, \xi) < 0, \\ [-1, 1] & x_i - G_i(U^*, \xi) = 0. \end{cases}$
Example: $G = (.1, .25, .2)\,\xi + (-.2, .1, .3)$, $x = (0, 0, 0)$: minimum at $\xi = -0.4$, where $G = (-0.24, 0, 0.22)$.

$L^1$ Jacobian
Select $p$ equations $i = (i_1, \ldots, i_p)$ and solve $x_i = G_i(U^*, \xi)$.
Implicit function theorem: $\det\big(\nabla_\xi G_i(u, \xi)\big)^{-1}\big|_{u = G_i^{-1}(x_i, \xi)}\, f(x_i \mid \xi)$.
Condition on the remaining equations and the KKT condition.
Final formula: $r(\xi \mid x) \propto J(x, \xi)\, f(x \mid \xi)$, where
$J(x, \xi) = \sum_{i = (i_1, \ldots, i_p)} w_i \left|\det\big(\nabla_\xi G_i(u, \xi)\big)\right|_{u = G^{-1}(x, \xi)}.$
The KKT factor: $w_i = P\big(\exists\, \lambda \in [-1, 1]^p : \lambda^\top \nabla_\xi G_i(u, \xi) + R^\top \nabla_\xi G_{-i}(u, \xi) = 0\big)$, where $u = G^{-1}(x, \xi)$ and $R = (R_1, \ldots, R_{n-p})$ are i.i.d. Rademacher.
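For $p = 1$ the KKT factor reduces to $w_i = P\big(|\sum_{j \ne i} R_j g_j| \le |g_i|\big)$, which is easy to estimate by Monte Carlo. A sketch (the gradient values are hypothetical numbers):

```python
import numpy as np

rng = np.random.default_rng(5)

def kkt_weight(grad, i, n_mc=100_000):
    """Monte Carlo estimate of the KKT factor w_i for p = 1.

    grad: array of d/dxi G_j(u, xi) over all n observations. For p = 1 the
    condition 'there is lambda in [-1, 1] with lambda*g_i + sum_j R_j g_j = 0'
    reduces to |sum_{j != i} R_j g_j| <= |g_i|.
    """
    g_i = grad[i]
    g_rest = np.delete(grad, i)
    R = rng.choice([-1.0, 1.0], size=(n_mc, g_rest.size))
    return np.mean(np.abs(R @ g_rest) <= np.abs(g_i))

grad = rng.uniform(0.5, 2.0, size=8)      # hypothetical gradients
w = np.array([kkt_weight(grad, i) for i in range(8)])
print(w)   # larger |g_i| tends to receive a larger weight
```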

Theoretical Results

Important Observations (Bayesian)
- GFD is always proper.
- GFD is invariant to re-parametrizations (same as Jeffreys).
- GFD is not invariant to smooth transformations of the data if $n > p$. Consequently, GFD does not satisfy the likelihood principle.
- Adding a multiple of a column to another column does not alter $D(A)$. Row operations are not allowed!

Classical Result (n = 1, p = 1; Frequentist)
Data generating equation $S = G_S(U, \xi)$ (one-dimensional statistic). Assume:
1. $G_S(u, \xi)$ is non-decreasing in $\xi$ for all $u$;
2. for all $u$ and $s$ the inverse $Q_s(u) = \{\xi : s = G_S(u, \xi)\} \ne \emptyset$;
3. for all $\xi$ the cdf $F_S(s, \xi)$ is continuous.
Then one has a unique fiducial distribution and exact coverage of one-sided confidence intervals, i.e., $P(Q_s(U^*) \le \xi) = 1 - F_S(s, \xi)$.
If $S \sim \xi_0$ then $1 - F_S(S, \xi_0) \sim U(0,1)$: a fiducial p-value.
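A small simulation of the exactness claim for the normal location model $S = \xi + U$, $U \sim N(0,1)$, where the fiducial p-value is $1 - \Phi(s - \xi_0)$ (a sketch):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
xi0 = 2.0
s = xi0 + rng.normal(size=100_000)     # S = G_S(U, xi) = xi + U

# Fiducial p-value 1 - F_S(S, xi0) = 1 - Phi(s - xi0); under the truth it
# is exactly U(0,1), which is what gives one-sided CIs exact coverage.
pvals = 1 - stats.norm.cdf(s - xi0)
print(stats.kstest(pvals, "uniform").statistic)   # tiny: consistent with U(0,1)
print(np.mean(pvals <= 0.05))                     # approximately 0.05
```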

Exact frequentist coverage
Set $P(Q_s(U^*) \in C_\alpha(s)) = 1 - \alpha$.
Coverage of the upper confidence limit:
$P_\xi(\xi \in C_\alpha(S)) = P_\xi\big(P(Q_S(U^*) \le \xi \mid S) \le 1 - \alpha\big) = P(U(0,1) \le 1 - \alpha) = 1 - \alpha.$
This is general: simulate $m$ fiducial p-values. (Figure: histograms of simulated fiducial p-values illustrating exact, conservative, and liberal coverage.)

Exact frequentist coverage (p > 1)
Which set of fiducial probability $1 - \alpha$ will be a confidence set?
Follow pivots and Inferential Models (Liu & Martin, 2015): select a set $\mathcal{U}$ with $P(U^* \in \mathcal{U}) = 1 - \alpha$; the set $Q_S(\mathcal{U})$ then has both fiducial probability and confidence $1 - \alpha$.
For $p = 1$ above: $\xi \in (-\infty, C(s))$ corresponds to $u \in (\alpha, 1)$.
Comments: This links sets of $1 - \alpha$ fiducial probability across different $X$. Reverse: map $C(S)$ of fiducial probability $1 - \alpha$ back to $\mathcal{U}$; if it is invariant in $X$, then coverage is exact.

Example -- Fieller's Problem
$X \sim N(\mu_x, 1)$, $Y \sim N(\mu_y, 1)$; parameter of interest $\eta = \mu_x / \mu_y$.
DGE 1: $X = \mu_x + U_x$, $Y = \mu_y + U_y$. Marginal fiducial RV: $Q_{x,y}(U_x^*, U_y^*) = \frac{x - U_x^*}{y - U_y^*}$.
Consider equal-tailed regions of fiducial probability 95%: when $\mu_y \gg 0$, good frequentist performance; when $\mu_y \approx 0$, poor performance.
Visualize $\mathcal{U} = \left\{(u_x, u_y) : c_1 \le \frac{x - u_x}{y - u_y} \le c_2\right\}$.
(Figures: the set $\mathcal{U}$ for $\mu_{x0} = 10$, $\mu_{y0} = 10$, coverage 95%; and for $\mu_{x0} = 0.5$, $\mu_{y0} = 0.5$, coverage 98%.)
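A Monte Carlo sketch of this behavior under DGE 1 (the function name and replication counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)

def fieller_coverage(mu_x, mu_y, n_rep=2000, n_fid=4000):
    """Coverage of the equal-tailed 95% fiducial interval for eta = mu_x/mu_y
    based on the marginal fiducial RV (x - Ux*)/(y - Uy*) from DGE 1."""
    eta, hits = mu_x / mu_y, 0
    for _ in range(n_rep):
        x, y = mu_x + rng.normal(), mu_y + rng.normal()
        q = (x - rng.normal(size=n_fid)) / (y - rng.normal(size=n_fid))
        lo, hi = np.quantile(q, [0.025, 0.975])
        hits += lo <= eta <= hi
    return hits / n_rep

print(fieller_coverage(10.0, 10.0))  # close to the nominal 0.95
print(fieller_coverage(0.5, 0.5))    # off nominal (about 0.98 on the slide)
```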

Example -- Fieller's Solution
DGE 2: $X = \eta\mu_y + \frac{U_1 + \eta U_2}{\sqrt{1 + \eta^2}}$, $Y = \mu_y + \frac{U_2 - \eta U_1}{\sqrt{1 + \eta^2}}$ (so that $U_1 = \frac{X - \eta Y}{\sqrt{1 + \eta^2}}$ is a pivot).
Fiducial density: $r(\eta \mid x, y) = \frac{e^{-\frac{(x - \eta y)^2}{2(\eta^2 + 1)}}\, |y + x\eta|}{\sqrt{2\pi}\,(\eta^2 + 1)^{3/2}}$; Jacobian $J(x, y, \eta, \mu_y) = \frac{|y + x\eta|}{1 + \eta^2}$.
Fieller picked the set of fiducial probability 95% corresponding to $\mathcal{U} = \{|u_1| < 1.96\}$.
Pros: the fiducial sets are linked and exact coverage is guaranteed. Cons: the shape of the sets is strange (an interval, the complement of an interval, or the whole real line).
GFD1 $\approx$ GFD2 if $y \gg 0$.

Ancillary Representation (n > 1, p = 1)
(4) Let $(S(X), A(X))$ be a smooth one-to-one transformation of $X = G(U, \xi)$, where $S(X)$ is one-dimensional satisfying conditions 1-3, and $A(X)$ is a vector of functional ancillary statistics ($\nabla_\xi A(G(U, \xi)) = 0$).
Theorem (Majumder, 2015): If (4) is satisfied, GFI derived from $(S, A)$ is exact.
Idea: the GFD is the same as the FD based on $S = G_{S|a}(U_a, \xi)$, where $U_a \sim U^* \mid \{a = A(G(U^*, \xi))\}$ and $G_{S|a}$ is the restriction of $G$ to the domain of $U_a$.
The same argument works for $p > 1$.

Various Asymptotic Results
$r(\xi \mid x) \propto f_X(x \mid \xi)\, J(x, \xi)$ where $J(x, \xi) = D\big(\frac{d}{d\xi} G(u, \xi)\big)\big|_{u = G^{-1}(x, \xi)}$. Most results start by establishing $C_n^{-1} J(x, \xi) \to \bar J(\xi_0, \xi)$.
- A Bernstein-von Mises theorem for fiducial distributions provides asymptotic correctness of fiducial CIs: H (2009, 2013), Sonderegger & H (2013).
- Consistency of model selection: H & Lee (2009), Lai, H & Lee (2015), H, Iyer, Lai & Lee (2016).
- Regular higher order asymptotics: Pal Majumdar & H (2016+).

Another look
Do fiducial probabilities correspond to (asymptotic) coverage? What is needed?
- Convergence of the posteriors to something nice.
- Linkage of credible sets across all potential data.

LARGE
Data $X_n$ generated using $\xi_{n,0}$; the GFD gives measures $R_{n, X_n}$. Assume:
1. $t_n(X_n) \xrightarrow{D} T = (T_1, T_2)$.
2. $T_1 = H_1(V_1, \zeta)$ and $T_2 = H_2(V_2)$, where $H_1$ and $H_2$ are one-to-one; $Q_{t_1}(v_1) = \zeta$ solves $t_1 = H_1(v_1, \zeta)$; the limiting GFD is $R_t \sim Q_{t_1}(V_1) \mid \{H_2(V_2) = t_2\}$, with $R_t(\partial C) = 0$ for open $C$. (Bernstein-von Mises case: centered and scaled MLE, $T = \zeta + V$ with $V \sim N(0, I_{\xi_0}^{-1})$, so $R_t = N(t, I_{\xi_0}^{-1})$.)
3. Homeomorphic injective maps $\Xi_n$ with $\Xi_n(\zeta_{n,0}) = 0$ such that $t_n(x_n) \to t$ implies $R_{n, x_n} \circ \Xi_n^{-1} \xrightarrow{W} R_t$; e.g., $\Xi_n(\xi) = \sqrt{n}(\xi - \xi_0)$.

LARGE theorem
Theorem (H, Lai & Lee, 2016): Assume LARGE, fix $0 < \alpha < 1$, and select open $C_n(x_n)$ such that
1. $R_{n, x_n}(C_n(x_n)) = \alpha$;
2. $t_n(x_n) \to t$ implies $\Xi_n(C_n(x_n)) \to C(t)$;
3. the set $V_{t_2} = \{v : t = H(v, \xi) \text{ for some } \xi \in C(t)\}$ depends on $t = (t_1, t_2)$ only through the ancillary $t_2$.
Then the sets $C_n(x_n)$ are $\alpha$ asymptotic confidence sets.
If $T_1 = H_1(V_1, \xi)$ has group structure, condition 3 means $C(t)$ is invariant in $t_1$.

Example -- Uniform(θ, θ²)
The conditions are satisfied.
Fiducial density: $r(\theta \mid x) \propto \frac{\bar x(2\theta - 1) - \theta^2}{(\theta^2 - \theta)^{n+1}}\, I_{(\sqrt{x_{(n)}},\ x_{(1)})}(\theta)$.
Transformations: $t_n(y) = n \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix} \begin{pmatrix} x_{(1)} - \theta_0 \\ \frac{\theta_0^2 - x_{(n)}}{2\theta_0} \end{pmatrix}$, $\Xi_n(\theta) = n(\theta - \theta_0)$.
Limit: $T_1 = \zeta + V_1$, $T_2 = V_2$, where $V_1, V_2$ are dependent. Limiting fiducial distribution: $r(\zeta \mid t) \propto \exp\left(-\zeta\, \frac{2\theta_0 - 1}{\theta_0^2 - \theta_0}\right) I_{(T_1 - T_2,\ T_1 + T_2)}(\zeta)$.
One-sided credible sets have correct asymptotic coverage!
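The one-dimensional fiducial density can again be handled by a grid sampler. A sketch (simulated data; the constant factor $n$ is dropped):

```python
import numpy as np

rng = np.random.default_rng(8)
theta0, n = 1.8, 50
x = theta0 + (theta0**2 - theta0) * rng.uniform(size=n)

# Fiducial density r(theta | x) on its support (sqrt(x_(n)), x_(1)),
# evaluated on a grid, normalized, and sampled by inverse CDF.
lo, hi = np.sqrt(x.max()), x.min()
grid = np.linspace(lo, hi, 10_001)[1:-1]          # stay inside the support
logdens = (np.log(x.mean() * (2 * grid - 1) - grid**2)
           - (n + 1) * np.log(grid**2 - grid))
dens = np.exp(logdens - logdens.max())
cdf = np.cumsum(dens)
cdf /= cdf[-1]
draws = np.interp(rng.uniform(size=10_000), cdf, grid)
print("95% fiducial interval for theta:", np.quantile(draws, [0.025, 0.975]))
```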

Applications

Computational Issues
- When lucky: direct Monte Carlo inversion of the DGE.
- Typically one needs the normalizing constant of $J(x, \xi)\, f(x \mid \xi)$. Ride the Bayesian wave: MCMC (Gibbs, MH), SMC (the trick is to resample right).
- For small $p$: numerical integration.
- Fiducial quirk: we are interested in frequentist properties, so balance $m$, the number of MC samples, against $k$, the number of simulation replications: $k \approx C m^2$.

Distributed Data

Distributed Data Motivation
- $n$ is so big that the $X$s cannot be loaded onto one computer;
- the data are at different sites;
- the data cannot be released off site for privacy concerns.
Partition $x$ into $K$ subsets, where each subset can be analyzed; e.g., use a computer cluster for parallel processing, where $K$ is the number of nodes (or workers). Then merge the results from the different nodes.

Importance sampling for Massive Data
$x = (x_1, x_2, \ldots, x_K)$.
$r(\theta \mid x)$ is the generalized fiducial density of $x$; $r_k(\theta \mid x_k)$ is the generalized fiducial density of $x_k$.
On each worker, sample from a proposal $q_k(\theta)$ and attach the importance weight $\frac{r(\theta \mid x)}{q_k(\theta)}$.
Depending on the problem, $r(\theta \mid x)$ and $r_k(\theta \mid x_k)$ may be known only up to normalizing constants.

Naive scheme
- Generate a fiducial sample for the data on each node.
- Compute the weight $w_k(\theta) = \frac{r(\theta \mid x)}{r_k(\theta \mid x_k)}$.
Not feasible and very inefficient! The target has spread of order $n^{-1/2}$, while the fiducial sample on each worker has spread of order $n_k^{-1/2}$; most realizations get extremely small weights.

Improved scheme
- Each worker computes the MLE $\hat\theta_k$ and empirical Fisher information $\hat I_k$ and passes them to the other workers.
- Each worker simulates a sample from $q_k(\theta) \propto r_k(\theta \mid x_k) \prod_{j \ne k} g(\theta \mid \hat\theta_j, \hat I_j)$.
- Practical choice: $g \sim \mathrm{Normal}(\hat\theta_j, \gamma \hat I_j^{-1})$.
- Weight $w_k(\theta) \propto \frac{f(x_k, \theta)}{\prod_{j \ne k} g(\theta \mid \hat\theta_j, \hat I_j)}$ (computed in parallel).
- We proved consistency and asymptotic normality of the approximation error.
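A single-process toy sketch of the improved scheme for the normal location model, where every factor is available in closed form. The weight is written here as the mathematically equivalent product of per-worker factors $r_j/g_j$, each computable in parallel from worker $j$'s broadcast summaries; all names and the choice $\gamma = 2$ are assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
K, n_k, theta_true, gamma = 4, 500, 1.5, 2.0
data = [theta_true + rng.normal(size=n_k) for _ in range(K)]  # K workers

# Broadcast step: worker k shares its MLE and Fisher information.
mle = np.array([d.mean() for d in data])
info = np.full(K, float(n_k))          # N(theta, 1) model: I_k = n_k

def propose(k, m):
    """Sample q_k ∝ r_k(theta|x_k) * prod_{j != k} g(theta|mle_j, gamma/info_j);
    all factors are normal here, so q_k is normal with pooled precision."""
    others = np.arange(K) != k
    prec = info[k] + (info[others] / gamma).sum()
    mean = (info[k] * mle[k] + (info[others] / gamma * mle[others]).sum()) / prec
    return mean + rng.normal(size=m) / np.sqrt(prec)

def log_weight(theta, k):
    """log of r(theta|x)/q_k(theta) = sum over j != k of log r_j - log g_j;
    each factor uses only worker j's summaries (parallelizable)."""
    lw = np.zeros_like(theta)
    for j in range(K):
        if j == k:
            continue
        lw += stats.norm.logpdf(theta, mle[j], 1 / np.sqrt(info[j]))      # r_j
        lw -= stats.norm.logpdf(theta, mle[j], np.sqrt(gamma / info[j]))  # g_j
    return lw

th_all, lw_all = [], []
for k in range(K):
    th = propose(k, 2500)
    th_all.append(th)
    lw_all.append(log_weight(th, k))
th, lw = np.concatenate(th_all), np.concatenate(lw_all)
w = np.exp(lw - lw.max())
w /= w.sum()
print(np.sum(w * th), np.concatenate(data).mean())   # both near theta_true
```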

Experiments
All show good performance ($K = 1, 2, 4, 8, 16, 32$):
- Gaussian mixture $0.6\,N(-1, 1) + 0.4\,N(1, 1)$;
- linear regression with Cauchy errors ($n = 10^5$, $p_T = 4$, $p = 6, 8, 11$).

Computational time
For the Cauchy regression, speed improves until $K = 16$ and then deteriorates (cf. Cheng & Shang, 2015).

Sun Spots
(Figure: solar images during low and high activity.) The bright flare on the right has value 253. Is this high?

Data
Solar Dynamics Observatory (SDO), launched in 2010. One instrument is the Atmospheric Imaging Assembly (AIA) (Schuh et al. 2013): it photographs the sun in 8 wavelengths every 12 s, producing compressed data per day equivalent to 3 TB of raw (i.e., uncompressed) data per day.
Ultimate goal: detect and predict solar flares.
Tool: GFD for the Generalized Pareto distribution (Wandler & H, 2012).

GFD for extreme quantiles
(Figure: fiducial distributions for large quantiles and the fiducial exceedance probability.)

Right Censored Data

Right Censored Data
Data generating equation: $T_i = F^{-1}(U_i)$, $C_i = G_i^{-1}(W_i \mid F^{-1}(U_i))$, with $U_i, W_i$ i.i.d. Uniform(0,1); observe only $X_i = \min(T_i, C_i)$ and $\delta_i = I_{\{T_i \le C_i\}}$.
Solve for $F$ and $G$:
- $\delta_i = 1$: $F(x_i - \epsilon) < U_i^* \le F(x_i)$ and $G_i(x_i - \epsilon \mid x_i) \le W_i^*$;
- $\delta_i = 0$: $F(x_i) < U_i^*$ and $G_i(x_i - \epsilon \mid F^{-1}(U_i^*)) < W_i^* \le G_i(x_i \mid F^{-1}(U_i^*))$.
Generate $U^*$ conditional on the existence of an $F$ solving the equations.

Visual Demonstration
(Figure: the $U_i^*$ at failure and censored times $x_i$; the $U_i^*$ are generated so that there is a solution.)
(Figure: lower and upper bounds at the same times; $F$ is any cdf between the bounds.)
Fact: For any failure time, $E[F_L(s)] < \hat F(s) < E[F_U(s)]$.

Theoretical Results
Known: for the Kaplan-Meier estimator, $\sqrt{n}\,[\hat F(t) - F(t)] \xrightarrow{D} W_t$ in $D[0, T]$ ($W$ a mean zero Gaussian process with known covariance depending on the censoring).
Theorem (Cui, Hannig): Under similar assumptions, $\sqrt{n}\,[F^*(t) - \hat F(t)] \mid X \xrightarrow{D} W_t$ almost surely.

Implementation
- Fast MC algorithm for generating samples from the fiducial distribution.
- Quantiles of the fiducial distribution at a given $x$ approximate pointwise CIs.
- Simultaneous CI: center on the mean (or the Kaplan-Meier estimate); find the $L^\infty$ ball containing $1 - \alpha$ of the fiducial sample curves.
- Fiducial p-value for testing $H_0: F = F_0$: find the largest $\alpha$ such that $F_0$ is in the $1 - \alpha$ CI.
- For the difference of two populations, use the difference of the fiducial distributions.
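A sketch of the pointwise and simultaneous constructions, assuming a precomputed array of fiducial sample curves (faked here with arbitrary noise; in practice the curves come from the fast MC algorithm):

```python
import numpy as np

rng = np.random.default_rng(10)
t = np.linspace(0, 3, 200)
F_hat = 1 - np.exp(-t)                        # stand-in for a Kaplan-Meier curve

# Hypothetical fiducial sample curves (one per row), faked as noisy
# versions of F_hat purely for illustration.
m = 2000
curves = np.clip(F_hat
                 + 0.03 * rng.normal(size=(m, 1)) * np.sin(np.pi * t)
                 + 0.01 * rng.normal(size=(m, t.size)), 0, 1)

# Pointwise 95% CI: quantiles of the fiducial curves at each t.
point_lo, point_hi = np.quantile(curves, [0.025, 0.975], axis=0)

# Simultaneous 95% band: center on the fiducial mean and find the radius
# of the L-infinity ball containing 95% of the sample curves.
center = curves.mean(axis=0)
dist = np.abs(curves - center).max(axis=1)    # sup-norm distance of each curve
r = np.quantile(dist, 0.95)
band_lo, band_hi = np.clip(center - r, 0, 1), np.clip(center + r, 0, 1)
```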

Lessons from Simulation
We simulated several one and two sample settings.
- Pointwise simulations show good coverage.
- Simultaneous CIs also show good (if conservative) coverage.
- Two sample testing: good type 1 error; competitive when compared to log-rank tests and the sup-log-rank test.
- Room for improvement: use something other than the $L^\infty$ ball; half-region depth?

Gastric Tumor Study Group (1982)
Clinical trial of chemotherapy vs. chemotherapy combined with radiotherapy.
The tests differ: the log-rank statistic has p-value 0.63 (the hazards cross), while the fiducial p-value is much smaller.
A simulation with the estimated hazards shows the fiducial test is more powerful than its competitors.

High-D Regression

Model Selection
$X = G(M, \xi_M, U)$, $M \in \mathcal{M}$, $\xi_M \in \Xi_M$.
Theorem (H, Iyer, Lai & Lee, 2016): Under assumptions,
$r(M \mid y) \propto q^{|M|} \int_{\Xi_M} f_M(y, \xi_M)\, J_M(y, \xi_M)\, d\xi_M.$
The need for a penalty is expressed in the fiducial framework through additional equations $0 = P_k$, $k = 1, \ldots, \min(|M|, n)$.
Default value $q = n^{-1/2}$ (motivated by MDL).

Alternative to penalty (see poster!)
A penalty is used to discourage models with many parameters. The real issue is not too many parameters per se, but that a smaller model can do almost the same job.
$r(M \mid y) \propto \int_{\Xi_M} f_M(y, \xi_M)\, J_M(y, \xi_M)\, h_M(\xi_M)\, d\xi_M$, where
$h_M(\xi_M) = \begin{cases} 0 & \text{a smaller model predicts nearly as well,} \\ 1 & \text{otherwise.} \end{cases}$
Motivated by the non-local priors of Johnson & Rossell (2009).

Regression $Y = X\beta + \sigma Z$
First idea: $h_M(\beta_M) = I_{\{|\beta_i| > \epsilon,\ i \in M\}}$; issue: collinearity.
Better: $h_M(\beta_M) := I_{\left\{\frac{1}{2}\|X^\top(X_M \beta_M - X b_{\min})\|_2^2 \,\ge\, \varepsilon_M\right\}}$, where $b_{\min}$ solves
$\min_{b \in \mathbb{R}^p} \frac{1}{2}\|X^\top(X_M \beta_M - X b)\|_2^2$ subject to $\|b\|_0 \le |M| - 1$
(algorithm of Bertsimas et al., 2016).
Similar to the Dantzig selector of Candes & Tao (2007), with a different norm and target.
Call this an ε-admissible subset.
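A brute-force sketch of the ε-admissibility indicator $h_M$ for small $p$, exhaustively searching supports of size $|M| - 1$ as a stand-in for the Bertsimas et al. (2016) algorithm (function and variable names are hypothetical, and the threshold $\varepsilon \cdot |M|$ is one assumed form of $\varepsilon_M$):

```python
import numpy as np
from itertools import combinations

def eps_admissible(X, M, beta_M, eps):
    """Sketch of the h_M indicator: is the model M epsilon-admissible?

    Exhaustively solves min_b 0.5*||X^T (X_M beta_M - X b)||^2 over supports
    of size |M| - 1. Returns 1 if no smaller support predicts nearly as
    well, else 0.
    """
    p = X.shape[1]
    target = X.T @ (X[:, M] @ beta_M)          # X^T X_M beta_M
    best = np.inf
    for S in combinations(range(p), len(M) - 1):
        A = X.T @ X[:, list(S)]                # minimize ||target - A b_S||^2
        b_S, *_ = np.linalg.lstsq(A, target, rcond=None)
        best = min(best, 0.5 * np.sum((target - A @ b_S) ** 2))
    return int(best >= eps * len(M))           # threshold eps*|M| (assumed eps_M)

rng = np.random.default_rng(11)
X = rng.normal(size=(100, 6))
print(eps_admissible(X, [0, 2], np.array([1.0, -1.0]), eps=1.0))
```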

GFD
Observations: $r(M \mid y) \propto \pi^{-\frac{n - |M|}{2}}\, \Gamma\!\left(\frac{n - |M| - 1}{2}\right) \mathrm{RSS}_M^{-\frac{n - |M| - 1}{2}}\, E\big[h^{\varepsilon}_M(\beta^*_M)\big]$, where the expectation is taken with respect to the within-model GFD (the usual T).


More information

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist

More information

Monte Carlo conditioning on a sufficient statistic

Monte Carlo conditioning on a sufficient statistic Seminar, UC Davis, 24 April 2008 p. 1/22 Monte Carlo conditioning on a sufficient statistic Bo Henry Lindqvist Norwegian University of Science and Technology, Trondheim Joint work with Gunnar Taraldsen,

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Probability and Statistics qualifying exam, May 2015

Probability and Statistics qualifying exam, May 2015 Probability and Statistics qualifying exam, May 2015 Name: Instructions: 1. The exam is divided into 3 sections: Linear Models, Mathematical Statistics and Probability. You must pass each section to pass

More information

Confidence distributions in statistical inference

Confidence distributions in statistical inference Confidence distributions in statistical inference Sergei I. Bityukov Institute for High Energy Physics, Protvino, Russia Nikolai V. Krasnikov Institute for Nuclear Research RAS, Moscow, Russia Motivation

More information

Bayesian Sparse Linear Regression with Unknown Symmetric Error

Bayesian Sparse Linear Regression with Unknown Symmetric Error Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of

More information

ON GENERALIZED FIDUCIAL INFERENCE

ON GENERALIZED FIDUCIAL INFERENCE Statistica Sinica 9 (2009), 49-544 ON GENERALIZED FIDUCIAL INFERENCE Jan Hannig The University of North Carolina at Chapel Hill Abstract: In this paper we extend Fisher s fiducial argument and obtain a

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Estimation of Quantiles

Estimation of Quantiles 9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Exact Statistical Inference in. Parametric Models

Exact Statistical Inference in. Parametric Models Exact Statistical Inference in Parametric Models Audun Sektnan December 2016 Specialization Project Department of Mathematical Sciences Norwegian University of Science and Technology Supervisor: Professor

More information

A Very Brief Summary of Bayesian Inference, and Examples

A Very Brief Summary of Bayesian Inference, and Examples A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X

More information

The Jeffreys Prior. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) The Jeffreys Prior MATH / 13

The Jeffreys Prior. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) The Jeffreys Prior MATH / 13 The Jeffreys Prior Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) The Jeffreys Prior MATH 9810 1 / 13 Sir Harold Jeffreys English mathematician, statistician, geophysicist, and astronomer His

More information

Introduction to Bayesian Methods

Introduction to Bayesian Methods Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative

More information

Hybrid Censoring; An Introduction 2

Hybrid Censoring; An Introduction 2 Hybrid Censoring; An Introduction 2 Debasis Kundu Department of Mathematics & Statistics Indian Institute of Technology Kanpur 23-rd November, 2010 2 This is a joint work with N. Balakrishnan Debasis Kundu

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

Part 2: One-parameter models

Part 2: One-parameter models Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference

More information

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain A BAYESIAN MATHEMATICAL STATISTICS PRIMER José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

General Bayesian Inference I

General Bayesian Inference I General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for

More information

STAT 830 Bayesian Estimation

STAT 830 Bayesian Estimation STAT 830 Bayesian Estimation Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Bayesian Estimation STAT 830 Fall 2011 1 / 23 Purposes of These

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Bayesian and frequentist inference

Bayesian and frequentist inference Bayesian and frequentist inference Nancy Reid March 26, 2007 Don Fraser, Ana-Maria Staicu Overview Methods of inference Asymptotic theory Approximate posteriors matching priors Examples Logistic regression

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 1) Fall 2017 1 / 10 Lecture 7: Prior Types Subjective

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Approximating models. Nancy Reid, University of Toronto. Oxford, February 6.

Approximating models. Nancy Reid, University of Toronto. Oxford, February 6. Approximating models Nancy Reid, University of Toronto Oxford, February 6 www.utstat.utoronto.reid/research 1 1. Context Likelihood based inference model f(y; θ), log likelihood function l(θ; y) y = (y

More information

Harvard University. Harvard University Biostatistics Working Paper Series

Harvard University. Harvard University Biostatistics Working Paper Series Harvard University Harvard University Biostatistics Working Paper Series Year 2008 Paper 94 The Highest Confidence Density Region and Its Usage for Inferences about the Survival Function with Censored

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

Fiducial and Confidence Distributions for Real Exponential Families

Fiducial and Confidence Distributions for Real Exponential Families Scandinavian Journal of Statistics doi: 10.1111/sjos.12117 Published by Wiley Publishing Ltd. Fiducial and Confidence Distributions for Real Exponential Families PIERO VERONESE and EUGENIO MELILLI Department

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Bootstrap and Parametric Inference: Successes and Challenges

Bootstrap and Parametric Inference: Successes and Challenges Bootstrap and Parametric Inference: Successes and Challenges G. Alastair Young Department of Mathematics Imperial College London Newton Institute, January 2008 Overview Overview Review key aspects of frequentist

More information

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014 Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,

More information

Time Series and Dynamic Models

Time Series and Dynamic Models Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

Construction of an Informative Hierarchical Prior Distribution: Application to Electricity Load Forecasting

Construction of an Informative Hierarchical Prior Distribution: Application to Electricity Load Forecasting Construction of an Informative Hierarchical Prior Distribution: Application to Electricity Load Forecasting Anne Philippe Laboratoire de Mathématiques Jean Leray Université de Nantes Workshop EDF-INRIA,

More information

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems John Bardsley, University of Montana Collaborators: H. Haario, J. Kaipio, M. Laine, Y. Marzouk, A. Seppänen, A. Solonen, Z.

More information

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

More information

Objective Bayesian and fiducial inference: some results and comparisons. Piero Veronese and Eugenio Melilli Bocconi University, Milano, Italy

Objective Bayesian and fiducial inference: some results and comparisons. Piero Veronese and Eugenio Melilli Bocconi University, Milano, Italy Objective Bayesian and fiducial inference: some results and comparisons Piero Veronese and Eugenio Melilli Bocconi University, Milano, Italy Abstract Objective Bayesian analysis and fiducial inference

More information

Nancy Reid SS 6002A Office Hours by appointment

Nancy Reid SS 6002A Office Hours by appointment Nancy Reid SS 6002A reid@utstat.utoronto.ca Office Hours by appointment Problems assigned weekly, due the following week http://www.utstat.toronto.edu/reid/4508s16.html Various types of likelihood 1. likelihood,

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

POSTERIOR PROPRIETY IN SOME HIERARCHICAL EXPONENTIAL FAMILY MODELS

POSTERIOR PROPRIETY IN SOME HIERARCHICAL EXPONENTIAL FAMILY MODELS POSTERIOR PROPRIETY IN SOME HIERARCHICAL EXPONENTIAL FAMILY MODELS EDWARD I. GEORGE and ZUOSHUN ZHANG The University of Texas at Austin and Quintiles Inc. June 2 SUMMARY For Bayesian analysis of hierarchical

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

A simple analysis of the exact probability matching prior in the location-scale model

A simple analysis of the exact probability matching prior in the location-scale model A simple analysis of the exact probability matching prior in the location-scale model Thomas J. DiCiccio Department of Social Statistics, Cornell University Todd A. Kuffner Department of Mathematics, Washington

More information

An Overview of Objective Bayesian Analysis

An Overview of Objective Bayesian Analysis An Overview of Objective Bayesian Analysis James O. Berger Duke University visiting the University of Chicago Department of Statistics Spring Quarter, 2011 1 Lectures Lecture 1. Objective Bayesian Analysis:

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Analysis of Middle Censored Data with Exponential Lifetime Distributions

Analysis of Middle Censored Data with Exponential Lifetime Distributions Analysis of Middle Censored Data with Exponential Lifetime Distributions Srikanth K. Iyer S. Rao Jammalamadaka Debasis Kundu Abstract Recently Jammalamadaka and Mangalam (2003) introduced a general censoring

More information

Frequentist-Bayesian Model Comparisons: A Simple Example

Frequentist-Bayesian Model Comparisons: A Simple Example Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal

More information

ST745: Survival Analysis: Nonparametric methods

ST745: Survival Analysis: Nonparametric methods ST745: Survival Analysis: Nonparametric methods Eric B. Laber Department of Statistics, North Carolina State University February 5, 2015 The KM estimator is used ubiquitously in medical studies to estimate

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Checking for Prior-Data Conflict

Checking for Prior-Data Conflict Bayesian Analysis (2006) 1, Number 4, pp. 893 914 Checking for Prior-Data Conflict Michael Evans and Hadas Moshonov Abstract. Inference proceeds from ingredients chosen by the analyst and data. To validate

More information

Remarks on Improper Ignorance Priors

Remarks on Improper Ignorance Priors As a limit of proper priors Remarks on Improper Ignorance Priors Two caveats relating to computations with improper priors, based on their relationship with finitely-additive, but not countably-additive

More information

STAT 830 Non-parametric Inference Basics

STAT 830 Non-parametric Inference Basics STAT 830 Non-parametric Inference Basics Richard Lockhart Simon Fraser University STAT 801=830 Fall 2012 Richard Lockhart (Simon Fraser University)STAT 830 Non-parametric Inference Basics STAT 801=830

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

Simulating Random Variables

Simulating Random Variables Simulating Random Variables Timothy Hanson Department of Statistics, University of South Carolina Stat 740: Statistical Computing 1 / 23 R has many built-in random number generators... Beta, gamma (also

More information

Change Of Variable Theorem: Multiple Dimensions

Change Of Variable Theorem: Multiple Dimensions Change Of Variable Theorem: Multiple Dimensions Moulinath Banerjee University of Michigan August 30, 01 Let (X, Y ) be a two-dimensional continuous random vector. Thus P (X = x, Y = y) = 0 for all (x,

More information

Chapter 3 : Likelihood function and inference

Chapter 3 : Likelihood function and inference Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM

More information

Bayesian Inference and the Parametric Bootstrap. Bradley Efron Stanford University

Bayesian Inference and the Parametric Bootstrap. Bradley Efron Stanford University Bayesian Inference and the Parametric Bootstrap Bradley Efron Stanford University Importance Sampling for Bayes Posterior Distribution Newton and Raftery (1994 JRSS-B) Nonparametric Bootstrap: good choice

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information