Unobserved Heterogeneity in Longitudinal Data: An Empirical Bayes Perspective


1 Unobserved Heterogeneity in Longitudinal Data: An Empirical Bayes Perspective
Roger Koenker, University of Illinois, Urbana-Champaign
University of Tokyo: 25 November 2013
Joint work with Jiaying Gu (UIUC)

4 An Empirical Bayes Homework Problem
Suppose you observe a sample {Y_1, ..., Y_n} with Y_i ~ N(µ_i, 1) for i = 1, ..., n, and would like to estimate all of the µ_i's under squared error loss. We might call this incidental parameters with a vengeance.
Not knowing any better, we assume that the µ_i are drawn iid from a distribution F, so the Y_i have density

    g(y) = ∫ φ(y − µ) dF(µ),

and the Bayes rule is then given by Tweedie's formula:

    δ(y) = y + g′(y)/g(y).

When F is unknown, one can try to estimate g and plug it into the Bayes rule.
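To make the plug-in idea concrete, here is a minimal Python sketch (not from the talk) that estimates g with a fixed-bandwidth Gaussian kernel and applies Tweedie's formula; the bandwidth, the toy data, and the function name are illustrative assumptions.

```python
import numpy as np

def tweedie_rule(y, bandwidth=0.5):
    """Plug-in Tweedie rule delta(y) = y + g'(y)/g(y), with g estimated by a
    fixed-bandwidth Gaussian kernel density estimate (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    z = (y[:, None] - y[None, :]) / bandwidth       # standardized differences
    k = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)    # Gaussian kernel values
    g = k.mean(axis=1) / bandwidth                  # ghat(y_i)
    gprime = (-z * k).mean(axis=1) / bandwidth**2   # ghat'(y_i)
    return y + gprime / g

# toy check: mostly zero means plus a few signals
rng = np.random.default_rng(0)
mu = np.r_[np.zeros(950), np.full(50, 5.0)]
y_obs = mu + rng.standard_normal(mu.size)
mu_hat = tweedie_rule(y_obs)
print(np.mean((mu_hat - mu)**2), np.mean((y_obs - mu)**2))
```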

5 Stein Rules I
Suppose that the µ_i's were iid N(0, σ²₀), so the Y_i's are iid N(0, 1 + σ²₀); the Bayes rule would be

    δ(y) = (1 − 1/(1 + σ²₀)) y.

When σ²₀ is unknown, S = Σ Y_i² ~ (1 + σ²₀) χ²_n, and recalling that an inverse χ²_n random variable has expectation (n − 2)⁻¹, we obtain the Stein rule in its original form:

    δ̂(y) = (1 − (n − 2)/S) y.

6 Stein Rules II
More generally, if µ_i ~ N(µ₀, σ²₀) we shrink instead toward the prior mean,

    δ(y) = µ₀ + (1 − 1/(1 + σ²₀)) (y − µ₀).

Estimating the prior mean parameter costs us one more degree of freedom, and we obtain the celebrated James-Stein (1961) estimator,

    δ̂(y) = Ȳ_n + (1 − (n − 3)/S) (y − Ȳ_n),

with Ȳ_n = n⁻¹ Σ Y_i and S = Σ (Y_i − Ȳ_n)².
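A direct transcription of the James-Stein formula above into Python might look like the following; it is just a sketch of the displayed rule, assuming unit observation variance as in the homework problem.

```python
import numpy as np

def james_stein(y):
    """James-Stein rule shrinking toward the grand mean, for Y_i ~ N(mu_i, 1)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    S = np.sum((y - ybar)**2)
    return ybar + (1.0 - (n - 3) / S) * (y - ybar)
```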

8 Nonparametric Empirical Bayes Rules
Brown and Greenshtein (Annals, 2009) propose estimating g by standard fixed-bandwidth kernel methods, and they compare the performance of their estimated Bayes rule with various other methods, including the parametric empirical Bayes methods investigated by Johnstone and Silverman in their "Needles and Haystacks" paper.
A drawback of the kernel approach is that it fails to impose a monotonicity constraint that should hold for the Gaussian problem, or indeed for any similar problem in which we have iid observations from a mixture density

    g(y) = ∫ ϕ(y, θ) dF(θ),

where ϕ is an exponential family density with natural parameter θ ∈ R.

9 Back to the Homework
When ϕ is an exponential family density we may write

    ϕ(y, θ) = m(y) e^{yθ} h(θ).

Quadratic loss implies that the Bayes rule is a conditional mean:

    δ_G(y) = E[Θ | Y = y]
           = ∫ θ ϕ(y, θ) dF / ∫ ϕ(y, θ) dF
           = ∫ θ e^{yθ} h(θ) dF / ∫ e^{yθ} h(θ) dF
           = (d/dy) log( ∫ e^{yθ} h(θ) dF )
           = (d/dy) log( g(y)/m(y) ).

10 Monotonicity of the Bayes Rule
When ϕ is of the exponential family form,

    δ′_G(y) = (d/dy) [ ∫ θ ϕ dF / ∫ ϕ dF ]
            = ∫ θ² ϕ dF / ∫ ϕ dF − ( ∫ θ ϕ dF / ∫ ϕ dF )²
            = E[Θ² | Y = y] − (E[Θ | Y = y])²
            = V[Θ | Y = y] ≥ 0,

implying that δ_G must be monotone, or equivalently that

    K(y) = log g(y) − log m(y)

is convex. Such problems are closely related to recent work on estimating log-concave densities, e.g. Cule, Samworth and Stewart (JRSSB, 2010), Koenker and Mizera (Annals, 2010), Seregin and Wellner (Annals, 2010).

11 Standard Gaussian Case
In our homework problem,

    ϕ(y, θ) = φ(y − θ) = K exp{−(y − θ)²/2} = K e^{−y²/2} e^{yθ} e^{−θ²/2}.

So m(y) = e^{−y²/2}, and the logarithmic derivative yields our Bayes rule:

    δ_G(y) = (d/dy) [ y²/2 + log g(y) ] = y + g′(y)/g(y).

Estimating g by maximum likelihood subject to the constraint that K(y) = y²/2 + log ĝ(y) is convex is discussed in Koenker and Mizera (2013).

14 Nonparametric MLE
Kiefer and Wolfowitz (1956), reconsidering the Neyman and Scott (1948) problem, showed that nonparametric maximum likelihood could be used to establish consistent estimators even when the number of incidental parameters tended to infinity. Laird (1978) and Heckman and Singer (1984) suggested that the EM algorithm could be used to compute the MLE in such cases.
Jiang and Zhang (Annals, 2009) adapt this approach for the empirical Bayes problem: let u_j, j = 1, ..., m, denote a grid on the support of the sample Y_i's; then the prior (mixing) density f is estimated by the EM fixed point iteration

    f̂_j^(k+1) = n⁻¹ Σ_{i=1}^n  f̂_j^(k) φ(Y_i − u_j) / Σ_{l=1}^m f̂_l^(k) φ(Y_i − u_l),

and at convergence the implied Bayes rule becomes

    δ̂(Y_i) = Σ_{j=1}^m u_j φ(Y_i − u_j) f̂_j / Σ_{j=1}^m φ(Y_i − u_j) f̂_j.
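A short Python sketch of this EM fixed-point iteration and the implied Bayes rule is given below (not the authors' code); the grid size, the number of iterations, and the uniform starting value are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def npmle_em(y, m=300, iters=100):
    """Kiefer-Wolfowitz NPMLE by the EM fixed-point iteration, plus the
    implied Bayes rule evaluated at each observation (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    u = np.linspace(y.min(), y.max(), m)        # grid on the support of the Y_i
    A = norm.pdf(y[:, None] - u[None, :])       # A[i, j] = phi(Y_i - u_j)
    f = np.full(m, 1.0 / m)                     # uniform starting value
    for _ in range(iters):
        w = A * f                               # f_j^(k) * phi(Y_i - u_j)
        f = (w / w.sum(axis=1, keepdims=True)).mean(axis=0)
    g = A @ f                                   # mixture density at each Y_i
    delta = (A * u) @ f / g                     # implied Bayes rule
    return u, f, delta
```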

15 The Incredible Lethargy of EM-ing
Unfortunately, EM fixed point iterations are notoriously slow, and this is especially apparent in the Kiefer and Wolfowitz setting. Solutions approximate discrete (point mass) distributions, but EM gets there ever so slowly. (The approximation is controlled by the grid spacing of the u_j's.)
[Figure: estimated mixing densities f(x) for GMLEB-EM solutions at several values of m.]

16 Accelerating EM
There is a large literature on accelerating EM iterations, but none of the recent developments seem to help very much. However, the Kiefer-Wolfowitz problem can be reformulated as a convex maximum likelihood problem and solved by standard interior point methods:

    max_{f ∈ F} Σ_{i=1}^n log( Σ_{j=1}^m φ(Y_i − u_j) f_j )

can be rewritten as

    min { −Σ_{i=1}^n log(g_i) : Af = g, f ∈ S },

where A = (φ(Y_i − u_j)) and S = {s ∈ R^m : 1′s = 1, s ≥ 0}. Here f_j denotes the estimated mixing density f̂ at the grid point u_j, and g_i denotes the estimated mixture density ĝ at Y_i.
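The convex reformulation can be written down almost verbatim with a modeling tool; the sketch below uses cvxpy (an assumption, not the dedicated interior point solver used in the talk), and the grid choice is again illustrative.

```python
import numpy as np
import cvxpy as cp
from scipy.stats import norm

def npmle_convex(y, m=300):
    """Kiefer-Wolfowitz NPMLE as a convex program: maximize sum_i log(g_i)
    subject to g = A f, f in the simplex (illustrative cvxpy sketch)."""
    y = np.asarray(y, dtype=float)
    u = np.linspace(y.min(), y.max(), m)
    A = norm.pdf(y[:, None] - u[None, :])       # A[i, j] = phi(Y_i - u_j)
    f = cp.Variable(m, nonneg=True)             # mixing mass at grid points
    g = A @ f                                   # mixture density at each Y_i
    problem = cp.Problem(cp.Maximize(cp.sum(cp.log(g))), [cp.sum(f) == 1])
    problem.solve()
    return u, f.value
```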

17 Interior Point vs. EM
[Figure: estimated mixture densities g(x) comparing the interior point solution (GMLEB-IP) with GMLEB-EM solutions at several values of m.]

18 Interior Point vs. EM
In the foregoing test problem we have n = 200 observations and m = 300 grid points. Timing and accuracy are summarized in the table below. Scaling problem sizes up, the deficiency of the EM approach is even more serious.
[Table: iteration counts, log-likelihoods L(g), and CPU times (in seconds) for three EM variants (EM1, EM2, EM3) and the interior point (IP) solver; numerical entries not preserved in the transcription.]

20 Johnstone and Silverman Simulation Design
Data are generated from 12 distinct models, all of the form

    Y_i = µ_i + u_i,  u_i ~ N(0, 1),  i = 1, ..., 1000.

Of the n = 1000 observations, n − k of the µ_i are zero, and the remaining k take one of the four values {3, 4, 5, 7}. There are three choices of k: {5, 50, 500}. There are 50 replications for each of the 12 experimental settings and 18 different competing estimators.
Performance is measured by the mean (over replications) of the sum (over the n = 1000 observations) of squared errors, so a score of 500 means that the mean squared prediction error is 0.5, half of what the naïve prediction µ̂_i = Y_i would yield.
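For reference, one replication of this design and the reported score can be generated as in the following Python sketch; the particular seed and the choice k = 50, µ = 5 are illustrative.

```python
import numpy as np

def js_replication(n=1000, k=50, mu_val=5.0, seed=0):
    """One replication of the Johnstone-Silverman design: n - k of the mu_i
    are zero, k of them equal mu_val, and Y_i = mu_i + N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(n)
    mu[:k] = mu_val
    y = mu + rng.standard_normal(n)
    return y, mu

def score(mu_hat, mu):
    """Sum of squared errors over the n observations; e.g. a score of 500
    corresponds to a mean squared prediction error of 0.5."""
    return np.sum((mu_hat - mu)**2)
```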

21 Johnstone and Silverman Simulation Results
[Table of simulation results not preserved in the transcription.]

22 Performance of the NP-MLE Bayes Rule
In the (now familiar) Johnstone and Silverman sweepstakes we have the following comparison of performance.
[Table: sums of squared errors for δ̂ MLE-IP, δ̂ MLE-EM, δ̂, δ J-S, and Min at k = 5, 50, 500; numerical entries not preserved in the transcription.]
Here MLE-EM is Jiang and Zhang's (2009) Bayes rule with their suggested 100 EM iterations. It does somewhat better than the shape constrained estimator, but the interior point version MLE-IP does even better.

23 The Castillo and van der Vaart Experiment
The setup is quite similar to the earlier ones: Y_i = θ_i + u_i, i = 1, ..., n; the θ_i are mostly zero, but s of them take one of the values in the set {1, 2, ..., 5}. The sample size is n = 500, and s ∈ {25, 50, 100}. The first 8 rows of the table are taken directly from Table 1 of Castillo and van der Vaart (2012).
[Table: MSE based on 1000 replications for the estimators PM, EBM, PMed, EBMed, HT, HTO, EBMR, and EBKM at s = 25, 50, 100; numerical entries not preserved in the transcription.]

24 But How Does It Work in Theory?
For the Gaussian location mixture problem, empirical Bayes rules based on the Kiefer-Wolfowitz estimator are adaptively minimax.
Theorem (Jiang and Zhang). For the normal location mixture problem, with a (complicated) weak pth moment restriction on Θ, the approximate nonparametric MLE θ̂ = δ̂_{F̂_n}(Y) is adaptively minimax, i.e.

    sup_{θ ∈ Θ} E_{n,θ} L_n(θ̂, θ) / inf_{θ̃} sup_{θ ∈ Θ} E_{n,θ} L_n(θ̃, θ) → 1.

The weak pth moment condition encompasses a much broader class of parameter sets Θ, both deterministic and stochastic.

27 Gaussian Mixtures with Longitudinal Data
Model:

    y_it = µ_i + √θ_i u_it,  t = 1, ..., m_i,  i = 1, ..., n,  u_it ~ N(0, 1)

Sufficient statistics:

    µ̂_i = m_i⁻¹ Σ_{t=1}^{m_i} y_it ~ N(µ_i, θ_i/m_i)

    θ̂_i = (m_i − 1)⁻¹ Σ_{t=1}^{m_i} (y_it − µ̂_i)² ~ Γ(r_i, θ_i/r_i),  r_i = (m_i − 1)/2

Likelihood:

    L(F | y) = Π_{i=1}^n ∫∫ φ((µ̂_i − µ)/√(θ/m_i)) / √(θ/m_i) · γ(θ̂_i | r_i, θ/r_i) dF_µ(µ) dF_θ(θ)
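The sufficient statistic reduction and the likelihood above translate directly into code; the sketch below (an illustration, not the authors' implementation) computes the per-group statistics and evaluates the log-likelihood for an independent discrete prior on (µ, θ) supported on user-supplied grids.

```python
import numpy as np
from scipy.stats import norm, gamma

def sufficient_stats(y_groups):
    """Per-group sufficient statistics: muhat_i ~ N(mu_i, theta_i/m_i) and
    thetahat_i ~ Gamma(r_i, scale theta_i/r_i), with r_i = (m_i - 1)/2."""
    muhat = np.array([np.mean(y) for y in y_groups])
    thetahat = np.array([np.var(y, ddof=1) for y in y_groups])
    m = np.array([len(y) for y in y_groups])
    return muhat, thetahat, m, (m - 1) / 2.0

def log_likelihood(muhat, thetahat, m, r, mu_grid, th_grid, f_mu, f_th):
    """Log-likelihood of an independent discrete prior: f_mu on mu_grid and
    f_th on th_grid, following the mixture likelihood on this slide."""
    ll = 0.0
    for i in range(len(muhat)):
        # density of muhat_i given (mu, theta), on the product grid
        d_mu = norm.pdf(muhat[i], loc=mu_grid[:, None],
                        scale=np.sqrt(th_grid[None, :] / m[i]))
        # density of thetahat_i given theta
        d_th = gamma.pdf(thetahat[i], a=r[i], scale=th_grid[None, :] / r[i])
        ll += np.log(np.sum(f_mu[:, None] * f_th[None, :] * d_mu * d_th))
    return ll
```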

29 A Toy Example
Model:

    y_it = µ_i + √θ_i u_it,  t = 1, ..., m_i,  i = 1, ..., n,  u_it ~ N(0, 1)

    µ_i ~ (1/3)(δ_• + δ_1 + δ_3),  θ_i ~ (1/3)(δ_• + δ_2 + δ_4)

Each mixing distribution is a three-point distribution; the location of the first point mass in each was not preserved in the transcription.
[Figure: four panels showing the mean mixing distribution f(µ), the variance mixing distribution f(θ), the mixture distribution g(µ), and the Bayes rule δ(x).]

30 Contour Plot for the Joint Bayes Rule: δ(µ̂, θ̂) = E(µ | µ̂, θ̂)
[Figure: contour plot of the Bayes rule as a function of µ̂ and σ̂².]

31 Empirical Bayesball
Using (ESPN) data we have constructed an unbalanced panel of 10,575 observations on 1072 players. Following the standard practice of Brown (2009, AoAS) and Jiang and Zhang (2010, Brown Festschrift), we transform batting averages to (approximate) normality:

    Ŷ_i = asin( √((H_i1 + 1/4)/(N_i1 + 1/2)) ) ~ N( asin(√ρ), 1/(4N_i1) )

Treating these observations as approximately Gaussian, we compute sample means and variances for each player through 2011, and estimate our independent prior model.
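The variance-stabilizing transform used here is easy to reproduce; a minimal Python sketch follows (the function name and the example figures are illustrative).

```python
import numpy as np

def batting_transform(hits, at_bats):
    """Arcsine transform of batting records:
    Yhat = asin(sqrt((H + 1/4) / (N + 1/2))), approximately
    N(asin(sqrt(rho)), 1/(4N))."""
    H = np.asarray(hits, dtype=float)
    N = np.asarray(at_bats, dtype=float)
    y = np.arcsin(np.sqrt((H + 0.25) / (N + 0.5)))
    return y, 1.0 / (4.0 * N)    # transformed value and approximate variance

# example: a player with 150 hits in 500 at bats
y, v = batting_transform(150, 500)
```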

32 Prior Estimates on the Gaussian Scale
[Figure: estimated mixing densities f(µ) and f(θ) on the Gaussian scale.]

33 Prior Estimates on the Batting Average Scale
[Figure: estimated prior densities f(p) on the batting average scale for the Gaussian model and the binomial model.]

34 Dirichlet Prior Estimates on the Batting Average Scale
[Figure: estimated prior densities f(µ) on the batting average scale for Dirichlet priors with α = 1 and α = 0.01.]

35 Bayes Rule Predictions
[Figure: 2012 batting average versus prior batting average for the binomial model and the Gaussian model.]

36 Covariate Effects
The location-scale mixture model is really just a starting point for more general panel data models with covariate effects and unobserved heterogeneity, estimable by profile likelihood. Given the model

    y_it = x_it′β + α_i + σ_i u_it,

and a fixed β ∈ R^p, we have sufficient statistics ȳ_i − x̄_i′β for α_i, and

    S_i = (m_i − 1)⁻¹ Σ_{t=1}^{m_i} ( y_it − x_it′β − (ȳ_i − x̄_i′β) )²

for σ²_i. Clearly, ȳ_i | α_i, β, σ²_i ~ N(α_i + x̄_i′β, σ²_i/m_i) and S_i | β, σ²_i ~ Γ(r_i, σ²_i/r_i), where r_i = (m_i − 1)/2.
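For a fixed β the sufficient statistics above take one line each per group; the following sketch (an illustration under the slide's assumptions, not the authors' code) returns them for an unbalanced panel stored as lists of per-player arrays.

```python
import numpy as np

def covariate_suff_stats(y_groups, x_groups, beta):
    """Sufficient statistics for fixed beta: ybar_i - xbar_i'beta for alpha_i,
    and S_i (with r_i = (m_i - 1)/2) for sigma_i^2."""
    loc, S, r = [], [], []
    for y, x in zip(y_groups, x_groups):
        e = y - x @ beta                 # residuals y_it - x_it'beta
        loc.append(e.mean())             # ybar_i - xbar_i'beta
        S.append(np.sum((e - e.mean())**2) / (len(y) - 1))
        r.append((len(y) - 1) / 2.0)
    return np.array(loc), np.array(S), np.array(r)
```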

38 Profile Likelihood for Covariate Effects
Reducing the likelihood to sufficient statistics we have (almost) a decomposition in terms of within and between information:

    L(β, h) = Π_{i=1}^n g(y_i1, ..., y_im_i | β, h)
            = Π_{i=1}^n ∫∫ Π_{t=1}^{m_i} σ_i⁻¹ φ((y_it − x_it′β − α_i)/σ_i) h(α_i, σ_i) dα_i dσ_i
            = K Π_{i=1}^n S_i^{1−r_i} ∫∫ σ_i⁻¹ φ((ȳ_i − x̄_i′β − α_i)/σ_i) · e^{−R_i} R_i^{r_i} / (S_i Γ(r_i)) · h(α_i, σ_i) dα_i dσ_i,

where R_i = r_i S_i/σ²_i, r_i = (m_i − 1)/2, and

    K = Π_{i=1}^n ( Γ(r_i)/r_i^{r_i} ) (1/√(2π))^{m_i − 1}.

But note that the likelihood doesn't factor, so the between and within information are not independent.

39 Age and Batting Ability
There is considerable controversy about the relationship between players' age and their batting ability. To explore this we collected (reported) birth years for each of the players and re-estimated the model including both linear and quadratic age effects using the profile likelihood method.
We evaluate the profile likelihood on a grid of parameter values, but as you will see the likelihood is quite smooth and well behaved, so higher dimensional problems could be handled with standard optimization software. Evaluations of the profile likelihood are quick: a few seconds for our application, with grids of a few hundred points for the mixing distributions.
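Schematically, the grid evaluation described here amounts to residualizing on each candidate β and refitting the mixing distribution; the sketch below assumes a user-supplied inner fitting routine (for instance, one of the NPMLE sketches earlier) that returns the maximized log-likelihood, and is not the authors' implementation.

```python
import numpy as np

def profile_likelihood(beta_grid, y_groups, x_groups, fit_mixing):
    """Profile log-likelihood over a grid of beta values: for each beta,
    residualize the panel and refit the mixing distribution with fit_mixing,
    a placeholder returning the attained log-likelihood (illustrative sketch)."""
    ll = np.empty(len(beta_grid))
    for k, beta in enumerate(beta_grid):
        b = np.atleast_1d(beta)
        resid = [y - x @ b for y, x in zip(y_groups, x_groups)]
        ll[k] = fit_mixing(resid)        # maximized log-likelihood given beta
    return ll
```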

40 Profile Likelihood for the Linear Age Effect
[Figure: profile likelihood plotted against the linear age coefficient b.]

41 Contour Plot of the Quadratic Age Effect
[Figure: contour plot of the profile likelihood over the linear and quadratic age coefficients (β₁, β₂).]

42 The Mixing Densities at the Profile MLE
[Figure: estimated mixing densities h(α) and h(σ).]

43 The Estimated Quadratic Age Effect
[Figure: estimated age effect plotted against age.]

44 Predictive Performance
[Figure: 2012 batting average versus prior batting average for the Gaussian model (Naive, Gaussx, Gauss) and the binomial model (Naive, Bmix).]

45 Predictive Performance
[Table: root mean squared prediction error for the naive estimator, the Gaussian model, the Gaussian model with age effects, the binomial model, and the Dirichlet priors with α = 1 and α = 0.01; numerical entries not preserved in the transcription.]
The dismal performance is due to multimodality of the estimated mixing distribution: there is not much shrinkage relative to the naive estimator. Prior performance is not very useful for predicting future performance. The model with age covariates performs slightly better.

51 Conclusions and Extrapolations
Empirical Bayes methods employing maximum likelihood offer some advantages over other thresholding and kernel methods. Kernel based empirical Bayes rules can be improved with shape constrained MLEs, which are computationally very efficient, but Kiefer-Wolfowitz type nonparametric MLEs perform even better. There are many opportunities for linking such methods to various semi-parametric estimation problems a la Heckman and Singer (1984) and van der Vaart (1996), as for the baseball problem. It is all downhill after 27, in mathematics and baseball. Be cautious about predicting baseball batting averages, or anything else about baseball.
