Borrowing Strength from Experience: Empirical Bayes Methods and Convex Optimization


1 Borrowing Strength from Experience: Empirical Bayes Methods and Convex Optimization
Ivan Mizera
University of Alberta, Edmonton, Alberta, Canada
Department of Mathematical and Statistical Sciences
("Edmonton Eulers")
Wien, January 2016
Research supported by the Natural Sciences and Engineering Research Council of Canada
Co-authors: Roger Koenker (Illinois), Mu Lin, Sile Tao (Alberta)

2 Compound decision problem
Estimate (predict) a vector µ = (µ_1, ..., µ_n)
- many quantities here, and not that much sample for each of them
- we observe one Y_i for every µ_i, with (conditionally) known distribution
- Example: Y_i ~ Binomial(n_i, µ_i), n_i known
- Another example: Y_i ~ N(µ_i, 1)
- Yet another example: Y_i ~ Poisson(µ_i)
The µ_i are assumed to be sampled iid (at random) from P
Thus, when the conditional density of the Y_i is ϕ(y, µ), the marginal density of the Y_i is g(y) = ∫ ϕ(y, µ) dP(µ)
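The setup is easy to simulate; the sketch below (an illustration of my own, not code from the talk) draws the µ_i from a uniform P, generates Y_i ~ N(µ_i, 1), and approximates the marginal density g(y) = ∫ ϕ(y − µ) dP(µ) by Monte Carlo over P.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 1000
mu = rng.uniform(5, 15, size=n)      # mu_i sampled iid from P = U(5, 15)
y = mu + rng.standard_normal(n)      # Y_i ~ N(mu_i, 1)

# marginal density g(y) = int phi(y - mu) dP(mu), approximated by Monte Carlo over P
prior_draws = rng.uniform(5, 15, size=100_000)

def g(points):
    return norm.pdf(points[:, None] - prior_draws[None, :]).mean(axis=1)

grid = np.linspace(2, 18, 9)
print(np.c_[grid, g(grid)])          # roughly flat on (5, 15), with normal tails
```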

3 A sporty example
Data: known performance of individual players, typically summarized as a number of successes, k_i, in a number, n_i, of repeated trials (bats, penalties)
- typically, the data are not very extensive (start of the season, say); the objective is to predict the true capabilities of individual players
One possibility: Y_i = k_i ~ Binomial(n_i, µ_i)
Another possibility: take Y_i = arcsin √((k_i + 1/4)/(n_i + 1/2)) ~ N(µ_i, 1/(4 n_i)), approximately
Solutions via maximum likelihood: ˆµ_i = k_i/n_i or ˆµ_i = Y_i
The overall mean (or marginal MLE) is often better than this
Efron and Morris (1975), Brown (2008), Koenker and Mizera (2014): bayesball

4 NBA data (Agresti, 2002)
[Table: columns player, n, k, prop for 15 players: Yao, Frye, Camby, Okur, Blount, Mihm, Ilgauskas, Brown, Curry, Miller, Haywood, Olowokandi, Mourning, Wallace, Ostertag; the numeric entries did not survive transcription.]
It may be better to take the overall mean!

5 An insurance example
Y_i - known number of accidents of individual insured motorists
Predict their expected number - the rate, µ_i (in the next year, say)
Y_i ~ Poisson(µ_i)
Maximum likelihood: ˆµ_i = Y_i
Nothing better?

6 The data of Simar (1976) 6

7 So, what is better?
First, what is better? We express it via some (expected) loss function
Most often it is the averaged or aggregated squared error loss Σ_i (ˆµ_i - µ_i)²
But it could also be some other loss...
And then? Well, it is sooo simple...

8 ... if P is known!
The µ_i are sampled iid from P - the prior distribution
Conditionally on µ_i, the distribution of Y_i is, say, N(µ_i, 1)
The optimal prediction is the posterior mean, the mean of the posterior distribution: the conditional distribution of µ_i given Y_i (given that the loss function is quadratic!)
For instance, if P is N(0, σ²), then (homework) the best predictor is ˆµ_i = Y_i - (1/(σ² + 1)) Y_i
Borrowing strength via shrinkage: neither will the good be that good, nor the bad that bad
More generally, µ_i can be N(µ, σ²) and Y_i then N(µ_i, σ₀²), and then ˆµ_i = Y_i - (σ₀²/(σ² + σ₀²))(Y_i - µ)  (if σ² = σ₀², halfway to µ)
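A minimal numerical check of the shrinkage rule in the normal-normal case (my own sketch; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, sigma2, sigma2_0 = 2.0, 4.0, 1.0        # prior N(mu0, sigma2), noise variance sigma2_0
mu = mu0 + np.sqrt(sigma2) * rng.standard_normal(10_000)
y = mu + np.sqrt(sigma2_0) * rng.standard_normal(10_000)

# Bayes rule: shrink Y_i toward the prior mean mu0 by the factor sigma2_0 / (sigma2 + sigma2_0)
mu_hat = y - sigma2_0 / (sigma2 + sigma2_0) * (y - mu0)

print(np.mean((mu_hat - mu) ** 2),           # about sigma2 * sigma2_0 / (sigma2 + sigma2_0) = 0.8
      np.mean((y - mu) ** 2))                # about sigma2_0 = 1.0
```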

9 If only all of them published posthumously... Thomas Bayes ( ) 9

10 But do we know P (or σ²)?
Hierarchical model
Random effects
Smoothing
Empirical Bayes - no less Bayes than empirical
We know it is frequentist, but frequentists think it is Bayesian, so this is why we discuss it here
Many inventors...

11 11 What is mathematics? Herbert Ellis Robbins ( )

12 12 On experience in statistical decision theory (1954) Antonín Špaček ( )

13 13 I. J. Good (2000) Alan Mathison Turing ( )

14 So, how?
A. We may try to estimate the prior - g-modeling, Efron (2014)
B. Or, more directly, the prediction rule - f-modeling
A. Estimated normal prior (parametric)
(Nonparametric ouverture)
A. Empirical prior (nonparametric)
B. Empirical prediction rule (nonparametric)
Simulation contests

15 A. Estimated normal prior
James-Stein (JS): if P is N(0, σ²), then the unknown part, 1/(σ² + 1), of the prediction rule can be estimated by (n - 2)/S, where S = Σ_i Y_i²
For general µ in place of 0, the rule is ˆµ_i = Y_i - ((n - 3)/S)(Y_i - Ȳ), with Ȳ = (1/n) Σ_i Y_i and S = Σ_i (Y_i - Ȳ)²
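A compact sketch of the general-µ James-Stein rule displayed above (my own code; it assumes unit-variance observations):

```python
import numpy as np

def james_stein(y):
    """mu_hat_i = Y_i - ((n - 3)/S) (Y_i - Ybar), with S = sum_i (Y_i - Ybar)^2."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    S = np.sum((y - ybar) ** 2)
    return y - (y.size - 3) / S * (y - ybar)

rng = np.random.default_rng(2)
mu = rng.normal(2.0, 1.0, size=500)            # P = N(2, 1)
y = mu + rng.standard_normal(500)              # Y_i ~ N(mu_i, 1)
print(np.sum((james_stein(y) - mu) ** 2),      # James-Stein...
      np.sum((y - mu) ** 2))                   # ...versus the MLE Y_i
```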

17 16 JS as empirical Bayes: Efron and Morris (1975) Charles Stein (1920 )

18 17 Nonparametric ouverture: MLE of density
Density estimation: given the datapoints X_1, X_2, ..., X_n, solve
∏_{i=1}^n g(X_i) → max_g!
or equivalently
-Σ_{i=1}^n log g(X_i) → min_g!
under the side conditions g ≥ 0, ∫ g = 1

19 18 Doesn't work

20 How to prevent Dirac catastrophe? 19

21 20 Reference
Koenker and Mizera (2014)... and those that cite it (Google Scholar)
... the chance meeting on a dissecting-table of a sewing-machine and an umbrella
See also the REBayes package on CRAN
For simplicity: ϕ(y, µ) = ϕ(y - µ), and the latter is the standard normal density

22 21 A. Empirical prior
MLE of P: Kiefer and Wolfowitz (1956)
-Σ_i log( ∫ ϕ(Y_i - u) dP(u) ) → min_P!
The regularizer is the fact that it is a mixture
No tuning parameter needed (but known form of ϕ!)
The resulting ˆP is atomic ("empirical prior")
However, it is an infinite-dimensional problem...

23 22 EM nonsense
Laird (1978), Jiang and Zhang (2009): use a grid {u_1, ..., u_m} (m = 1000) containing the support of the observed sample and estimate the prior density via EM iterations
ˆp_j^(k+1) = (1/n) Σ_{i=1}^n [ ˆp_j^(k) ϕ(Y_i - u_j) / Σ_{l=1}^m ˆp_l^(k) ϕ(Y_i - u_l) ]
Sloooooow... (original versions: 55 hours for 1000 replications)
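The update itself takes only a few lines (a sketch of the grid-based iteration with my own variable names); the objection is its slowness, not its difficulty.

```python
import numpy as np
from scipy.stats import norm

def kw_em(y, m=1000, iters=100):
    """EM for the Kiefer-Wolfowitz NPMLE on a fixed grid u_1, ..., u_m:
    p_j <- (1/n) sum_i p_j phi(Y_i - u_j) / sum_l p_l phi(Y_i - u_l)."""
    y = np.asarray(y, dtype=float)
    u = np.linspace(y.min(), y.max(), m)
    A = norm.pdf(y[:, None] - u[None, :])       # A_ij = phi(Y_i - u_j)
    p = np.full(m, 1.0 / m)                     # uniform starting weights
    for _ in range(iters):
        w = A * p                               # w_ij = p_j phi(Y_i - u_j)
        p = (w / w.sum(axis=1, keepdims=True)).mean(axis=0)
    return u, p
```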

24 23 Convex optimization!
Koenker and Mizera (2014): it is a convex problem!
-Σ_i log( ∫ ϕ(Y_i - u) dP(u) ) → min_P!
When discretized,
-Σ_i log( Σ_{j=1}^m ϕ(Y_i - u_j) p_j ) → min_p!
or in a more technical form
-Σ_i log y_i → min!  subject to  Az = y and z ∈ S,
where A = (ϕ(Y_i - u_j)) and S = {s ∈ R^m : 1ᵀs = 1, s ≥ 0}.
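A sketch of the discretized convex program in CVXPY (the authors' own implementation is the REBayes package on CRAN; the grid size and data here are made up):

```python
import numpy as np
import cvxpy as cp
from scipy.stats import norm

rng = np.random.default_rng(3)
mu = rng.uniform(5, 15, size=500)
y = mu + rng.standard_normal(500)

m = 300
u = np.linspace(y.min(), y.max(), m)            # grid for the mixing distribution
A = norm.pdf(y[:, None] - u[None, :])           # A_ij = phi(Y_i - u_j)

p = cp.Variable(m, nonneg=True)                 # masses p_j on the grid
problem = cp.Problem(cp.Minimize(-cp.sum(cp.log(A @ p))), [cp.sum(p) == 1])
problem.solve()

p_hat = np.clip(p.value, 0, None)
# plug-in Bayes rule: posterior mean under the estimated (atomic) prior
delta = (A * (u * p_hat)).sum(axis=1) / (A * p_hat).sum(axis=1)
```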

25 24 With a dual
The solution is an atomic probability measure, with not more than n atoms. The locations, ˆµ_j, and the masses, ˆp_j, at these locations can be found via the following dual characterization: the solution, ˆν, of
Σ_{i=1}^n log ν_i → max!  subject to  Σ_{i=1}^n ν_i ϕ(Y_i - µ) ≤ n for all µ
satisfies the extremal equations
Σ_j ϕ(Y_i - ˆµ_j) ˆp_j = 1/ˆν_i,
and the ˆµ_j are exactly those µ where the dual constraint is active.
And one can use modern convex optimization methods again...
(And note: everything goes through for general ϕ(y, µ))
(And one can also handle - numerically - alternative loss functions!)

26 25 A typical result: µ_i drawn from U(5, 15)
[Figure (Koenker and Mizera): left, the estimated mixture density ĝ (target in blue); right, the corresponding Bayes decision rule ˆδ (target in blue), for a simple compound decision problem.]

27 26 B. Empirical prediction rule
Lawrence Brown, personal communication
Also, it looks like in Maritz and Lwin (1989)
Do not estimate P, but rather the prediction rule
Tweedie formula: for known (general) P, and hence known g, the Bayes rule is
δ(y) = y + σ² g′(y)/g(y)
One may try to estimate g and plug it in - when knowing σ² (= 1, for instance): Brown and Greenshtein (2009)
By an exponential family argument, δ(y) is nondecreasing in y (van Houwelingen & Stijnen, 1983)
(that came automatically when the prior was estimated)
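A plug-in version of the Tweedie formula with a Gaussian kernel estimate of g, in the spirit of Brown and Greenshtein (2009) (only a sketch; the bandwidth rule here is a generic placeholder, not theirs):

```python
import numpy as np

def tweedie_rule(y_eval, y_obs, h=None, sigma2=1.0):
    """delta(y) = y + sigma2 * g'(y)/g(y), with g a Gaussian kernel density estimate."""
    y_obs = np.asarray(y_obs, dtype=float)
    n = y_obs.size
    if h is None:
        h = 1.06 * y_obs.std() * n ** (-1 / 5)       # Silverman-type bandwidth
    z = (np.asarray(y_eval)[:, None] - y_obs[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)   # phi(z)
    g = k.sum(axis=1) / (n * h)                      # kernel estimate of g
    g_prime = (-z * k).sum(axis=1) / (n * h ** 2)    # its derivative
    return np.asarray(y_eval) + sigma2 * g_prime / g
```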

28 27 Monotone (estimate of) empirical Bayes rule
Maximum likelihood again (h = log g)
- but with some shape-constraint regularization,
- like log-concavity: (log g)″ ≤ 0
- but we rather want y + g′(y)/g(y) = y + (log g(y))′ nondecreasing
- that is, ½y² + log g(y) = ½y² + h(y) convex
Building up the optimization problem:
-Σ_{i=1}^n log g(X_i) → min_g!  subject to  g ≥ 0, ∫ g = 1
-Σ_{i=1}^n log g(X_i) → min_g!  subject to  -log g convex, g ≥ 0, ∫ g dx = 1
-Σ_{i=1}^n h(X_i) → min_h!  subject to  -h convex, e^h ≥ 0, ∫ e^h dx = 1
-Σ_{i=1}^n h(X_i) → min_h!  subject to  -h convex, ∫ e^h dx = 1
-Σ_{i=1}^n h(X_i) + ∫ e^h dx → min_h!  subject to  -h convex
-Σ_{i=1}^n h(X_i) + ∫ e^h dx → min_h!  subject to  ½y² + h(y) convex
The regularizer is the monotonicity constraint
No tuning parameter, or knowledge of ϕ - but knowing all the time that σ² = 1
A convex problem again
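A discretized CVXPY sketch of this shape-constrained problem (my own discretization: h lives on a grid, ∫ e^h is a Riemann sum, the likelihood is scaled by 1/n so that e^ĥ integrates to one, and convexity of ½y² + h becomes a second-difference constraint):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
mu = rng.uniform(5, 15, size=400)
y = mu + rng.standard_normal(400)

m = 300
t = np.linspace(y.min() - 3, y.max() + 3, m)     # grid carrying h = log g
dx = t[1] - t[0]
idx = np.searchsorted(t, y)                      # grid cell of each observation

h = cp.Variable(m)
loglik = cp.sum(h[idx]) / y.size                 # (1/n) sum_i h(Y_i)
penalty = cp.sum(cp.exp(h)) * dx                 # approximates int e^h dx
# (1/2) y^2 + h(y) convex  <=>  second differences of h >= -dx^2
constraints = [h[2:] - 2 * h[1:-1] + h[:-2] >= -dx ** 2]
cp.Problem(cp.Minimize(-loglik + penalty), constraints).solve()

g_hat = np.exp(h.value)                          # estimated mixture density
delta_hat = t + np.gradient(h.value, t)          # estimated (nondecreasing) Bayes rule
```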

38 28 Some remarks
After reparametrization, omitting constants, etc., one can write it as the solution of an equivalent problem
-(1/n) Σ_{i=1}^n K(Y_i) + ∫ e^{K(y)} dΦ_c(y) → min over K ∈ K (the cone of convex functions)!
Compare:
-(1/n) Σ_{i=1}^n h(X_i) + ∫ e^h dx → min over h, -h ∈ K!

39 29 Dual formulation
Analogous to Koenker and Mizera (2010): the solution, ˆK, exists and is piecewise linear. It admits a dual characterization: e^{ˆK(y)} = ˆf(y), where ˆf is the solution of
∫ f(y) log f(y) dΦ(y) → min_f!  subject to  f = d(P_n - G)/dΦ, G ∈ K
The estimated decision rule, ˆδ, is piecewise constant and has no jumps at min Y_i and max Y_i.

40 30 A typical result: µ_i drawn from U(5, 15)
[Figure: left, the estimated mixture density ĝ (target in blue); right, the piecewise constant empirical decision rule ˆδ; the target Bayes rule and its mixture density are plotted as smooth (blue) lines. The local maxima give y for which ˆδ(y) = y.]

41 31 Doable also for some other exponential families
However: a version of the Tweedie formula may be obtainable only for the canonical parameter (binomial!) and depends on the loss function
For the Poisson case:
- the optimal prediction with respect to the quadratic loss function is, for x = 0, 1, 2, ...,
ˆµ(x) = (x + 1) g(x + 1) / g(x),
where g is the marginal density of the Y_i
- for the loss function (µ - ˆµ)²/µ, the optimal prediction is, for x = 1, 2, ...,
ˆµ(x) = x g(x) / g(x - 1).

42 32 What can be done with that?
One can estimate g(x) by the relative frequency, as Robbins (1956):
ˆµ(x) = (x + 1) (#{Y_i = x + 1}/n) / (#{Y_i = x}/n) = (x + 1) #{Y_i = x + 1} / #{Y_i = x}
However, the predictions obtained this way are not monotone, and also erratic, especially when some denominator is 0
- the latter can be rectified by the adjustment of Maritz and Lwin (1989):
ˆµ(x) = (x + 1) #{Y_i = x + 1} / (1 + #{Y_i = x})
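Both formulas fit in a few lines (a sketch with my own function name; the synthetic data stand in for accident counts):

```python
import numpy as np

def robbins(y, adjust=False):
    """Robbins (1956): mu_hat(x) = (x+1) #{Y_i = x+1} / #{Y_i = x};
    adjust=True uses the Maritz-Lwin denominator 1 + #{Y_i = x}."""
    y = np.asarray(y, dtype=int)
    counts = np.bincount(y, minlength=y.max() + 2).astype(float)
    x = np.arange(y.max() + 1)
    denom = counts[x] + (1.0 if adjust else 0.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        mu_hat = (x + 1) * counts[x + 1] / denom
    return x, mu_hat

rng = np.random.default_rng(5)
lam = rng.gamma(shape=1.0, scale=0.5, size=5000)   # hypothetical accident rates mu_i
y = rng.poisson(lam)                               # observed counts Y_i
print(robbins(y, adjust=True))
```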

43 33 Better: monotonizations
The suggestion of van Houwelingen & Stijnen (1983): pool adjacent violators
- also requires a grid
Or one can estimate the marginal density under the shape restriction that the resulting prediction is monotone:
(x + 1) ĝ(x + 1)/ĝ(x) ≤ (x + 2) ĝ(x + 2)/ĝ(x + 1)
After reparametrization in terms of logarithms, the problem is almost linear: a linear constraint resulting from the one above, and a linear objective function
- with a nonlinear Lagrange term ensuring that the result is a probability mass function
At any rate, again a convex problem - and the number of variables is the number of the x's
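A CVXPY sketch of the log-reparametrized problem (my own choice of the exponential "Lagrange" weight; the monotonicity constraint is the linear one displayed above):

```python
import numpy as np
import cvxpy as cp

def monotone_poisson_eb(y):
    """Estimate h(x) = log g(x) on {0, ..., max(y)} by penalized ML, subject to
    log(x+1) + h(x+1) - h(x), the log of the decision rule, being nondecreasing."""
    y = np.asarray(y, dtype=int)
    M = y.max()
    counts = np.bincount(y, minlength=M + 1).astype(float)

    h = cp.Variable(M + 1)
    # linear likelihood term; the exp-sum term keeps sum_x e^{h(x)} equal to one
    objective = cp.Minimize(-cp.sum(cp.multiply(counts, h)) + y.size * cp.sum(cp.exp(h)))
    d = np.log(np.arange(1, M + 1)) + h[1:] - h[:-1]
    cp.Problem(objective, [d[:-1] <= d[1:]]).solve()
    return np.exp(d.value)                          # monotone rule at x = 0, ..., M-1
```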

44 34 Why all this is feasible: interior point methods
(Leave optimization to experts)
Andersen, Christiansen, Conn, and Overton (2000)
We acknowledge using Mosek, Danish optimization software: E. D. Andersen (2010)
PDCO: Saunders (2003)
Nesterov and Nemirovskii (1994)
Boyd, Grant and Ye: Disciplined Convex Programming
Folk wisdom: If it is convex, it will fly.

45 35 Simulations - or how to be highly cited
Johnstone and Silverman (2004): empirical Bayes for sparsity
- n = 1000 observations, k of which have µ all equal to one of the 4 values 3, 4, 5, 7
- the remaining n - k have µ = 0
- there are three choices of k: 5, 50, 500
Criterion: sum of squared errors, averaged over replications, and rounded
Seems like this scenario (or similar ones) became popular
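The scenario takes one line to generate; the sketch below (my labels) also reports the James-Stein benchmark on a single draw:

```python
import numpy as np

def js_scenario(n=1000, k=50, mu_val=5.0, rng=None):
    """Johnstone-Silverman (2004) setup: k means equal to mu_val, the other n-k equal to 0."""
    if rng is None:
        rng = np.random.default_rng(6)
    mu = np.zeros(n)
    mu[:k] = mu_val
    return mu, mu + rng.standard_normal(n)

mu, y = js_scenario(k=50, mu_val=5.0)
ybar = y.mean()
js = y - (y.size - 3) / np.sum((y - ybar) ** 2) * (y - ybar)        # James-Stein, for reference
print(round(np.sum((js - mu) ** 2)), round(np.sum((y - mu) ** 2)))  # SSE: JS vs. the MLE
```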

46 36 The first race
[Table (Koenker and Mizera, Table 2): risk of the shape-constrained rule ˆδ compared to two versions of Jiang and Zhang's GMLEB procedure (ˆδ_GMLEBIP, the empirical prior implemented via the interior point algorithm described in the text, and ˆδ_GMLEBEM, implemented via 100 EM iterations), the kernel procedure δ_1.15 of Brown and Greenshtein (2009) (best results reported for the bandwidth-related constant 1.15), and the best of the 18 procedures of Johnstone and Silverman (2004) (only their winner reported here, J-S Min). Columns: k = 5, 50, 500 crossed with µ = 3, 4, 5, 7. Entries are sums of squared errors in n = 1000 observations, based on 1000, 100, 100, 50 and 100 replications, respectively; the numeric entries did not survive transcription.]

47 37 A new lineup
[Table: columns BL, DL(1/n), DL(1/2), HS, EBMW, EBB, EBKM, oracle; numeric entries not preserved.]
Bhattacharya, Pati, Pillai, Dunson (2012): Bayesian shrinkage
BL: Bayesian Lasso
DL: Dirichlet-Laplace priors (with different strengths)
HS: Carvalho, Polson, and Scott (2009) horseshoe priors
EBMW: asympt. minimax EB of Martin and Walker (2013)
elsewhere: Castillo & van der Vaart (2012) posterior concentration

48 38 Comments (Conclusions?)
- both approaches typically outperform other methods
- the Kiefer-Wolfowitz empirical prior typically outperforms monotone empirical Bayes (for the examples we considered!)
- both methods adapt to general P, in particular to those with multiple modes
- however, the Kiefer-Wolfowitz empirical prior is more flexible: it adapts (much) better to certain peculiarities vital in practical data analysis, like unequal σ_i, inclusion of covariates, etc.
- in particular, it also exhibits a certain independence of the choice of the loss function (the estimate of the prior, and hence of the posterior, is always the same)
- but, in certain situations, Kiefer-Wolfowitz (on the grid!) may be more computationally demanding

49 39 NBA data again
[Table: columns player, n, prop, k, ast, sigma, ebkw, jsmm, glmm, lmer for the 15 players Yao, Frye, Camby, Okur, Blount, Mihm, Ilgauskas, Brown, Curry, Miller, Haywood, Olowokandi, Mourning, Wallace, Ostertag; numeric entries not preserved.]

50 40 A (partial) picture
[Figure: observed proportions (nba$prop) plotted against the predictions p, kw, glmm, lmer, js.]

51 Mixing distribution ("empirical prior")

52 42 Mixing distribution for glmm
[Figure: estimated mixing distribution for glmm (nba.glmm$p).]

53 43 The auto insurance predictions
[Figure: predictions (acd.mon$robbmon) plotted against acd$rate.]


55 45 That's it?
What if P is unimodal? Can't we do better in such a case?
And if we can, will it be (significantly) better than James-Stein?
Joint work with Mu Lin

58 46 OK, so just impose unimodality on P
or more precisely, constrain P to be log-concave (or q-convex)
(unimodality does not work well in this context)
However, the resulting problem is not convex!
Nevertheless, given that log-concavity of P + that of ϕ implies that of the convolution
g(y) = ∫ ϕ(y - µ) dP(µ),
one can impose log-concavity on the mixture!
(So that the resulting formulation then a convex problem is.)

62 47 3. Unimodal Kiefer-Wolfowitz
-Σ_i log g(Y_i) → min over g, P!  subject to  g(·) = ∫ ϕ(· - u) dP(u)  and  -log g convex
(Works, but needs a special version of Mosek)
May be demanding for large sample sizes

65 48 4. Unimodal monotone empirical Bayes
Want both: ½y² + h(y) convex and h(y) concave
-Σ_{i=1}^n h(X_i) + ∫ e^h dx → min_h!  subject to  0 ≥ h″(y) ≥ -1
(½y² + h(y) convex ⇔ 1 + h″(y) ≥ 0, i.e. h″(y) ≥ -1; h(y) concave ⇔ h″(y) ≤ 0)
Very easy, very fast

70 49 A typical result, again from U(5, 15)
[Figure: mixture density (left) and prediction rule (right).]
(Empirical prior, mixture unimodal)

71 50 A typical result, again from U(5, 15)
[Figure: mixture density (left) and prediction rule (right).]
(Empirical prediction rule, mixture unimodal)

72 51 Some simulations
Sum of squared errors, averaged over replications, rounded
[Table: scenarios U[5, 15], t_3, χ (details not preserved), and the four mixtures of Johnstone and Silverman (2004); methods br, kw, brlc, kwlc, mle, js, oracle; numeric entries not preserved.]
Last four: the mixtures of Johnstone and Silverman (2004): n = 1000 observations, with 5% or 50% of the µ equal to 2 or 5, and the remaining ones 0

73 52 Conclusions II
- when the mixing (and then the mixture) distribution is unimodal, it pays to enforce this shape constraint on the estimate
- if it is not, then it does not pay
- unimodal Kiefer-Wolfowitz still appears to outperform the unimodal monotonized empirical Bayes, by a small margin
- and both outperform James-Stein, significantly so for asymmetric mixing distributions
- computationally, unimodal monotonized empirical Bayes is much more painless than unimodal Kiefer-Wolfowitz
