Borrowing Strength from Experience: Empirical Bayes Methods and Convex Optimization
1 Borrowing Strength from Experience: Empirical Bayes Methods and Convex Optimization. Ivan Mizera, University of Alberta, Edmonton, Alberta, Canada, Department of Mathematical and Statistical Sciences ("Edmonton Eulers"). Wien, January 2016. Research supported by the Natural Sciences and Engineering Research Council of Canada. Co-authors: Roger Koenker (Illinois), Mu Lin, Sile Tao (Alberta)
2 Compound decision problem
Estimate (predict) a vector $\mu = (\mu_1, \ldots, \mu_n)$
Many quantities here, and not that much of a sample for each of them
Observing one $Y_i$ for every $\mu_i$, with (conditionally) known distribution
Example: $Y_i \sim \mathrm{Binomial}(n_i, \mu_i)$, $n_i$ known
Another example: $Y_i \sim N(\mu_i, 1)$
Yet another example: $Y_i \sim \mathrm{Poisson}(\mu_i)$
The $\mu_i$'s are assumed to be sampled ("random") iid-ly from $P$
Thus, when the conditional density of the $Y_i$'s is $\varphi(y, \mu)$, the marginal density of the $Y_i$'s is $g(y) = \int \varphi(y, \mu)\, dP(\mu)$
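As a concrete illustration of this setup (not part of the slides), here is a minimal Python sketch that simulates a compound decision problem in the Gaussian case; the uniform prior U(5, 15) is borrowed from the talk's later examples, and all names and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# mu_1, ..., mu_n drawn iid from a prior P (here U(5, 15), as in the talk's later examples)
mu = rng.uniform(5.0, 15.0, size=n)

# one observation per parameter: Y_i | mu_i ~ N(mu_i, 1)
y = mu + rng.standard_normal(n)

# the marginal density of the Y_i is the convolution g(y) = int phi(y - u) dP(u);
# the compound decision problem is to recover the whole vector mu from the single vector y
```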
3 A sporty example
Data: known performance of individual players, typically summarized as the number of successes, $k_i$, in a number, $n_i$, of repeated trials (bats, penalties)
- typically, the data are not very extensive (start of the season, say); the objective is to predict the true capabilities of individual players
One possibility: $Y_i = k_i \sim \mathrm{Binomial}(n_i, \mu_i)$
Another possibility: take $Y_i = \arcsin\sqrt{\dfrac{k_i + 1/4}{n_i + 1/2}} \approx N\!\left(\mu_i, \dfrac{1}{4 n_i}\right)$
Solutions via maximum likelihood: $\hat\mu_i = k_i / n_i$ or $\hat\mu_i = Y_i$
The overall mean (or marginal MLE) is often better than this
Efron and Morris (1975), Brown (2008), Koenker and Mizera (2014): bayesball
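A small sketch of the two representations just mentioned (raw binomial MLE versus the arcsine variance-stabilizing transform used in Brown (2008)); the function names are illustrative, not from the slides.

```python
import numpy as np

def mle(k, n):
    """Naive maximum likelihood estimate of the success probability: k_i / n_i."""
    return k / n

def arcsine_transform(k, n):
    """Variance-stabilizing transform Y_i = arcsin(sqrt((k_i + 1/4) / (n_i + 1/2))),
    approximately N(mu_i, 1 / (4 n_i))."""
    return np.arcsin(np.sqrt((k + 0.25) / (n + 0.5)))
```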
4 NBA data (Agresti, 2002)
[Table: player, n, k, prop for 15 players - Yao, Frye, Camby, Okur, Blount, Mihm, Ilgauskas, Brown, Curry, Miller, Haywood, Olowokandi, Mourning, Wallace, Ostertag.]
it may be better to take the overall mean!
5 An insurance example
$Y_i$ - the known number of accidents of individual insured motorists
Predict their expected number - the rate $\mu_i$ (in the next year, say)
$Y_i \sim \mathrm{Poisson}(\mu_i)$
Maximum likelihood: $\hat\mu_i = Y_i$
Nothing better?
6 The data of Simar (1976)
7 So, what is better?
First, what is "better"? We express it via some (expected) loss function
Most often it is the averaged or aggregated squared error loss $\sum_i (\hat\mu_i - \mu_i)^2$
But it could be also some other loss...
And then? Well, it is sooo simple...
8 ... if P is known!
The $\mu_i$'s are sampled iid-ly from $P$ - the prior distribution
Conditionally on $\mu_i$, the distribution of $Y_i$ is, say, $N(\mu_i, 1)$
The optimal prediction is the posterior mean, the mean of the posterior distribution: the conditional distribution of $\mu_i$ given $Y_i$ (given that the loss function is quadratic!)
For instance, if $P$ is $N(0, \sigma^2)$, then (homework) the best predictor is $\hat\mu_i = Y_i - \dfrac{1}{\sigma^2 + 1} Y_i$
Borrowing strength via shrinkage: neither will the good be that good, nor the bad that bad
More generally, $\mu_i$ can be $N(\mu, \sigma^2)$ and $Y_i$ then $N(\mu_i, \sigma_0^2)$,
and then $\hat\mu_i = Y_i - \dfrac{\sigma_0^2}{\sigma^2 + \sigma_0^2}(Y_i - \mu)$ (if $\sigma^2 = \sigma_0^2$, halfway to $\mu$)
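The normal-normal shrinkage rule above translates directly into code; a minimal sketch with illustrative names, where the observation variance $\sigma_0^2$ defaults to 1.

```python
import numpy as np

def posterior_mean(y, mu0, sigma2, sigma2_0=1.0):
    """Posterior mean when mu_i ~ N(mu0, sigma2) and Y_i | mu_i ~ N(mu_i, sigma2_0):
    each observation is shrunk toward the prior mean mu0."""
    shrinkage = sigma2_0 / (sigma2 + sigma2_0)
    return y - shrinkage * (y - mu0)

# example: with sigma2 == sigma2_0 every Y_i moves halfway toward mu0
y = np.array([0.0, 2.0, 4.0])
print(posterior_mean(y, mu0=2.0, sigma2=1.0))   # [1., 2., 3.]
```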
9 If only all of them published posthumously... Thomas Bayes
10 But do we know P (or $\sigma^2$)?
Hierarchical model
Random effects
Smoothing
Empirical Bayes - no less Bayes than empirical
we know it is frequentist, but frequentists think it is Bayesian, so this is why we discuss it here
Many inventors...
11 What is mathematics? Herbert Ellis Robbins
12 On experience in statistical decision theory (1954). Antonín Špaček
13 I. J. Good (2000). Alan Mathison Turing
14 So, how?
A. we may try to estimate the prior
B. or, more directly, the prediction rule
(f-modeling vs. g-modeling, in the terminology of Efron (2014))
A. Estimated normal prior (parametric)
(Nonparametric ouverture)
A. Empirical prior (nonparametric)
B. Empirical prediction rule (nonparametric)
Simulation contests
15 A. Estimated normal prior
James-Stein (JS): if $P$ is $N(0, \sigma^2)$, then the unknown part, $\dfrac{1}{\sigma^2 + 1}$, of the prediction rule can be estimated by $\dfrac{n - 2}{S}$, where $S = \sum_i Y_i^2$
For general $\mu$ in place of 0, the rule is $\hat\mu_i = Y_i - \dfrac{n - 3}{S}(Y_i - \bar Y)$, with $\bar Y = \dfrac{1}{n}\sum_i Y_i$ and $S = \sum_i (Y_i - \bar Y)^2$
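A direct implementation of the James-Stein rule shrinking toward the grand mean, assuming $Y_i \sim N(\mu_i, 1)$; a sketch with illustrative names.

```python
import numpy as np

def james_stein(y):
    """James-Stein rule shrinking toward the grand mean, for Y_i ~ N(mu_i, 1)."""
    n = len(y)
    ybar = y.mean()
    S = np.sum((y - ybar) ** 2)
    return y - (n - 3) / S * (y - ybar)
```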
16 JS as empirical Bayes: Efron and Morris (1975). Charles Stein (1920- )
17 Nonparametric ouverture: MLE of density
Density estimation: given the datapoints $X_1, X_2, \ldots, X_n$, solve
$\prod_{i=1}^n g(X_i) \to \max_g!$ or equivalently $\sum_{i=1}^n -\log g(X_i) \to \min_g!$
under the side conditions $g \ge 0$, $\int g = 1$
18 Doesn't work
19 How to prevent the Dirac catastrophe?
20 Reference
Koenker and Mizera (2014)... and those that cite it (Google Scholar)
"the chance meeting on a dissecting-table of a sewing-machine and an umbrella"
See also the REBayes package on CRAN
For simplicity: $\varphi(y, \mu) = \varphi(y - \mu)$, and the latter is the standard normal density
21 A. Empirical prior
MLE of P: Kiefer and Wolfowitz (1956)
$\sum_i -\log\left(\int \varphi(y_i - u)\, dP(u)\right) \to \min_P!$
The regularizer is the fact that it is a mixture
No tuning parameter needed (but a known form of $\varphi$!)
The resulting $\hat P$ is atomic ("empirical prior")
However, it is an infinite-dimensional problem...
22 EM nonsense
Laird (1978), Jiang and Zhang (2009): use a grid $\{u_1, \ldots, u_m\}$ ($m = 1000$) containing the support of the observed sample and estimate the prior density via EM iterations
$\hat p_j^{(k+1)} = \dfrac{1}{n} \sum_{i=1}^n \dfrac{\hat p_j^{(k)}\, \varphi(y_i - u_j)}{\sum_{l=1}^m \hat p_l^{(k)}\, \varphi(y_i - u_l)}$
Sloooooow... (original versions: 55 hours for 1000 replications)
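For comparison with the convex-optimization approach that follows, a minimal implementation of these EM iterations (assuming the standard normal $\varphi$ with $\sigma = 1$; names illustrative).

```python
import numpy as np
from scipy.stats import norm

def em_npmle(y, grid, n_iter=100):
    """Grid-discretized NPMLE of the mixing distribution via EM (Laird 1978)."""
    phi = norm.pdf(y[:, None] - grid[None, :])      # phi(y_i - u_j), sigma = 1 assumed
    p = np.full(len(grid), 1.0 / len(grid))         # uniform starting masses
    for _ in range(n_iter):
        w = phi * p                                  # numerator p_j^(k) * phi(y_i - u_j)
        w /= w.sum(axis=1, keepdims=True)            # normalize over the grid (denominator)
        p = w.mean(axis=0)                           # p_j^(k+1) = (1/n) sum_i w_ij
    return p
```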
23 Convex optimization!
Koenker and Mizera (2014): it is a convex problem!
$\sum_i -\log\left(\int \varphi(y_i - u)\, dP(u)\right) \to \min_P!$
When discretized: $\sum_i -\log\left(\sum_{j=1}^m \varphi(y_i - u_j)\, p_j\right) \to \min_p!$
or, in a more technical form, $-\sum_i \log y_i \to \min!$ subject to $Az = y$ and $z \in S$,
where $A = (\varphi(y_i - u_j))$ and $S = \{s \in \mathbb{R}^m : 1's = 1,\ s \ge 0\}$.
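The same discretized problem written as an explicit convex program; this is only a sketch using cvxpy with its default solver, not the Mosek/REBayes implementation of the paper, and the grid is assumed given.

```python
import numpy as np
import cvxpy as cp
from scipy.stats import norm

def kw_npmle(y, grid):
    """Kiefer-Wolfowitz NPMLE on a fixed grid {u_1, ..., u_m}:
    maximize sum_i log( sum_j phi(y_i - u_j) p_j ) over probability vectors p."""
    A = norm.pdf(y[:, None] - grid[None, :])        # A_ij = phi(y_i - u_j)
    p = cp.Variable(len(grid), nonneg=True)         # mixing masses on the grid
    problem = cp.Problem(cp.Maximize(cp.sum(cp.log(A @ p))), [cp.sum(p) == 1])
    problem.solve()
    return p.value                                   # the (atomic) "empirical prior"
```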
24 With a dual
The solution is an atomic probability measure, with no more than n atoms. The locations, $\hat\mu_j$, and the masses, $\hat p_j$, at these locations can be found via the following dual characterization: the solution, $\hat\nu$, of
$\sum_{i=1}^n \log \nu_i \to \max!$ subject to $\sum_{i=1}^n \nu_i\, \varphi(y_i - \mu) \le n$ for all $\mu$
satisfies the extremal equations $\sum_j \varphi(y_i - \hat\mu_j)\, \hat p_j = \dfrac{1}{\hat\nu_i}$,
and the $\hat\mu_j$ are exactly those $\mu$ where the dual constraint is active.
And one can use modern convex optimization methods again...
(And note: everything goes through for general $\varphi(y, \mu)$)
(And one can also handle - numerically - alternative loss functions!)
25 A typical result: $\mu_i$ drawn from U(5, 15)
[Figure: estimated mixture density $\hat g$ (left) and the corresponding Bayes rule $\hat\delta$ (right) for a simple compound decision problem; the target Bayes rule and its mixture density are plotted in dashed blue.]
26 B. Empirical prediction rule
Lawrence Brown, personal communication; also, looks like in Maritz and Lwin (1989)
Do not estimate P, but rather the prediction rule
Tweedie formula: for known (general) P, and hence known g, the Bayes rule is $\delta(y) = y + \sigma^2\, \dfrac{g'(y)}{g(y)}$
One may try to estimate g and plug it in - when knowing $\sigma^2$ ($= 1$, for instance): Brown and Greenshtein (2009)
By an exponential family argument, $\delta(y)$ is nondecreasing in y (van Houwelingen & Stijnen, 1983) (that came automatically when the prior is estimated)
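A plug-in version of the Tweedie formula with a kernel estimate of g, in the spirit of Brown and Greenshtein (2009) but with scipy's default bandwidth rather than their choice; a sketch assuming $\sigma^2 = 1$, with illustrative names.

```python
import numpy as np
from scipy.stats import gaussian_kde

def tweedie_plugin(y, y_eval, eps=1e-4):
    """delta(y) = y + g'(y)/g(y) with g replaced by a kernel density estimate (sigma^2 = 1)."""
    kde = gaussian_kde(y)
    g = kde(y_eval)
    g_prime = (kde(y_eval + eps) - kde(y_eval - eps)) / (2 * eps)   # numerical derivative of g
    return y_eval + g_prime / g
```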
27 Monotone (estimate of) empirical Bayes rule
Maximum likelihood again ($h = \log g$) - but with some shape-constraint regularization,
- like log-concavity: $(\log g)'' \le 0$
- but we rather want $y + \dfrac{g'(y)}{g(y)} = y + (\log g(y))'$ nondecreasing
- that is, $\tfrac{1}{2} y^2 + \log g(y) = \tfrac{1}{2} y^2 + h(y)$ convex
The formulation evolves step by step:
$\sum_{i=1}^n -\log g(X_i) \to \min_g!$ subject to $g \ge 0$, $\int g = 1$
$\sum_{i=1}^n -\log g(X_i) \to \min_g!$ subject to $\log g$ concave, $g \ge 0$, $\int g\, dx = 1$
$\sum_{i=1}^n -h(X_i) \to \min_h!$ subject to $-h$ convex, $\int e^h\, dx = 1$
$\sum_{i=1}^n -h(X_i) + \int e^h\, dx \to \min_h!$ subject to $-h$ convex
$\sum_{i=1}^n -h(X_i) + \int e^h\, dx \to \min_h!$ subject to $\tfrac{1}{2} y^2 + h(y)$ convex
The regularizer is the monotonicity constraint
No tuning parameter, or knowledge of $\varphi$ - but knowing all the time that $\sigma^2 = 1$
A convex problem again
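A grid discretization of the final problem above, posed with cvxpy; this is only a sketch, not the dual/Mosek implementation of Koenker and Mizera (2014), and the grid construction, the nearest-grid-point evaluation of h at the data, and the Riemann-sum integral are simplifying assumptions.

```python
import numpy as np
import cvxpy as cp

def monotone_eb(y, m=300):
    """Shape-constrained MLE: minimize -sum_i h(Y_i) + int e^h dy subject to
    (1/2) y^2 + h(y) convex, so that the Tweedie rule y + h'(y) is nondecreasing."""
    grid = np.linspace(y.min() - 3, y.max() + 3, m)
    dx = grid[1] - grid[0]
    h = cp.Variable(m)
    # evaluate h at the data via nearest grid points (selection matrix)
    idx = np.clip(np.searchsorted(grid, y), 0, m - 1)
    sel = np.zeros((len(y), m))
    sel[np.arange(len(y)), idx] = 1.0
    loglik = cp.sum(sel @ h)
    integral = dx * cp.sum(cp.exp(h))                       # Riemann approximation of int e^h dy
    # convexity of (1/2) y^2 + h(y): discrete second differences of h bounded below by -dx^2
    constraints = [h[2:] - 2 * h[1:-1] + h[:-2] >= -dx ** 2]
    cp.Problem(cp.Minimize(-loglik + integral), constraints).solve()
    return grid, h.value                                     # estimate of log g on the grid
```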
28 Some remarks
After reparametrization, omitting constants, etc., one can write it as the solution of an equivalent problem
$\dfrac{1}{n} \sum_{i=1}^n -K(Y_i) + \int e^{K(y)}\, d\Phi_c(y) \to \min_{K \in \mathcal{K}}!$
Compare: $\sum_{i=1}^n -h(X_i) + \int e^{h}\, dx \to \min_h!$
29 Dual formulation
Analogous to Koenker and Mizera (2010): the solution, $\hat K$, exists and is piecewise linear. It admits a dual characterization: $e^{\hat K(y)} = \hat f(y)$, where $\hat f$ is the solution of
$\int f(y) \log f(y)\, d\Phi(y) \to \min_f!$ subject to $f\, d\Phi = d(P_n - G)$, $G \in \mathcal{K}$
The estimated decision rule, $\hat\delta$, is piecewise constant and has no jumps at $\min Y_i$ and $\max Y_i$.
30 A typical result: $\mu_i$ drawn from U(5, 15)
[Figure: estimated mixture density $\hat g$ (left, blue: target) and the corresponding piecewise constant empirical decision rule $\hat\delta$ (right); the target Bayes rule and its mixture density are plotted as smooth (blue) lines. The local maxima give y for which $\hat\delta(y) = y$.]
31 Doable also for some other exponential families
However: a version of the Tweedie formula may be obtainable only for the canonical parameter (binomial!) and depends on the loss function
For the Poisson case:
- the optimal prediction with respect to the quadratic loss function is, for $x = 0, 1, 2, \ldots$, $\hat\mu(x) = \dfrac{(x + 1)\, g(x + 1)}{g(x)}$, where $g$ is the marginal density of the $Y_i$'s
- for the loss function $(\mu - \hat\mu)^2 / \mu$, the optimal prediction is, for $x = 1, 2, \ldots$, $\hat\mu(x) = \dfrac{x\, g(x)}{g(x - 1)}$.
32 What can be done with that?
One can estimate $g(x)$ by the relative frequency, as Robbins (1956):
$\hat\mu(x) = \dfrac{(x + 1)\, \#\{Y_i = x + 1\}/n}{\#\{Y_i = x\}/n} = \dfrac{(x + 1)\, \#\{Y_i = x + 1\}}{\#\{Y_i = x\}}$
however, the predictions obtained this way are not monotone, and also erratic, especially when some denominator is 0
- the latter can be rectified by the adjustment of Maritz and Lwin (1989): $\hat\mu(x) = \dfrac{(x + 1)\, \#\{Y_i = x + 1\}}{1 + \#\{Y_i = x\}}$
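Both the raw Robbins rule and the Maritz-Lwin adjustment are one-liners on the empirical frequencies; a minimal sketch (names illustrative).

```python
import numpy as np

def robbins_rules(y):
    """Robbins (1956) rule (x+1) #{Y_i = x+1} / #{Y_i = x} and the Maritz-Lwin
    adjustment (x+1) #{Y_i = x+1} / (1 + #{Y_i = x}) for Poisson counts y."""
    counts = np.append(np.bincount(y), 0)       # #{Y_i = x} for x = 0, ..., max(y), plus a trailing 0
    x = np.arange(len(counts) - 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        robbins = (x + 1) * counts[1:] / counts[:-1]          # erratic, undefined when #{Y_i = x} = 0
    maritz_lwin = (x + 1) * counts[1:] / (1 + counts[:-1])    # always defined
    return robbins, maritz_lwin
```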
33 Better: monotonizations
The suggestion of van Houwelingen & Stijnen (1983): pool adjacent violators - also requires a grid
Or one can estimate the marginal density under the shape restriction that the resulting prediction is monotone:
$\dfrac{(x + 1)\, \hat g(x + 1)}{\hat g(x)} \le \dfrac{(x + 2)\, \hat g(x + 2)}{\hat g(x + 1)}$
After reparametrization in terms of logarithms, the problem is almost linear: a linear constraint resulting from the one above, and a linear objective function - with a nonlinear Lagrange term ensuring that the result is a probability mass function.
At any rate, again a convex problem - and the number of variables is the number of the x's
34 Why all this is feasible: interior point methods
(Leave optimization to experts)
Andersen, Christiansen, Conn, and Overton (2000)
We acknowledge using Mosek, a Danish optimization software; Mosek: E. D. Andersen (2010); PDCO: Saunders (2003)
Nesterov and Nemirovskii (1994)
Boyd, Grant and Ye: Disciplined Convex Programming
Folk wisdom: if it is convex, it will fly.
35 Simulations - or how to be highly cited
Johnstone and Silverman (2004): empirical Bayes for sparsity
n = 1000 observations, k of which have $\mu$ all equal to one of the 4 values 3, 4, 5, 7; the remaining n - k have $\mu = 0$
there are three choices of k: 5, 50, 500
Criterion: sum of squared errors, averaged over replications, and rounded
Seems like this scenario (or similar ones) became popular
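One replication of this scenario is easy to reproduce; the sketch below sets it up and scores an arbitrary rule by the sum of squared errors (the seed and helper names are mine, not from the slides).

```python
import numpy as np

rng = np.random.default_rng(2016)

def johnstone_silverman_sample(k, mu_value, n=1000):
    """k coordinates of mu equal to mu_value (one of 3, 4, 5, 7), the rest 0; Y_i ~ N(mu_i, 1)."""
    mu = np.zeros(n)
    mu[:k] = mu_value
    return mu, mu + rng.standard_normal(n)

def sse(mu_hat, mu):
    """Sum of squared errors, the criterion used in the contest."""
    return float(np.sum((mu_hat - mu) ** 2))

mu, y = johnstone_silverman_sample(k=50, mu_value=5)
print(sse(y, mu))     # the MLE mu_hat = Y scores about n = 1000 on average
```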
36 The first race
[Table 2: risk of the shape-constrained rule $\hat\delta$ (sum of squared errors in n = 1000 observations, for k = 5, 50, 500 and $\mu$ = 3, 4, 5, 7) compared to: two versions of Jiang and Zhang's GMLEB procedure (empirical prior) - one using 100 EM iterations, denoted GMLEB-EM, the other, GMLEB-IP, using the interior point implementation via convex optimization described in the text; the kernel prediction rule $\delta_{1.15}$ of Brown and Greenshtein (2009), 50 replications, reporting (best?) results for the bandwidth-related constant 1.15; and the best procedure of Johnstone and Silverman (2004), only their winner of 18 methods reported here as J-S Min. Reported entries are based on 1000, 100, 100, 50 and 100 replications, respectively.]
37 A new lineup
BL, DL(1/n), DL(1/2), HS, EBMW, EBB, EBKM, oracle
Bhattacharya, Pati, Pillai, Dunson (2012): Bayesian shrinkage
BL: Bayesian Lasso
DL: Dirichlet-Laplace priors (with different strengths)
HS: Carvalho, Polson, and Scott (2009) horseshoe priors
EBMW: asymptotically minimax EB of Martin and Walker (2013)
elsewhere: Castillo & van der Vaart (2012) posterior concentration
38 Comments (Conclusions?)
- both approaches typically outperform other methods
- the Kiefer-Wolfowitz empirical prior typically outperforms monotone empirical Bayes (for the examples we considered!)
- both methods adapt to general P, in particular to those with multiple modes
- however, the Kiefer-Wolfowitz empirical prior is more flexible: it adapts (much) better to certain peculiarities vital in practical data analysis, like unequal $\sigma_i$, inclusion of covariates, etc.
- in particular, it also exhibits a certain independence of the choice of the loss function (the estimate of the prior, and hence the posterior, is always the same)
- but, in certain situations, Kiefer-Wolfowitz (on the grid!) may be more computationally demanding
39 NBA data again
[Table: the 15 players (Yao, Frye, Camby, Okur, Blount, Mihm, Ilgauskas, Brown, Curry, Miller, Haywood, Olowokandi, Mourning, Wallace, Ostertag) with columns: player, n, prop, k, ast, sigma, ebkw, jsmm, glmm, lmer.]
40 A (partial) picture
[Figure: NBA proportions (nba$prop) against the predictions p, kw, glmm, lmer, js.]
41 Mixing distribution ("empirical prior")
42 Mixing distribution for glmm
[Figure: nba.glmm$p against pgr.]
43 The auto insurance predictions
[Figure: acd.mon$robbmon against acd$rate.]
45 That's it?
What if P is unimodal? Cannot we do better in such a case?
And if we can, will it be (significantly) better than James-Stein?
Joint work with Mu Lin
46 OK, so just impose unimodality on P
or, more precisely, constrain P to be log-concave (or q-convex) (unimodality does not work well in this context)
However, the resulting problem is not convex!
Nevertheless, given that log-concavity of P plus that of $\varphi$ implies that of the convolution $g(y) = \int \varphi(y - \mu)\, dP(\mu)$,
one can impose log-concavity on the mixture! (So that the resulting formulation is then a convex problem.)
47 3. Unimodal Kiefer-Wolfowitz
$\sum_i -\log g(y_i) \to \min_{g,P}!$ subject to $g \ge \int \varphi(\cdot - u)\, dP(u)$ and $-\log g$ convex
(Works, but needs a special version of Mosek)
May be demanding for large sample sizes
48 4. Unimodal monotone empirical Bayes
Constraints: $\tfrac{1}{2} y^2 + h(y)$ convex and $h(y)$ concave - that is, $1 + h''(y) > 0$ and $h''(y) < 0$, i.e., $0 > h''(y) > -1$
$\sum_{i=1}^n -h(X_i) + \int e^{h}\, dx \to \min_h!$ subject to $0 > h''(y) > -1$
Very easy, very fast
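The unimodal variant only adds the concavity of h to the grid sketch given earlier for the monotone rule; again a cvxpy sketch under the same simplifying assumptions (nearest-grid evaluation of h at the data, Riemann-sum integral), with illustrative names.

```python
import numpy as np
import cvxpy as cp

def unimodal_monotone_eb(y, m=300):
    """Minimize -sum_i h(Y_i) + int e^h dy subject to 0 >= h'' >= -1, i.e.
    (1/2) y^2 + h(y) convex (monotone rule) and h concave (log-concave mixture)."""
    grid = np.linspace(y.min() - 3, y.max() + 3, m)
    dx = grid[1] - grid[0]
    h = cp.Variable(m)
    sel = np.zeros((len(y), m))
    sel[np.arange(len(y)), np.clip(np.searchsorted(grid, y), 0, m - 1)] = 1.0
    objective = cp.Minimize(-cp.sum(sel @ h) + dx * cp.sum(cp.exp(h)))
    d2h = h[2:] - 2 * h[1:-1] + h[:-2]              # discrete second derivative (times dx^2)
    constraints = [d2h >= -dx ** 2,                 # h'' >= -1: monotone Tweedie rule
                   d2h <= 0]                        # h'' <= 0: h concave, g log-concave
    cp.Problem(objective, constraints).solve()
    return grid, h.value
```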
49 A typical result, again from U(5, 15)
[Figure: mixture density and prediction rule (empirical prior, mixture unimodal).]
50 A typical result, again from U(5, 15)
[Figure: mixture density and prediction rule (empirical prediction rule, mixture unimodal).]
51 Some simulations
Sum of squared errors, averaged over replications, rounded
[Table: rows U[5, 15], $t_3$, $\chi$, and the four Johnstone-Silverman mixtures; columns br, kw, brlc, kwlc, mle, js, oracle.]
Last four: the mixtures of Johnstone and Silverman (2004): n = 1000 observations, with 5% or 50% of $\mu$ equal to 2 or 5, and the remaining ones 0
52 Conclusions II
- when the mixing (and then the mixture) distribution is unimodal, it pays to enforce this shape constraint for the estimate
- if it is not, then it does not pay
- unimodal Kiefer-Wolfowitz still appears to outperform the unimodal monotonized empirical Bayes, by a small margin
- and both outperform James-Stein, significantly so for asymmetric mixing distributions
- computationally, unimodal monotonized empirical Bayes is much more painless than unimodal Kiefer-Wolfowitz