The Hybrid Likelihood: Combining Parametric and Empirical Likelihoods
1/25 The Hybrid Likelihood: Combining Parametric and Empirical Likelihoods. Nils Lid Hjort (with Ingrid Van Keilegom and Ian McKeague), Department of Mathematics, University of Oslo. Building Bridges (at Bislett), May 2017, Teknologihuset.
2/25 Bridging parametrics and nonparametrics

Building Bridges, connecting parametrics and nonparametrics: here I discuss one such bridge operation (a nonparametric extension of a parametric likelihood, or a parametric focus for a nonparametric method). I use FIC tools (Focused Information Criteria) for fine-tuning and for selecting ingredients for the bridge.
3/25 Theme: Combining parametrics with nonparametrics

Suppose y_1, ..., y_n are i.i.d. from f, with inference needed for the focus parameter ψ = ψ(f).

Parametric likelihood approach (perfect if the model is perfect): Fit f to {f_θ : θ ∈ Θ} via maximum likelihood, with θ̂_ML maximising the log-likelihood ℓ_n(θ) = log L_n(θ). Then √n(θ̂_ML − θ) →_d N_p(0, J(θ)^{−1}), and the delta method gives √n(ψ̂_ML − ψ) →_d N(0, κ²), with κ² = c^t J(θ)^{−1} c and c = ∂ψ(θ)/∂θ. Also: the Wilks theorem, χ²_1.

Nonparametric likelihood approach (no model conditions needed): Identify ψ via E_f m(Y, ψ) = 0. The empirical likelihood R_n(ψ) is the maximum of ∏_{i=1}^n (n w_i) under ∑_{i=1}^n w_i = 1, ∑_{i=1}^n w_i m(y_i, ψ) = 0, each w_i > 0. Then −2 log R_n(ψ) →_d χ²_1.
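The empirical likelihood side of this slide can be made concrete. Below is a minimal sketch, assuming the simplest control function m(y, ψ) = y − ψ (the mean); the function name `log_el_mean` and the bisection scheme for the scalar Lagrange multiplier are my own choices, not from the talk.

```python
import numpy as np

def log_el_mean(y, psi, tol=1e-12):
    """log R_n(psi): empirical log-likelihood ratio for the mean,
    computed via the scalar Lagrange-multiplier dual."""
    d = np.asarray(y, dtype=float) - psi
    if d.max() <= 0 or d.min() >= 0:
        return -np.inf  # psi outside the convex hull of the data
    # admissible multipliers keep every weight positive: 1 + lam*d_i > 0
    lo = -1.0 / d.max() + 1e-10
    hi = -1.0 / d.min() - 1e-10
    g = lambda lam: np.sum(d / (1.0 + lam * d))  # strictly decreasing in lam
    while hi - lo > tol:                          # bisect for the root of g
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return -np.sum(np.log1p(0.5 * (lo + hi) * d))

rng = np.random.default_rng(1)
y = rng.normal(size=200)
print(log_el_mean(y, y.mean()))      # maximal at the sample mean: ~ 0
print(-2 * log_el_mean(y, 0.0))      # approximately chi-squared_1 in size
```

With a vector-valued m the same dual idea applies, with a q-dimensional multiplier in place of the bisection.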
4/25 How to combine parametric and empirical likelihood?

Main idea (with details, variations and applications to come): Decide on control parameters μ = (μ_1, ..., μ_q), identified via E m_j(Y, μ) = 0 for j = 1, ..., q; put the parametric model through the EL, giving R_n(μ(θ)); and form H_n(θ) = L_n(θ)^{1−a} R_n(μ(θ))^a.

I will show that the hybrid likelihood estimator θ̂_HL maximising h_n(θ) = (1 − a) ℓ_n(θ) + a log R_n(μ(θ)), along with the focus parameter estimator ψ̂_HL = ψ(f(·, θ̂_HL)), has good properties. I'll invent FIC-type schemes to assist in selecting the balance parameter a in [0, 1] and the control parameters μ_1, ..., μ_q.
5/25 Plan

General setup (so far for i.i.d. data, extensions later): with working model f(y, θ), leading to log-likelihood ℓ_n(θ), and control parameters μ:

h_n(θ) = (1 − a) ℓ_n(θ) + a log R_n(μ(θ)).

A. Examples
B. Basics for the EL
C. Theory: under the model
D. Theory: outside the model
E. Fine-tuning the balance parameter a
F. Choosing the control parameters μ_1, ..., μ_q
G. Concluding remarks (and questions)
6/25 A: Examples

Example 1. Let f_θ be the normal (ξ, σ²), and use m_j(y, μ_j) = I{y ≤ μ_j} − j/4 for j = 1, 2, 3. Then HL means estimating (ξ, σ) while factoring in that the three quartiles ought to be estimated well too.

Example 2. Let f_θ be the Beta with parameters (b, c). ML means moment matching for log y_i and log(1 − y_i). Add to these the functions m_1(y, μ_1) = y − μ_1 and m_2(y, μ_2) = y² − μ_2. Then HL is Beta fitting while keeping the mean and variance not far from

E_Beta Y = b/(b + c) and Var_Beta Y = [b/(b + c)] [c/(b + c)] / (b + c + 1).
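For Example 1, the control equations E m_j(Y, μ_j) = 0 are solved by the model-implied quartiles μ_j(θ) = ξ + z_{j/4} σ. A small check of this, assuming Y ~ N(ξ, σ²); the helper names are illustrative only:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def control_mean(mu, j, xi, sigma):
    """E m_j(Y, mu) = P(Y <= mu) - j/4 under Y ~ N(xi, sigma^2)."""
    return Phi((mu - xi) / sigma) - j / 4.0

xi, sigma = 2.0, 1.5
z = {1: -0.6744898, 2: 0.0, 3: 0.6744898}   # standard normal quartiles
for j in (1, 2, 3):
    mu_j = xi + z[j] * sigma                # model-implied quartile mu_j(theta)
    print(j, control_mean(mu_j, j, xi, sigma))  # each ~ 0
```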
7/25 Example 3. Consider f(y, θ) = θ y^{θ−1} on (0, 1). The log-likelihood is n{log θ − (θ − 1) Z_n}, with Z_n = (1/n) ∑_{i=1}^n log(1/y_i), and θ̂_ML = 1/Z_n. Then put the EL for the mean μ through the model, yielding R_n(μ(θ)) with μ(θ) = θ/(θ + 1). This is HL with a = 1/2.

[Figure: log-likelihood, log-EL and log-HL curves plotted against θ.]
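Example 3 can be run end to end. The sketch below simulates from f(y, θ) = θ y^{θ−1}, recovers the closed-form θ̂_ML = 1/Z_n, and maximises the hybrid criterion h_n(θ) with a = 1/2 by grid search, the EL for the mean being computed via its Lagrange dual; the simulated data, the grid, and the function names are my own illustrative choices.

```python
import numpy as np

def log_el_mean(y, m, tol=1e-12):
    """log R_n(m): empirical log-likelihood ratio for the mean (Lagrange dual)."""
    d = np.asarray(y, dtype=float) - m
    if d.max() <= 0 or d.min() >= 0:
        return -np.inf
    lo, hi = -1.0 / d.max() + 1e-10, -1.0 / d.min() - 1e-10
    g = lambda lam: np.sum(d / (1.0 + lam * d))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return -np.sum(np.log1p(0.5 * (lo + hi) * d))

rng = np.random.default_rng(7)
theta_true = 2.0
y = rng.uniform(size=500) ** (1.0 / theta_true)  # F(y) = y^theta, so Y = U^(1/theta)

Zn = np.mean(np.log(1.0 / y))
theta_ml = 1.0 / Zn                              # closed-form MLE

def h_n(theta, a=0.5):
    """Hybrid criterion: (1-a) log-lik + a log EL at mu(theta) = theta/(theta+1)."""
    loglik = len(y) * (np.log(theta) - (theta - 1.0) * Zn)
    return (1.0 - a) * loglik + a * log_el_mean(y, theta / (theta + 1.0))

grid = np.linspace(1.2, 3.0, 181)
theta_hl = grid[np.argmax([h_n(t) for t in grid])]
print(round(theta_ml, 3), round(theta_hl, 3))    # both should land near 2
```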
8/25 Example 4. I use Newcomb's 1889 speed of light data, with n = 66 and two grand outliers at −44 and −2. The true value is 33.02.

ML (a = 0): bad estimates (26.21, 10.75). Removing the outliers: (27.75, 5.08). I now use HL with histogram-associated control parameters, with k = 6 cells (−∞, 10.5], (10.5, 20.5], (20.5, 25.5], (25.5, 30.5], (30.5, 35.5], (35.5, ∞).

The HL, with a = 0.50: (28.23, 6.37). With a = 1: close to minimum chi-squared.
9/25 [Figure: kernel density estimate of the Newcomb data with two fitted normals.]
10/25 Two (related) viewpoints

Which μ_1, ..., μ_q should I use in (1 − a) ℓ_n(θ) + a log R_n(μ(θ))? To robustify a parametric model, and/or to help focus the nonparametric method?

Viewpoint One (focused robustness): Use control parameters to help the parametric fit do well for these too. For the normal (ξ, σ²), I might want not only the mean and standard deviation to be ok, but also ξ − 0.6745σ and ξ + 0.6745σ to reasonably match the quartiles F_n^{−1}(1/4), F_n^{−1}(3/4).

Viewpoint Two (with focus parameter): I wish the fitted model to give a particularly good estimate of ψ = ψ(f) via ψ̂_HL = ψ(f(·, θ̂_HL)). Then I use the HL with p + 1 parameters: the working model plus my focus ψ. For the normal, I may put in m(y, μ) = I{y ≤ μ} − 3/4, and use ξ̂_HL + 0.6745 σ̂_HL to estimate F^{−1}(3/4).
11/25 B: Empirical Likelihood (1-page course)

For q-vectors m_1, ..., m_n, consider

Λ_n = max{∏_{i=1}^n (n w_i): ∑_{i=1}^n w_i = 1, ∑_{i=1}^n w_i m_i = 0, each w_i > 0}.

Let

G_n(λ) = ∑_{i=1}^n 2 log(1 + λ^t m_i/√n) and G*_n(λ) = 2 λ^t V_n − λ^t W_n λ,

where V_n = n^{−1/2} ∑_{i=1}^n m_i and W_n = n^{−1} ∑_{i=1}^n m_i m_i^t.

First: −2 log Λ_n = max_λ G_n(λ) = G_n(λ̂). Second: with the m_i random; the eigenvalues of W_n away from zero and infinity; n^{−1/2} max_{i≤n} |m_i| →_pr 0; V_n bounded in probability: then G_n ≈ G*_n where it matters, and −2 log Λ_n = V_n^t W_n^{−1} V_n + o_pr(1).

This machinery is then used with m_i = m(y_i, μ(θ)).
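Both statements on this slide can be checked numerically. A sketch assuming a scalar constraint (so the multiplier can be found by bisection; the solver and the seed are my own devices), comparing the exact −2 log Λ_n with the quadratic approximation V_n^t W_n^{−1} V_n:

```python
import numpy as np

def neg2_log_EL(m, tol=1e-12):
    """Exact -2 log Lambda_n for scalar constraint values m_i, via the dual."""
    m = np.asarray(m, dtype=float)
    if m.max() <= 0 or m.min() >= 0:
        return np.inf  # 0 outside the convex hull: constraint set empty
    lo, hi = -1.0 / m.max() + 1e-10, -1.0 / m.min() - 1e-10
    g = lambda lam: np.sum(m / (1.0 + lam * m))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 2.0 * np.sum(np.log1p(0.5 * (lo + hi) * m))

rng = np.random.default_rng(3)
n = 400
m = rng.normal(0.02, 1.0, size=n)   # m_i = m(y_i, mu), slightly off-centre
Vn = m.sum() / np.sqrt(n)           # V_n = n^{-1/2} sum m_i
Wn = np.mean(m ** 2)                # W_n = n^{-1} sum m_i^2
exact = neg2_log_EL(m)
quad = Vn ** 2 / Wn                 # the quadratic approximation
print(exact, quad)                  # close: they differ by o_pr(1)
```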
12/25 C: Theory: under the model

First aim: working out how the HL behaves under model conditions (it will lose some to the ML there, but how much?). With h_n(θ) = (1 − a) ℓ_n(θ) + a log R_n(μ(θ)) and θ_0 the true value, define

A_n(s) = h_n(θ_0 + s/√n) − h_n(θ_0) = (1 − a){ℓ_n(θ_0 + s/√n) − ℓ_n(θ_0)} + a{log R_n(μ(θ_0 + s/√n)) − log R_n(μ(θ_0))}.

Understanding the behaviour of A_n = understanding the behaviour of θ̂_HL (et al.). Under mild conditions, with u(y, θ) the score function,

(U_{n,0}, V_{n,0}) = (n^{−1/2} ∑_{i=1}^n u(y_i, θ_0), n^{−1/2} ∑_{i=1}^n m(y_i, μ(θ_0))) →_d (U_0, V_0) ~ N_{p+q}(0, [[J, C], [C^t, W]]).

Note that J = J_fish is the information matrix of the working model.
13/25 Lemma: there is a well-defined and well-behaved limiting quadratic process:

A_n(s) = h_n(θ_0 + s/√n) − h_n(θ_0) →_d A(s) = s^t U* − (1/2) s^t J* s,

where

U* = (1 − a) U_0 − a ξ_0^t W^{−1} V_0, J* = (1 − a) J + a ξ_0^t W^{−1} ξ_0.

Here ξ_0 = ∂E m(Y, μ(θ_0))/∂θ. Also, U* ~ N_p(0, K*) with

K* = (1 − a)² J + a² ξ_0^t W^{−1} ξ_0 − a(1 − a)(C W^{−1} ξ_0 + ξ_0^t W^{−1} C^t).

The most important aspects of how θ̂_HL behaves can be read off from A_n(s) →_d A(s).
14/25 Fact 1 [using argmax(A_n) →_d argmax(A)]:

√n(θ̂_HL − θ_0) →_d Λ = (J*)^{−1} U* ~ N_p(0, (J*)^{−1} K* (J*)^{−1}).

Fact 2 [using max A_n →_d max A]:

Z_n(θ_0) = 2{h_n(θ̂_HL) − h_n(θ_0)} →_d Z = (U*)^t (J*)^{−1} U*.

Fact 3 [applying the delta method]: with ψ = φ(θ), ψ̂_HL = φ(θ̂_HL), and ψ_0 = φ(θ_0) the true value,

√n(ψ̂_HL − ψ_0) →_d c^t Λ ~ N(0, κ²), with κ² = c^t (J*)^{−1} K* (J*)^{−1} c and c = ∂φ(θ_0)/∂θ.

One may then compute (J*)^{−1} K* (J*)^{−1} and compare it to J_fish^{−1}. Result: the HL loses rather little compared to the ML, under model conditions, if say a ≤ 0.15:

(J*)^{−1} K* (J*)^{−1} = J_fish^{−1} + O(a²).
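The O(a²) conclusion can be illustrated in a toy case of my own (not from the slides): the N(θ, 1) model with the single control m(y, μ) = I{y ≤ μ} − 3/4 and μ(θ) = θ + z_{0.75}, for which J, W, ξ_0 and C are all available in closed form.

```python
import math

z75 = 0.6744898                                        # upper quartile of N(0,1)
phi = math.exp(-z75 ** 2 / 2) / math.sqrt(2 * math.pi) # normal density at z75

J = 1.0            # Fisher information for theta in N(theta, 1)
W = 3.0 / 16.0     # Var m(Y, mu) = (3/4)(1/4)
xi0 = phi          # xi_0 = d/dtheta E m(Y, mu(theta)) = phi(z75)
C = -phi           # Cov(score, m) = E[(Y - theta) I{Y <= mu}] = -phi(z75)

def hl_var(a):
    """Sandwich variance (J*)^{-1} K* (J*)^{-1} of sqrt(n)(theta_HL - theta_0)."""
    Jstar = (1 - a) * J + a * xi0 ** 2 / W
    Kstar = (1 - a) ** 2 * J + a ** 2 * xi0 ** 2 / W - a * (1 - a) * 2 * C * xi0 / W
    return Kstar / Jstar ** 2

for a in (0.0, 0.05, 0.10, 0.15):
    print(a, hl_var(a))   # equals J^{-1} = 1 at a = 0; excess grows like a^2
```

Halving a from 0.10 to 0.05 cuts the excess variance by a factor of about four, as the O(a²) statement predicts.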
15/25 D: Theory: outside the model

Results so far: the behaviour of θ̂_HL and the consequent ψ̂_HL is well understood under parametric model conditions, where they lose a little, but not much, compared to the ML. I will now show (though bigger machinery and more effort are required) that the HL is (often) better than the ML just outside the parametric model.

Framework: extend the f(y, θ) model (with dim(θ) = p) to a bigger f(y, θ, γ) model (with dim(γ) = r), such that γ = γ_0 corresponds to the start model: f(y, θ, γ_0) = f(y, θ). Local neighbourhood model framework: f_true(y) = f(y, θ_0, γ_0 + δ/√n). Thus ψ_true = ψ(θ_0, γ_0 + δ/√n), etc.
16/25 Under f(y, θ_0, γ_0 + δ/√n), suppose an estimation strategy θ̂ has the property

√n(θ̂ − θ_0) →_d N_p(Bδ, Ω),

for appropriate B (a p × r matrix, related to how the model bias affects the estimator) and Ω. For ψ = ψ(f) = ψ(θ, γ), one may use ψ̂ = ψ(θ̂, γ_0). Then analysis leads to

√n(ψ̂ − ψ_true) →_d N(b^t δ, τ²), with b = B^t ∂ψ/∂θ − ∂ψ/∂γ and τ² = (∂ψ/∂θ)^t Ω (∂ψ/∂θ),

with derivatives taken at the narrow model (θ_0, γ_0). Hence the limit mean squared error is mse(δ) = (b^t δ)² + τ².

Next: examine the estimation strategies ML and HL, to find B and Ω, and hence the mse(δ). For the ML: as in Hjort and Claeskens (2003); for the HL: new.
17/25 The story for the ML: essentially from Hjort and Claeskens (2003), Claeskens and Hjort (2008). We need the (p + r) × (p + r) Fisher information matrix

J_wide = [[J_00, J_01], [J_10, J_11]]

at the narrow model. From this (via various efforts):

√n(θ̂_ML − θ_0) →_d N_p(J_00^{−1} J_01 δ, J_00^{−1}).

This implies √n(ψ̂_ML − ψ_true) →_d N(ω^t δ, τ_0²) with

ω = J_10 J_00^{−1} ∂ψ/∂θ − ∂ψ/∂γ and τ_0² = (∂ψ/∂θ)^t J_00^{−1} ∂ψ/∂θ.

Hence we know mse_ML(δ) = (ω^t δ)² + τ_0², and should compare this with what we may find for the HL.
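The blocks of J_wide obey the usual partitioned-inverse identities; in particular, the lower-right block J^{11} of J_wide^{−1} (the limiting covariance Q of D_n below) equals the inverse Schur complement (J_11 − J_10 J_00^{−1} J_01)^{−1}. A quick numerical check, with a randomly generated positive-definite matrix standing in for J_wide:

```python
import numpy as np

p, r = 2, 1
rng = np.random.default_rng(5)
A = rng.normal(size=(p + r, p + r))
Jwide = A @ A.T + (p + r) * np.eye(p + r)   # a positive-definite stand-in
J00, J01 = Jwide[:p, :p], Jwide[:p, p:]
J10, J11 = Jwide[p:, :p], Jwide[p:, p:]

Q_direct = np.linalg.inv(Jwide)[p:, p:]     # J^{11}: block of the full inverse
Q_schur = np.linalg.inv(J11 - J10 @ np.linalg.inv(J00) @ J01)  # Schur complement
print(np.allclose(Q_direct, Q_schur))       # True
```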
18/25 The story for the HL: for S(y) = ∂ log f(y, θ_0, γ_0)/∂γ, let

K_01 = E m(Y, μ(θ_0)) S(Y)^t, of dimension q × r, along with L_01 = (1 − a) J_01 − a ξ_0^t W^{−1} K_01.

Then (via various efforts): √n(θ̂_HL − θ_0) →_d N_p(Bδ, Ω) with B = (J*)^{−1} L_01 and Ω = (J*)^{−1} K* (J*)^{−1}. This yields √n(ψ̂_HL − ψ_true) →_d N(ω_HL^t δ, τ_{0,HL}²) with

ω_HL = ω_HL,a = L_10 (J*)^{−1} ∂ψ/∂θ − ∂ψ/∂γ, τ_{0,HL}² = τ_{0,HL,a}² = (∂ψ/∂θ)^t (J*)^{−1} K* (J*)^{−1} ∂ψ/∂θ.

Here J*, K*, L_10 depend on the balance parameter a.
19/25 One may then compare

mse_ML(δ) = (ω^t δ)² + τ_0², mse_HL,a(δ) = (ω_HL,a^t δ)² + τ_{0,HL,a}²,

in different special setups.

[Figure: root mse for HL and ML, as functions of a.]
20/25 E: Fine-tuning the balance parameter

The precision of ψ̂_HL for estimating ψ_true depends on the underlying truth and on the balance parameter a. In the f(y, θ_0, γ_0 + δ/√n) framework, the best balance a is the minimiser of

risk(a) = mse_HL,a(δ) = (ω_HL,a^t δ)² + τ_{0,HL,a}².

Here

ω_HL,a = L_{10,a} (J*_a)^{−1} ∂ψ/∂θ − ∂ψ/∂γ, τ_{0,HL,a}² = (∂ψ/∂θ)^t (J*_a)^{−1} K*_a (J*_a)^{−1} ∂ψ/∂θ

may be estimated consistently from the data, with δ less visible:

D_n = √n(γ̂_ML − γ_0) →_d N_r(δ, Q), with Q = J^{11} from J_wide^{−1}.
21/25 Since D_n = √n(γ̂_ML − γ_0) →_d N_r(δ, Q), D_n D_n^t overestimates δ δ^t, and E(c^t D_n)² ≈ (c^t δ)² + c^t Q c. Hence we estimate the squared bias sqb = (ω_HL,a^t δ)² in the FIC way, using

ŝqb = max{(ω̂_HL,a^t D_n)² − ω̂_HL,a^t Q̂ ω̂_HL,a, 0} = n{ω̂_HL,a^t(γ̂_ML − γ_0)}² − ω̂_HL,a^t Q̂ ω̂_HL,a if nonnegative, and 0 otherwise.

This leads to

risk-hat(a) = (∂ψ/∂θ)^t (Ĵ*_a)^{−1} K̂*_a (Ĵ*_a)^{−1} ∂ψ/∂θ + ŝqb.

Via this FIC scheme we select the balance parameter a as the minimiser of risk-hat(a).
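The truncated squared-bias estimate on this slide amounts to a two-line function. A sketch with invented inputs (the ω, Q and D_n values are illustrative only, not from the talk):

```python
import numpy as np

def sqb_hat(omega, Dn, Q):
    """FIC-style squared-bias estimate: subtract the variance inflation
    of (omega^t D_n)^2 and truncate at zero."""
    raw = float(omega @ Dn) ** 2 - float(omega @ Q @ omega)
    return max(raw, 0.0)

omega = np.array([1.0, -0.5])
Q = np.array([[2.0, 0.3], [0.3, 1.0]])
print(sqb_hat(omega, np.array([3.0, 1.0]), Q))  # 2.5^2 - 1.95 = 4.30
print(sqb_hat(omega, np.array([0.2, 0.1]), Q))  # raw value negative -> 0.0
```

The truncation matters: without it, small |δ| would routinely produce negative squared-bias estimates.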
22/25 Example: n = 100 data points on (0, 1), fitted to f(y, θ) = θ y^{θ−1}, with control parameter (here equal to the focus parameter) μ = E Y². FIC plot for selecting a in the HL estimation strategy:

[Figure: root fic(a) plotted against a.]

Note: sometimes a = 0 is best, i.e. choose the ML.
23/25 F: Choosing the control parameters

The general hybrid likelihood estimation method works via constructing h_n(θ) = (1 − a) ℓ_n(θ) + a log R_n(μ(θ)), which starts with choosing the control parameters μ_1, ..., μ_q. These aim at fitting models such that certain aspects are well calibrated, beyond those taken care of by the ML, which concentrates on the score functions u_1(y, θ), ..., u_p(y, θ). One can choose m(y, μ) = g(y) − μ to make sure that the HL incorporates aspects of μ = E g(Y_i).

Favourite case: for a given focus parameter ψ = ψ(f), use this as the single control parameter. For a given focus parameter ψ = ψ(f), one may also select among candidate μ_j controls via FIC schemes. One may stretch the idea further, including a slowly increasing sequence μ_1, μ_2, ..., with a FIC (or AFIC) stopping criterion.
24/25 G: Concluding remarks (and questions)

A. The methodology works for multidimensional data y_i, and can be extended to regression settings.

B. We fine-tune the balance parameter a by minimising the curve risk-hat(a) over [0, 1]. If the model gives a good fit, risk-hat(a) is minimal at a = 0, and we use the ML after all. This also yields an implied goodness-of-fit test.

C. So far: a large-sample approximation framework and methodology, with fixed p (the dimension of θ), q (the number of control parameters), and r (the number of extra γ_j model extension parameters). It is of interest to let these grow with n, but this is mathematically more difficult.
25/25 D. Instead of h_n(θ) = (1 − a) ℓ_n(θ) + a log R_n(μ(θ)), one may work with a certain cousin operation,

h̃_n(θ) = (1 − a) ℓ_n(θ) + a log R̃_n(μ(θ)), with log R̃_n(μ(θ)) = −(1/2) V_n(μ(θ))^t W_n(μ(θ))^{−1} V_n(μ(θ)),

i.e. using the quadratic approximation to the EL rather than the EL itself. This is (a) easier, numerically and mathematically; (b) equivalent, up to O(1/√n) neighbourhoods; (c) valid more generally. This gives (with some effort) a minimum divergence method based on

d(f, f_θ) = (1 − a) KL(f, f_θ) + a d_Q(f, f_θ), with d_Q(f, f_θ) = (1/2) (μ_θ − μ_0)^t Ω_0^{−1} (μ_θ − μ_0) / {1 + (μ_θ − μ_0)^t Ω_0^{−1} (μ_θ − μ_0)}.
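Reading the slide's last display as d_Q = (1/2) q/(1 + q) with q = (μ_θ − μ_0)^t Ω_0^{−1} (μ_θ − μ_0) (my reconstruction of the fraction), d_Q is a bounded transform of a Mahalanobis-type distance: zero when the control parameters agree, never exceeding 1/2. A small sketch with invented Ω_0 and μ values:

```python
import numpy as np

def d_Q(mu_theta, mu0, Omega0):
    """Bounded quadratic-type divergence 0.5 * q / (1 + q)."""
    diff = np.asarray(mu_theta, dtype=float) - np.asarray(mu0, dtype=float)
    q = float(diff @ np.linalg.inv(Omega0) @ diff)
    return 0.5 * q / (1.0 + q)

Omega0 = np.array([[1.0, 0.2], [0.2, 2.0]])
print(d_Q([0.0, 0.0], [0.0, 0.0], Omega0))   # 0.0 when the means agree
print(d_Q([5.0, -3.0], [0.0, 0.0], Omega0))  # close to the upper bound 0.5
```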
More informationCombining diverse information sources with the II-CC-FF paradigm
Combining diverse information sources with the II-CC-FF paradigm Céline Cunen (joint work with Nils Lid Hjort) University of Oslo 18/07/2017 1/25 The Problem - Combination of information We have independent
More informationTesting Restrictions and Comparing Models
Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by
More informationEmpirical likelihood-based methods for the difference of two trimmed means
Empirical likelihood-based methods for the difference of two trimmed means 24.09.2012. Latvijas Universitate Contents 1 Introduction 2 Trimmed mean 3 Empirical likelihood 4 Empirical likelihood for the
More informationLikelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science
1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School
More informationGMM - Generalized method of moments
GMM - Generalized method of moments GMM Intuition: Matching moments You want to estimate properties of a data set {x t } T t=1. You assume that x t has a constant mean and variance. x t (µ 0, σ 2 ) Consider
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationStat 710: Mathematical Statistics Lecture 31
Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:
More informationCentral Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately
More informationChapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer
Chapter 6 Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer The aim of this chapter is to calculate confidence intervals for the maximum power consumption per customer
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationEstimation Tasks. Short Course on Image Quality. Matthew A. Kupinski. Introduction
Estimation Tasks Short Course on Image Quality Matthew A. Kupinski Introduction Section 13.3 in B&M Keep in mind the similarities between estimation and classification Image-quality is a statistical concept
More informationLikelihood based Statistical Inference. Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona
Likelihood based Statistical Inference Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona L. Pace, A. Salvan, N. Sartori Udine, April 2008 Likelihood: observed quantities,
More informationLikelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square
Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square Yuxin Chen Electrical Engineering, Princeton University Coauthors Pragya Sur Stanford Statistics Emmanuel
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random
More informationOn robust and efficient estimation of the center of. Symmetry.
On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract
More informationTime Series. Anthony Davison. c
Series Anthony Davison c 2008 http://stat.epfl.ch Periodogram 76 Motivation............................................................ 77 Lutenizing hormone data..................................................
More informationMath 533 Extra Hour Material
Math 533 Extra Hour Material A Justification for Regression The Justification for Regression It is well-known that if we want to predict a random quantity Y using some quantity m according to a mean-squared
More informationLeast Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006
Least Squares Model Averaging Bruce E. Hansen University of Wisconsin January 2006 Revised: August 2006 Introduction This paper developes a model averaging estimator for linear regression. Model averaging
More informationDensity Estimation. We are concerned more here with the non-parametric case (see Roger Barlow s lectures for parametric statistics)
Density Estimation Density Estimation: Deals with the problem of estimating probability density functions (PDFs) based on some data sampled from the PDF. May use assumed forms of the distribution, parameterized
More informationStatistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationO Combining cross-validation and plug-in methods - for kernel density bandwidth selection O
O Combining cross-validation and plug-in methods - for kernel density selection O Carlos Tenreiro CMUC and DMUC, University of Coimbra PhD Program UC UP February 18, 2011 1 Overview The nonparametric problem
More informationThe Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series
The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series Willa W. Chen Rohit S. Deo July 6, 009 Abstract. The restricted likelihood ratio test, RLRT, for the autoregressive coefficient
More informationSTAB57: Quiz-1 Tutorial 1 (Show your work clearly) 1. random variable X has a continuous distribution for which the p.d.f.
STAB57: Quiz-1 Tutorial 1 1. random variable X has a continuous distribution for which the p.d.f. is as follows: { kx 2.5 0 < x < 1 f(x) = 0 otherwise where k > 0 is a constant. (a) (4 points) Determine
More information