Semiparametric Regression Based on Multiple Sources

Size: px
Start display at page:

Download "Semiparametric Regression Based on Multiple Sources"

Transcription

1 Semiparametric Regression Based on Multiple Sources Benjamin Kedem Department of Mathematics & Inst. for Systems Research University of Maryland, College Park Give me a place to stand and rest my lever on, and I can move the Earth, (Archimedes, B.C.) AISC, October 5-7, 2012

2 1 Cheng, K.F., and Chu, C.K. (2004), Semiparametric density estimation under a two-sample density ratio model, Bernoulli, 10(4), Fokianos, K. (2004). Merging information for semiparametric density estimation. Journal of the Royal Statistical Society, Series B, 66, Fokianos, K., Kedem, B., Qin, J., Short, D.A. (2001). A semiparametric approach to the one-way layout. Technometrics, 43, McGlynn K.A., Sakoda1 L.C., Rubertone M.V., Sesterhenn I.A., Lyu C., Graubard B.I., Erickson R.L. (2007). Body size, dairy consumption, puberty, and risk of testicular germ cell tumors. American Journal of Epidemiology, 165, Qin, J., Zhang, B. (1997). A goodness of fit test for logistic regression models based on case-control data. Biometrika, 84, Qin, J., and Zhang, B. (2005). Density Estimation Under A Two-Sample Semiparametric Model. Nonparametric Statistics, 1, Kedem, B., Kim, E-Y, Voulgaraki, A., and Graubard, B. (2009). Two-Dimensional Semiparametric Density Ratio Modeling of Testicular Germ Cell Data. Statistics in Medicine, 2009, 28, Voulgaraki, A., Kedem, B., and Graubard, B.(2012). "Semiparametric Regression in Testicular Germ Cell Data", Annals of Applied Statistics, 6, Supplement: DOI: /12-AOAS552SUPP

3 Abstract 1 Information is combined from multiple multivariate sources. 2 Density ratio model: Multivariate distributions are regressed" on a reference distribution. 3 Random cocariates. 4 Kernel density estimator from many data sources: more efficient than the traditional single-sample kernel density estimator. 5 Each multivariate distribution and the corresponding conditional expectation (regression) of interest are estimated from the combined data using all sources. 6 Graphical and quantitative diagnostic tools are suggested to assess model validity. 7 The method is applied in quantifying the effect of height and age on weight of germ cell testicular cancer patients. 8 Comparison with multiple regression, GAM, and nonparametric kernel regression.

4 Main Points Density Ratio Examples Some Simulation Results Summary

5 Main Points Density Ratio Examples Some Simulation Results Summary

6 Main Points Density Ratio Examples Some Simulation Results Summary

7 Main Points Density Ratio Examples Some Simulation Results Summary

8 Main Points Density Ratio Examples Some Simulation Results Summary

9 Main Points Density Ratio Examples Some Simulation Results Summary

10 Outline 1 Density Ratio Examples

11 Multiple filtering of a signal f 1 (ω) = H 1 (ω) 2 f (ω).. (1). f q (ω) = H q (ω) 2 f (ω) That is, q distortions or multiple tilting of the same reference spectral density f.

12 Linear system with Gaussian noise X t = αx t 1 + ɛ t, ɛ = (ɛ 1t,..., ɛ qt, ɛ mt ) ɛ jt g j N(0, σ 2 j ), j = 1,..., q, m. Choose g m g. We get many distortions of the same reference g: g 1 (x) = e α 1+β 1 x 2 g(x)... g q (x) = e αq+βqx2 g(x)

13 Analysis of variance Consider the classical one-way ANOVA with m = q + 1 independent normal random samples: x 11,..., x 1n1 g 1 (x)... x q1,..., x qnq g q (x) x m1,..., x mnm g m (x) g j (x) N( µ j, σ 2 ), j = 1,..., m.

14 Then, holding g m (x) g(x) as a reference: g 1 (x) = exp(α 1 + β 1 x)g(x)... g q (x) = exp(α q + β q x)g(x) α j = µ2 m µ 2 j 2 σ 2, β j = µ j µ m σ 2, j = 1,..., q

15 k-parameter exponential families { k } g(x, θ) = d(θ)s(x) exp c i (θ)t i (x) i=1 Let g j (x) g(x, θ j ), g(x) g(x, θ m ). Again, q distortions of a reference g(x): g 1 (x) = exp{α 1 + β 1 h(x)}g(x)... g q (x) = exp{α q + β qh(x)}g(x)

16 Case-control: Multinomial logistic regression (Prentice and Pyke 1979). RV y s.t. P(y = j) = π j, m j=1 π j = 1. Assume: For j = 1,..., m, and any h(x), P(y = j x) = exp( α j + β jh(x)) 1+ q k=1 exp( α k + β kh(x)) Define: f (x y = j) = g j (x), j = 1,..., m Then with α j = α j + log[ π m /π j ], j = 1,..., q, and g m g, g j (x) = exp( α j + β j h(x))g(x), j = 1,..., q.

17 Weighted Distributions Rao(1965) unified the concept of weighted distributions of the form: W (x; α)p(x; θ) p w (x; θ, α) = E[W (X; α)] where W (x; α) is known. This is an example of tilting of a reference distribution. By changing the weight W (x; α) we obtain different distortions of the same reference p(x; θ).

18 Biased sampling Length-biased Sampling: Vardi (1982) introduced F(y) = 1/µ y 0 xdg(x), y 0, where µ = 0 xdg(x) <. This is a tilt model. Biased Sampling/Selection Bias: Vardi (1985), and Gill, Vardi, Wellner (1988) considered the more general biased sampling model F(y) = W (G) 1 y w(x)dg(x), where w(x) is known and W (G) = w(x)dg(x). This is a tilt model. F, G are obtained by NPMLE.

19 Biased sampling Length-biased Sampling: Vardi (1982) introduced F(y) = 1/µ y 0 xdg(x), y 0, where µ = 0 xdg(x) <. This is a tilt model. Biased Sampling/Selection Bias: Vardi (1985), and Gill, Vardi, Wellner (1988) considered the more general biased sampling model F(y) = W (G) 1 y w(x)dg(x), where w(x) is known and W (G) = w(x)dg(x). This is a tilt model. F, G are obtained by NPMLE.

20 Comparison Distributions (Parzen 1977,...,2009) Sequence of cdf s: {F 1,..., F q } G, with cont. densities f 1,..., f q, g. Comparison Distributions are defined as: D j (u; G, F j ) = F j (G 1 (u)), Then by differentiation, with x = G 1 (u): 0 < u < 1, j = 1,..., q f 1 (x) = d(g(x); G, F 1 )g(x) f q (x) = d(g(x); G, F q )g(x)

21 Outline 1 Density Ratio Examples

22 m = q + 1 independent p-dim multivariate samples: x ij = (x ij1, x ij2,..., x ijp ) g i (x 1,..., x p ), i = 1,..., q, m, j = 1,..., n i and x i1, x i2,..., x ini iid gi Reference or baseline probability density function (pdf): g g m (x) g m (x 1,..., x p )

23 Density ratio model: or equivalently g i (x) g(x) = w(x, θ i) (2) g i (x) = w(x, θ i )g(x) (3) 1. g i (x), g(x) are unknown. 2. w is a known positive and continuous function. 3. θ i are unknown d-dimensional parameters. 4. pdf s may be continuous or discrete. 5. Normal assumption is not made.

24 Problem: Use the combined data from all the samples x 11,..., x 1n1,...x m1,..., x mnm of size n = n n m to estimate all the parameters θ i, i = 1,..., q, and all the densities g 1,..., g q and g m = g.

25 G(x) G m (x), p ij = dg(x ij ) = dg m (x ij ). The empirical log-likelihood is given by: l = log L = m n i q n i log(p ij ) + log(w(x ij, θ i )) (4) i=1 j=1 i=1 j=1 subject to the constraints: p ij 0, m n i p ij = 1, i=1 j=1 m n i p ij w(x ij, θ k ) = 1 for k = 1,..., q. i=1 j=1 (5)

26 ˆp ij = 1 n 1 + q k=1 ˆµ k 1 [ w(x ij, ˆθ k ) 1 ], (6) Ĝ(x) = Ĝm(x) = m n i ˆp ij I(x ij x) (7) i=1 j=1 More generally, for l = 1,..., m and w(x ij, ˆθ m ) 1, Ĝ l (x) = m n i ˆp ij w(x ij, ˆθ l )I(x ij x) (8) i=1 j=1

27 Outline 1 Density Ratio Examples

28 Traditional single sample kernel density estimator: ˆf (x) = 1 nh p n n ( ) x xi K i=1 h n (9) h n is a sequence of bandwidths such that: 1. h n 0, nh p n as n. 2. K (x) is defined for p-dimensional x. 3 K (x) 0, symmetric around R p K (x)dx = 1. Under certain conditions, ˆf (x) is a consistent estimator of f (x) (Parzen 1962, Shao 2003).

29 Fokianos (2004), Cheng and Chu (2004), Qin and Zhang (2005), Voulgaraki, Kedem, Graubard (VKG) (2012), use the the probabilities ˆp ij instead of 1/n: ĝ l (x) = 1 h p n m n i ( ) x xij ˆp ij ŵ l (x ij )K i=1 j=1 h n (10)

30 VKG 2012: Assume that K ( ) is a nonnegative bounded symmetric function with K (x)dx = 1, x xk (x)dx = k 2 > 0. Assume that g l is continuous at x and twice differentiable in a neighborhood of x. If, as n, h n = O(n 1 4+p ), then ( nhn p ĝ l (x) g l (x) 1 2 h2 n u 2 g l (x ) ) D x x uk (u)du N(0, σ 2 (x)) where σ 2 (x) = w l(x)g l (x) m k=1 ζ kw k (x) K 2 (u)du for any fixed x.

31 The mean integrated square error (MISE) is defined as: ( ĝl MISE(ĝ l (x)) = E (x) g l (x) ) 2 dx (11) Minimizing the MISE with respect to h n, the optimal bandwidth is: hn (p/n) w l (x)g l (x)/ [ m k=1 = ζ kw k (x) ] dx K 2 (u)du ( ) 2 u ( 2 g l (x)/ x x )uk (u)du dx 1 4+p (12)

32 For sufficient large n, an alternative way to find h n is by minimizing h p n n n i=1 i =1 ˆp(t i )ŵ l (t i )ˆp(t i )ŵ l (t i ) K (z)k ( z + t i t i h n 2 ( xli x lj n l (n l 1)hn p K h n i j ) dz ). (13)

33 VKG 2012: If ˆf is the classical single-sample multivariate kernel density estimator of g l, then As n, h n 0 and nh p n, (a) AMISE(ĝ l ) AMISE(ˆf ) (b) Using optimal bandwidths, the proposed semiparametric density estimator ĝ l (x) is more efficient than ˆf (x), i.e for every l eff (ˆf, ĝ l ) AMISE (ĝ l ) AMISE (ˆf ) 1 where AMISE is the optimal AMISE.

34 Diagnostic Plots and Goodness of Fit Define ( ) } k Rα,k { 2 = 1 exp xα n i x α (14) n i : Sample size of ith sample. x α: The number of times the estimated semiparametric cdf falls in the estimated 1 α confidence interval obtained from the corresponding empirical cdf, both evaluated at the sample points. k > 0 abd α chosen by the user. R 2 α,k takes values between 0 and 1. Computing R 2 α,k is both simple and fast.

35 Qin and Zhang (1997): R 2 3 = exp( n max G i Ĝi ) (15) Clearly, R 2 3 takes values between 0 and 1.

36 Simulation Results: Ĝi vs. G i, i = 1, 2 Simulation ( 1: g ) 1 N((0, 0), Σ)) (case), g 2 N((0, 0), Σ)) (control), 4 2 Σ =, n = 40, n 2 = 30. Simulation 1: G1 Simulation 1: G2 Joint Sem. CDF case Joint Sem. ref. CDF control Joint Benjamin Empcdf case Kedem AISC: October Joint Empcdf 5-7, control 2012

37 Simulation ( 2: g ) 1 N((0, 0), Σ)) (case), g 2 N((1, 1), Σ)) (control) with 3 1 Σ =, n = 200, n 2 = 200. Simulation 2: G1 Simulation 2: G2 Joint Sem. CDF case Joint Sem. ref. CDF control Joint Empcdf case Joint Empcdf control

38 Simulation 3: g 1 from standard two dimensional Multivariate Cauchy (case) ( and g from two dimensional Multivariate Cauchy (control) with µ = (1, 1), V = 5 10 n 1 = 200, n 2 = 200. ), Simulation 3: G1 Simulation 3: G2 Joint Sem. CDF case Joint Sem. ref. CDF control Joint Empcdf case Joint Empcdf control

39 Simulation 4: g 1 from standard two dimensional Multivariate Cauchy (case) and g 2 from uniform distribution on the triangle (0, 0), (6, 0), ( 3, 4) (control), and n 1 = 200, n 2 = 200. Simulation 4: G1 Simulation 4: G2 Joint Sem. CDF case Joint Sem. ref. CDF control Joint Empcdf case Joint Empcdf control

40 Table: Comparison of R3 2 and R2.05,2 for 100 repetitions of case and control. Simulation Group R3 2 R.05,2 2 (1) Case Control (2) Case Control (3) Case Control (4) Case Control

41 Bandwidth (BW) selection using cross validation. Case Control Same BW Diff. BW s Same BW Diff. BW s h h 1 h 2 h h 1 h 2 Leave One Out Simulation Simulation Alternative (12) Simulation Simulation

42 Outline 1 Density Ratio Examples

43 Suppose we have m multivariate samples of p-dim vectors: (Random Covariates, Response) = (x 1,..., x (p 1), y) We wish to estimate m conditional expectations: E(y x 1,..., x (p 1) )

44 Data: i = 1,..., q, m, j = 1,..., n i, (x ij1, x ij2,..., x ij(p 1), y ij ) g i (x 1,..., x (p 1), y). Reference: Distortions: g g m (x 1,..., x (p 1), y) g i (x 1,..., x (p 1), y), i = 1,..., q

45 Model: As in ratio of pdf s from N(µ 1, Σ), N(µ 2, Σ) assume g i (x) g(x) = w(x, α i, β i ) exp(α i + β ix), i = 1,..., q (16) where x = (x 1,..., x (p 1), y) and β i = (β i1,..., β ip ).

46 Let t denote the vector of combined data of length n = n 1 + n n m. Following the method of constrained empirical likelihood we obtain score equations for ˆα j and ˆβ j : l α j = l β j = n i=1 n i=1 ρ j w j (t i ) 1 + ρ 1 w 1 (t i ) + + ρ q w q (t i ) + n j = 0 (17) ρ j w j (t i )t i 1 + ρ 1 w 1 (t i ) + + ρ q w q (t i ) + n j (x ji1,..., y ji ) (18) = 0 i=1 j = 1,..., q and ρ j = n j /n m w j (t i ) = exp(α j + β j t i)

47 ˆp i = 1 n m Ĝ(t) = 1 n m ρ 1 ŵ 1 (t i ) + + ρ q ŵ q (t i ) n I(t i t) 1 + ρ 1 ŵ 1 (t i ) + + ρ q ŵ q (t i ) i=1 (19) (20) where ŵ j (t i ) = exp(ˆα j + ˆβ jt i ) As n, the estimators ˆθ = (ˆα 1,, ˆα q, ˆβ 1,, ˆβ q ) are asymptotically normal (Qin and Zhang 1997, Lu 2007).

48 Under the p-dimensional density ratio model we can predict the response y given the covariate information x 1, x 2,..., x (p 1) for any of the m data sets as follows: n j ĝ j (x 1,..., x (p 1), y i ) Ê j (y x 1,..., x (p 1) ) = y i, j = 1,..., q, m. y i ĝ j (x 1,..., x (p 1), y i ) i (21) The ĝ j in (21) are the semiparametric kernel density estimates.

49 Assume that the data are bounded. Then: (a) As n, h 0 and nh p, Ê(y x) E(y x) g(x)dx 0 in the mean square sense. (b) If, in addition 0 < A < g(x), then Ê(y x) E(y x) dx 0 in the mean square sense.

50 Outline 1 Density Ratio Examples

51 Testicular germ cell tumor (TGCT) is a common cancer among U.S. men, mainly in the age group of years (McGlynn et al 2003). Increased risk is significantly related to height, whereas body mass index was not found significant (McGlynn et al 2007). Using a two dimensional semiparametric model, it was shown that jointly height and weight are significant risk factors (Kedem et al 2009). We use TGCT data with 3 variables: age, height (cm), weight (kg). Case: n 1 = 763. Control: n 2 = 928. n = 1691 individuals. We consider two cases: 2D with variables height and weight 3D with variables height, weight and age. In both cases the control distribution is the reference distribution.

52 2D Problem: E(Weight Height) Diagnostic plots of Ĝi versus G i, i = 1, 2 evaluated at (height,weight) pairs. Case Group Control Group Joint Sem. CDF Joint Sem. CDF Joint Empirical CDF Joint Empirical CDF

53 2D Case Regression 2D TGCT: E[Weight/Height] for G1 Residual plot for 2D TGCT Case weight Actual points Ref. Regression Semiparametric GAM NW hat(e) height Sem. E[weight/height]

54 2D Control Regression 2D TGCT: E[Weight/Height] for G2 Residual plot for 2D TGCT Control weight Actual points Ref. Regression Semiparametric GAM NW hat(e) height Sem. E[weight/height]

55 3D Problem: E(Weight Height, Age) Diagnostic plots of Ĝi versus G i, i = 1, 2 evaluated at (age,height,weight) triplets. Case Group Control Group Joint Sem. CDF Joint Sem. CDF Joint Empirical CDF Joint Empirical CDF

56 3D Residuals Residual plots for the semiparametric model in the 3D TGCT data set. Residual plot for 3D TGCT Case Residual plot for 3D TGCT Control hat(e) hat(e) Sem. E[weight/height, age] Sem. E[weight/height, age]

57 Some joint probabilities of age, height and weight in the case and control groups. Probability Case Control Pr(A 45, H , W ) Pr(A 26, H , W ) Pr(A 29, H , W ) Pr(A 33, H , W ) Pr(A 34, H , W ) Pr(A 37, H , W ) Pr(A 40, H , W ) Pr(A 43, H , W ) Pr(A 45, H , W )

58 Case-control weight and Ê[weight height, age]. Empty entries in the table correspond to subjects with the same height and age, but possibly different weights. Control Case Age Height Weight Ê[W H, A] Weight Ê[W H, A]

59 Case-control weight and Ê[weight height, age]. Empty entries in the table correspond to subjects with the same height and age, but possibly different weights. Control Case Age Height Weight Ê[W H, A] Weight Ê[W H, A]

60 Case-control weight and Ê[weight height, age]. Empty entries in the table correspond to subjects with the same height and age, but possibly different weights. Control Case Age Height Weight Ê[W H, A] Weight Ê[W H, A]

Semiparametric Regression Based on Multiple Sources

Semiparametric Regression Based on Multiple Sources Semiparametric Regression Based on Multiple Sources Benjamin Kedem Department of Mathematics & Inst. for Systems Research University of Maryland, College Park Give me a place to stand and rest my lever

More information

ABSTRACT. Semiparametric Regression and Mortality Rate Prediction. Anastasia Voulgaraki, Doctor of Philosophy, 2011

ABSTRACT. Semiparametric Regression and Mortality Rate Prediction. Anastasia Voulgaraki, Doctor of Philosophy, 2011 ABSTRACT Title of dissertation: Semiparametric Regression and Mortality Rate Prediction Anastasia Voulgaraki, Doctor of Philosophy, 2011 Dissertation directed by: Professor Benjamin Kedem Department of

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Histogram Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction

Histogram Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction Tine Buch-Kromann Construction X 1,..., X n iid r.v. with (unknown) density, f. Aim: Estimate the density

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1 36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Generalized logit models for nominal multinomial responses. Local odds ratios

Generalized logit models for nominal multinomial responses. Local odds ratios Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π

More information

Continuous Random Variables

Continuous Random Variables 1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

Distribution-Free Distribution Regression

Distribution-Free Distribution Regression Distribution-Free Distribution Regression Barnabás Póczos, Alessandro Rinaldo, Aarti Singh and Larry Wasserman AISTATS 2013 Presented by Esther Salazar Duke University February 28, 2014 E. Salazar (Reading

More information

Week 9 The Central Limit Theorem and Estimation Concepts

Week 9 The Central Limit Theorem and Estimation Concepts Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population

More information

Advanced Statistics II: Non Parametric Tests

Advanced Statistics II: Non Parametric Tests Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Chapter 7. Hypothesis Testing

Chapter 7. Hypothesis Testing Chapter 7. Hypothesis Testing Joonpyo Kim June 24, 2017 Joonpyo Kim Ch7 June 24, 2017 1 / 63 Basic Concepts of Testing Suppose that our interest centers on a random variable X which has density function

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

MIT Spring 2015

MIT Spring 2015 MIT 18.443 Dr. Kempthorne Spring 2015 MIT 18.443 1 Outline 1 MIT 18.443 2 Batches of data: single or multiple x 1, x 2,..., x n y 1, y 2,..., y m w 1, w 2,..., w l etc. Graphical displays Summary statistics:

More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

Mathematical Preliminaries

Mathematical Preliminaries Mathematical Preliminaries Economics 3307 - Intermediate Macroeconomics Aaron Hedlund Baylor University Fall 2013 Econ 3307 (Baylor University) Mathematical Preliminaries Fall 2013 1 / 25 Outline I: Sequences

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

HT Introduction. P(X i = x i ) = e λ λ x i

HT Introduction. P(X i = x i ) = e λ λ x i MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework

More information

Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures

Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures David Hunter Pennsylvania State University, USA Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove,

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

Calibration Estimation for Semiparametric Copula Models under Missing Data

Calibration Estimation for Semiparametric Copula Models under Missing Data Calibration Estimation for Semiparametric Copula Models under Missing Data Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Economics and Economic Growth Centre

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

Quantitative Economics for the Evaluation of the European Policy. Dipartimento di Economia e Management

Quantitative Economics for the Evaluation of the European Policy. Dipartimento di Economia e Management Quantitative Economics for the Evaluation of the European Policy Dipartimento di Economia e Management Irene Brunetti 1 Davide Fiaschi 2 Angela Parenti 3 9 ottobre 2015 1 ireneb@ec.unipi.it. 2 davide.fiaschi@unipi.it.

More information

Modelling Non-linear and Non-stationary Time Series

Modelling Non-linear and Non-stationary Time Series Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September

More information

1.1 Basis of Statistical Decision Theory

1.1 Basis of Statistical Decision Theory ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

A Conditional Approach to Modeling Multivariate Extremes

A Conditional Approach to Modeling Multivariate Extremes A Approach to ing Multivariate Extremes By Heffernan & Tawn Department of Statistics Purdue University s April 30, 2014 Outline s s Multivariate Extremes s A central aim of multivariate extremes is trying

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

Machine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier

Machine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Theory of Classification and Nonparametric Classifier Eric Xing Lecture 2, January 16, 2006 Reading: Chap. 2,5 CB and handouts Outline What is theoretically

More information

Chapter 4: Asymptotic Properties of the MLE (Part 2)

Chapter 4: Asymptotic Properties of the MLE (Part 2) Chapter 4: Asymptotic Properties of the MLE (Part 2) Daniel O. Scharfstein 09/24/13 1 / 1 Example Let {(R i, X i ) : i = 1,..., n} be an i.i.d. sample of n random vectors (R, X ). Here R is a response

More information

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,

More information

Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood

More information

Bickel Rosenblatt test

Bickel Rosenblatt test University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance

More information

ON SOME TWO-STEP DENSITY ESTIMATION METHOD

ON SOME TWO-STEP DENSITY ESTIMATION METHOD UNIVESITATIS IAGELLONICAE ACTA MATHEMATICA, FASCICULUS XLIII 2005 ON SOME TWO-STEP DENSITY ESTIMATION METHOD by Jolanta Jarnicka Abstract. We introduce a new two-step kernel density estimation method,

More information

Test Problems for Probability Theory ,

Test Problems for Probability Theory , 1 Test Problems for Probability Theory 01-06-16, 010-1-14 1. Write down the following probability density functions and compute their moment generating functions. (a) Binomial distribution with mean 30

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

1 Empirical Likelihood

1 Empirical Likelihood Empirical Likelihood February 3, 2016 Debdeep Pati 1 Empirical Likelihood Empirical likelihood a nonparametric method without having to assume the form of the underlying distribution. It retains some of

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Recent Advances in the analysis of missing data with non-ignorable missingness

Recent Advances in the analysis of missing data with non-ignorable missingness Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation

More information

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as L30-1 EEL 5544 Noise in Linear Systems Lecture 30 OTHER TRANSFORMS For a continuous, nonnegative RV X, the Laplace transform of X is X (s) = E [ e sx] = 0 f X (x)e sx dx. For a nonnegative RV, the Laplace

More information

Chapter 10. U-statistics Statistical Functionals and V-Statistics

Chapter 10. U-statistics Statistical Functionals and V-Statistics Chapter 10 U-statistics When one is willing to assume the existence of a simple random sample X 1,..., X n, U- statistics generalize common notions of unbiased estimation such as the sample mean and the

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β Introduction - Introduction -2 Introduction Linear Regression E(Y X) = X β +...+X d β d = X β Example: Wage equation Y = log wages, X = schooling (measured in years), labor market experience (measured

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update 3. Juni 2013) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend

More information

Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta

Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta November 29, 2007 Outline Overview of Kernel Density

More information

Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications

Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications Fumiya Akashi Research Associate Department of Applied Mathematics Waseda University

More information

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Chapter 5: Models used in conjunction with sampling J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Nonresponse Unit Nonresponse: weight adjustment Item Nonresponse:

More information

Parameters Estimation for a Linear Exponential Distribution Based on Grouped Data

Parameters Estimation for a Linear Exponential Distribution Based on Grouped Data International Mathematical Forum, 3, 2008, no. 33, 1643-1654 Parameters Estimation for a Linear Exponential Distribution Based on Grouped Data A. Al-khedhairi Department of Statistics and O.R. Faculty

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Package Rsurrogate. October 20, 2016

Package Rsurrogate. October 20, 2016 Type Package Package Rsurrogate October 20, 2016 Title Robust Estimation of the Proportion of Treatment Effect Explained by Surrogate Marker Information Version 2.0 Date 2016-10-19 Author Layla Parast

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Multivariate Random Variable

Multivariate Random Variable Multivariate Random Variable Author: Author: Andrés Hincapié and Linyi Cao This Version: August 7, 2016 Multivariate Random Variable 3 Now we consider models with more than one r.v. These are called multivariate

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

Frailty Models and Copulas: Similarities and Differences

Frailty Models and Copulas: Similarities and Differences Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Motivational Example

Motivational Example Motivational Example Data: Observational longitudinal study of obesity from birth to adulthood. Overall Goal: Build age-, gender-, height-specific growth charts (under 3 year) to diagnose growth abnomalities.

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Modelling Multivariate Peaks-over-Thresholds using Generalized Pareto Distributions

Modelling Multivariate Peaks-over-Thresholds using Generalized Pareto Distributions Modelling Multivariate Peaks-over-Thresholds using Generalized Pareto Distributions Anna Kiriliouk 1 Holger Rootzén 2 Johan Segers 1 Jennifer L. Wadsworth 3 1 Université catholique de Louvain (BE) 2 Chalmers

More information

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018 Mathematics Ph.D. Qualifying Examination Stat 52800 Probability, January 2018 NOTE: Answers all questions completely. Justify every step. Time allowed: 3 hours. 1. Let X 1,..., X n be a random sample from

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Introduction An approximated EM algorithm Simulation studies Discussion

Introduction An approximated EM algorithm Simulation studies Discussion 1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Statistica Sinica 18(2008), 515-534 DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Kani Chen 1, Jianqing Fan 2 and Zhezhen Jin 3 1 Hong Kong University of Science and Technology,

More information

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models IEEE Transactions on Information Theory, vol.58, no.2, pp.708 720, 2012. 1 f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models Takafumi Kanamori Nagoya University,

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Statistica Sinica 24 (2014), 1097-1116 doi:http://dx.doi.org/10.5705/ss.2012.074 AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Sheng Wang 1, Jun Shao

More information

Random fraction of a biased sample

Random fraction of a biased sample Random fraction of a biased sample old models and a new one Statistics Seminar Salatiga Geurt Jongbloed TU Delft & EURANDOM work in progress with Kimberly McGarrity and Jilt Sietsma September 2, 212 1

More information

Mathematical statistics

Mathematical statistics October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation

More information