Statstical models, selection effects and sampling bias

Size: px
Start display at page:

Download "Statstical models, selection effects and sampling bias"

Transcription

1 Statstical models, selection effects and sampling bias Department of Statistics University of Chicago Statslab Cambridge Nov

2 Outline 1 Gaussian models Binary regression model Attenuation of treatment effect Problems with conventional models 2 Sampling bias Volunteer samples Joint distributions of Y given x Randomization 3 Ewes and lambs Notation Estimating functions Interference 4

3 Conventional regression model Gaussian models Binary regression model Attenuation of treatment effect Problems with conventional models Fixed index set U (always infinite): u 1, u 2,... subjects, plots... Covariate x(u 1 ), x(u 2 ),... (non-random, vector-valued) Response Y (u 1 ), Y (u 2 ),... (random, real-valued) Regression model: Sample = finite ordered subset u 1,..., u n (distinct in U) For each sample with configuration x = (x(u 1 ),..., x(u n )) Response distribution p x (y) on R n depends on x Kolmogorov consistency condition on distributions p x ( ) x is not a random variable p x ( ) is not the conditional distribution given x

4 Gaussian regression model Gaussian models Binary regression model Attenuation of treatment effect Problems with conventional models Covariate x(u) = (x 1 (u), x 2 (u)) partitioned into two components x 1 (u) = (variety(u), treatment(u),...) affecting the mean x 2 (u) = (block(u), coordinates(u)) affecting covariances Example: response distribution for a fixed sample of n plots p x (y A; θ) = N n (X 1 β, σ 2 0 I n + σ 2 1 K [x 2])(A) A R n, K [x] = {K (x i, x j )} block-factor models: K (i, j) = 1 if block(i) = block(j) spatial/temporal models: K (i, j) = exp( x 2 (i) x 2 (j) /τ) Generalized random field: K (i, j) = ave x i, x j log x x Equivalent to Y (u) = µ(u) + ɛ(u) + η(u) with ɛ η,...

5 Binary regression model (GLMM) Gaussian models Binary regression model Attenuation of treatment effect Problems with conventional models Units: u 1, u 2,... subjects, patients, plots (labelled) Covariate x(u 1 ), x(u 2 ),... (non-random, X -valued) Latent process η on X (Gaussian, for example) Responses Y (u 1 ),... conditionally independent given η logit pr(y (u) = 1 η) = α + βx(u) + η(x(u)) Joint distribution for sample having configuration x p x (y) = E η n i=1 e (α+βx i +η(x i ))y i 1 + e α+βx i +η(x i ) parameters α, β, K, K (x, x ) = cov(η(x), η(x )).

6 Attenuation of treatment effect Gaussian models Binary regression model Attenuation of treatment effect Problems with conventional models η is a random effect (associated with clinical centre) logit pr(y i = 1 η, Trt = 0) = α + η(x i ) logit pr(y i = 1 η, Trt = 1) = α + η(x i ) + τ Unconditionally pr(y i = 1 Trt = t) = logit pr(y i = 1 Trt = t) = α + τ t e α+η+τ t 1 + e α+η+τ t φ(η; σ 2 ) dη Attenuation: τ τ/ σ 2 Treatment effect = odds(success w/treat)/odds(success without) Treatment effect: τ or τ?

7 Gaussian models Binary regression model Attenuation of treatment effect Problems with conventional models Binary regression model: computation GLMM computational problem: Options: p x (y) = n R n i=1 e (α+βx i +η(x i ))y i 1 + e α+βx φ(η; K ) dη i +η(x i ) Taylor approx: Laird and Ware; Schall; Breslow and Clayton, McC and Nelder, Drum and McC,... Laplace approximation: Wolfinger 1993; Shun and McC 1994 Numerical approximation: Egret E.M. algorithm: McCulloch 1994 for probit models Monte Carlo: Z&L,...

8 But,..., wait a minute... Gaussian models Binary regression model Attenuation of treatment effect Problems with conventional models p x (y) is the distribution for each fixed sample of n units. But... the sample might not be predetermined volunteer samples in clinical trials; pre-screening of patients to increase compliance; behavioural studies in ecology; marketing studies, with purchase events as units; public policy: crime type with crime events as units Ergo, x is also random, so we need a joint distribution. Q1: For a bivariate process, what does p x (y) represent? Q2: Is it necessarily the case that p x (y) = p(y x)?... even if sample size is random?

9 Gaussian models Binary regression model Attenuation of treatment effect Problems with conventional models

10 Gaussian models Binary regression model Attenuation of treatment effect Problems with conventional models Problems in the application of conventional models Clinical trials / market research / traffic studies / crime... (i) Operational interpretation of a sample as a fixed subset or as a random subset independent of the process (ii) Sample units generated by a random process sequential recruitment, purchase events, traffic studies... (iii) Population also generated by a random process in time animal populations, purchase events, crime events,... (iv) Samples: random, sequential, quota,... (v) Conditional distribution given observed random x versus stratum distribution for fixed x

11 y = 2 Sampling bias Volunteer samples Joint distributions of Y given x Randomization y = 1 y = 0 X A point process on C X for C = {0, 1, 2}, and the superposition process on X. Intensity λ r (x) for class r: r = 0, 1, 2. x-values auto-generated by the superposition process with intensity λ.(x). To each auto-generated unit there corresponds an x-value and a y-value.

12 Binary point process model Sampling bias Volunteer samples Joint distributions of Y given x Randomization Intensity process λ 0 (x) for class 0, λ 1 (x) for class 1 Log ratio: η(x) = log λ 1 (x) log λ 0 (x) Events form a PP with intensity λ on {0, 1} X. Conventional GLMM calculation (Bayesian and frequentist): pr(y = 1 x, λ) = λ 1(x) λ. (x) = eη(x) 1 + e η(x) ( ) ( ) λ1 (x) e η(x) pr(y = 1 x) = E = E λ. (x) 1 + e η(x) GLMM calculation is correct in a sense, but irrelevant there might not be an event at x!

13 Sampling bias Volunteer samples Joint distributions of Y given x Randomization Correct calculation for auto-generated units pr(event of type r in dx λ) = λ r (x) dx + o(dx) pr(event of type r in dx) = E(λ r (x)) dx + o(dx) pr(event in SPP in dx λ) = λ. (x) dx + o(dx) pr(event in SPP in dx) = E(λ. (x)) dx + o(dx) pr(y (x) = r SPP event at x) = Eλ r (x) Eλ. (x) ( λr (x) ) = E x SPP = Eλ ( ) r (x) λ. (x) Eλ. (x) E λr (x) λ. (x) Sampling bias: Distn for fixed x versus distn for autogenerated x.

14 Two ways of thinking Sampling bias Volunteer samples Joint distributions of Y given x Randomization First way: waiting for Godot! Fix x X and wait for an event to occur at x pr(y = 1 λ, x) = λ 1(x) ( λ.(x) ) pr(y = 1; x) = E λ1 (x) = E(Y i i : X i = x) λ.(x) Conventional, mathematically correct, but seldom relevant Second way: come what may! SPP event occurs at x, a random point in X joint density at (y, x) proportional to E(λ y (x)) = m y (x) x has marginal density proportional to E(λ. (x)) = m. (x) pr(y = 1 x SPP) = Eλ ( ) 1(x) Eλ. (x) E λ1 (x) = E(Y i i : X i = x) λ. (x)

15 Ewes and lambs Notation Estimating functions Interference Log Gaussian illustration of sampling bias η 0 (x) GP(0, K ), λ 0 (x) = exp(η 0 (x)) η 1 (x) GP(α + βx, K ), λ 1 (x) = exp(η 1 (x)) η(x) = η 1 (x) η 0 (x) GP(α + βx, 2K ), K (x, x) = σ 2 One-dimensional sampling distributions: ( ) e η(x) ρ(x) = p x (Y = 1) = E 1 + e η(x) logit(ρ(x)) α + β x ( β < β ) e α+βx+σ2/2 π(x) = pr(y = 1 x SPP) = Eλ 1(x) Eλ. (x) = e σ2 /2 + e α+βx+σ2 /2 logit pr(y = 1 x SPP) = α + βx No approximation; no attenuation

16 Ewes and lambs Notation Estimating functions Interference Alternative formulation involving selection Fix the population U Associate with each u a random response intensity λ (u) y,t ( t = 0 t = 1 y = 0 λ (u) 00 λ (u) ) 01 y = 1 λ (u) 10 λ (u) 11 implying a volunteer intensity λ (u).. (random and small) pr(u Sample) = E(λ (u).. ) pr(u S & t u = t) = E(λ (u).t ) = m (u).t pr(y u = 1 u S & t u = t) = m (u) 1t /m (u).t E(λ 1t /λ.t )

17 Ewes and lambs Notation Estimating functions Interference Conditional joint distributions of Y given x Sampling with fixed x quota ( ) λyi (x i ) p x (y) = E λ. (x i ) ( ) e y i η(x i ) = E 1 + e η(x i ) coincides with standard GLMM model Sequential sampling fixed time: #x random p(y x) = E λ yi (x i ) e R λ.(x)ν(dx) E λ. (x i ) e R λ.(x)ν(dx)

18 Ewes and lambs Notation Estimating functions Interference Randomization: Simple treatment effect Event x > pair (t(x), y(x)), treatment status, outcome H 0 H a Expected intensity (t, y) λ y (x)γ t λ y (x)γ t,y m y (x)γ t,y (0, 0) λ 0 (x)γ 0 λ 0 (x)γ 00 m 0 (x)γ 00 (0, 1) λ 1 (x)γ 0 λ 1 (x)γ 01 m 1 (x)γ 01 (1, 0) λ 0 (x)γ 1 λ 0 (x)γ 10 m 0 (x)γ 10 (1, 1) λ 1 (x)γ 1 λ 1 (x)γ 11 m 1 (x)γ 11 Conditional on λ: odds(y (x) = 1 t, λ) = λ 1 (x)γ t1 /(λ 0 (x)γ t0 ) odds ratio = τ = γ 11 γ 00 /(γ 01 γ 10 ) Treatment effect constant in x and independent of λ

19 GLMM analysis for fixed x Ewes and lambs Notation Estimating functions Interference Sample (X, Y, t) 1, (X, Y, t) 2,... observed sequentially GLMM analysis for a single event at x λ 1 (x)γ t1 pr(y (x) = 1 t, λ) = λ 1 (x)γ t1 + λ 0 (x)γ ( t0 ) λ 1 (x)γ t1 pr(y (x) = 1 t) = E λ 1 (x)γ t1 + λ 0 (x)γ t0 odds(y (x) = 1 t = 1) odds ratio = odds(y (x) = 1 t = 0) = τ Conclusion: treatment effect is attenuated log τ < log τ

20 Ewes and lambs Notation Estimating functions Interference Point process analysis for autogenerated units Sample (X, Y, t) 1, (X, Y, t) 2,... observed sequentially Correct analysis for a single event at x λ 1 (x)γ t1 pr(y (x) = 1 t, λ) = λ 1 (x)γ t1 + λ 0 (x)γ t0 E(λ 1 (x))γ t1 pr(y (x) = 1 t, x SPP) = E(λ 1 (x))γ t1 + E(λ 0 (x))γ t0 odds(y (x) = 1 t, x SPP) = m 1 (x)γ t1 /(m 0 (x)γ t0 ) odds ratio = γ 11 γ 00 /(γ 01 γ 10 ) = τ Conclusion: No attenuation of treatment effect.

21 ... Problem: assess the effectiveness of treatment for scour in lambs Randomized trial: flock of 100 ewes due to produce lambs in Feb-March 10% zero; 40% single; 50% twins 140 lambs Randomize assignment of treatment (t = 0 or t = 1) to lambs randomization protocol... Response Y = 1 if lamb suffers from 6 weeks Lambs are the experimental units, but... Ewe = litter = cluster: (use index i for ewe, l for lamb) Responses Y il, Y il in same cluster are correlated

22 Correlation in clusters Standard GLMM model for correlation in clusters Introduce an iid cluster-specific random effect {η i }. eη i +α+βt il pr(y il = 1 η, t) = 1 + e η i +α+βt il Assume Y -values conditionally independent given η. Hence, the model distributions are e η i +α+βt il pr(y il = 1; t) = E η 1 + e η eα +β til i +α+βt il 1 + e α +β t il p t (y) = ( e η i +α+βt il ) E η 1 + e η i +α+βt il i l Use GLMM software to estimate β or β... or Bayesian software to obtain posterior distribution Does this make sense in the context?

23 Simple model: correct analysis Response distn w/o treatment for ewe i given λ w.p. λ (0) y w.p. λ (1) y (y {0, 1}) (y 1, y 2 ) w.p. λ (2) y 1,y 2 (y 1, y 2 {0, 1}) Distribution of #lambs: iid E(λ (0), λ (1)., λ (2).. ) Joint response distn incl treatment: (y, t) w.p. λ (1) y γ ty (y, t {0, 1}) (y 1, y 2 ) w.p. λ (2) y 1,y 2 γ t1 y 1 γ t2 y 2 such that γ.y = 1. (γ 11 γ 00 /(γ 10 γ 01 ) = e β ) constant

24 Simple model: correct analysis contd. pr(m i = 1, Y i = y, t i = t λ) = λ (1) y γ ty pr(m i = 1, Y i = y, t i = t) = E(λ (1) y )γ ty y, t {0, 1} odds(y i = 1 t i = 1) odds(y i = 1 t i = 0) = E(λ(1) 1 )E(λ(1) 0 ) γ 11γ 00 E(λ (1) 1 )E(λ(1) 0 ) γ = e β 10γ 01 (t i = 1 implies m i = 1): Compare with incorrect analysis pr(y il = 1 λ, t il = t) = λ (1) 1 γ t1 λ (1) 1 γ t1 + λ (1) 0 γ t0 log odds(y il = 1 t il = 1) log odds(y il = 1 t il = 0) β with β < β, giving the illusion of attenuation. = eη i +βt 1 + e η i +βt (GL

25 Attenuation: π(t) = eα+βt e η+α+βt 1 + e α+βt ρ(t) = E η 1 + e η+α+βt eα +β t 1 + e α +β t Estimating functions (sum over lambs) S(t, y) = lambs Y l π(t l ) S (t, y) = lambs Y l ρ(t l ) (GLMM, PA) Mean and variance of S and S E(S(t, y)) = 0; E(S (t, y)) 0 var(s(y, y)) =?

26 Notation: meaning of E(Y i X i = x) Exchangeable sequence (Y 1, X 1 ), (Y 2, X 2 ),... with binary Y implies conditionally iid given λ Stratum x: U x = {i : X i = x} an infinite random subsequence Stratum average: ave{y i : i U x } = λ 1 (x)/λ. (x) Stratum mean = expected value of stratum average: ( ) λ1 (x) ρ(x) = E eα +β x λ. (x) 1 + e α +β x is declared target in much biostatistical work (PA) Correct calculation for a random stratum in SPP: ( λ1 (x) ) π(x) = E x SPP = E(λ 1(x)) λ. (x) E(λ. (x)) = eα+βx 1 + e α+βx Conditional mean π(x) = E(Y i X i = x) versus stratum mean ρ(x) = E(Y i i : X i = x)

27 Consequences of ambiguous notation Sample (Y 1, X 1 ), (Y 2, X 2 ),... observed sequentially ( λ1 (x) ) π(x) = E x SPP = eα+βx λ. (x) 1 + e α+βx = E(Y i X i = x) ( ) λ1 (x) ρ(x) = E eα +β x λ. (x) 1 + e α +β x = E(Y i i : X i = x) (ρ(x) computed by logistic-normal integral) Conventional PA estimating function Y i ρ(x i ) is such that and same for Y 2,.... E(Y 1 ρ(x) X 1 = x) = π(x) ρ(x) 0

28 Estimating functions done correctly Mean intensity for class r: m r (x) = E(λ r (x)) π r (x) = m r (x)/m. (x); ρ r (x) = E(λ r (x)/λ. (x)) E(Y i ) = ρ(x) for each i in U x = {u : X u = x} For autogenerated x, E(Y x SPP) = π(x) ρ(x) T (x, y) = x SPP h(x)(y (x) π(x)) has zero mean for auto-generated configurations x. Note: E(T x) 0; average is also over configurations

29 Explanation of unbiasedness z = {(x 1, y 1 ),...} configuration generated in (0, t). x = {x 1,..., } marginal configuration (SPP) z is a random measure with mean t m r (x) ν(dx) at (r, x) x is marginal random measure with mean t m. (x) ν(dx) at x π r (x) = m r (x)/m. (x) Hence E(z(r, dx)) = π r (x)e(x(dx)) for all (r, x) implies T (x, y) = = x SPP X h(x)(y (x) π(x)) h(x) has zero expectation. But E(T x) 0. ( ) z(r, dx) π r (x)x(dx)

30 variance calculation: binary case (y, x) generated by point process; T (x, y) = h(x)(y (x) π(x)) x SPP E(T (x, y)) = 0; E(T x) 0 var(t ) = h 2 (x)π(x)(1 π(x)) m. (x) dx X + h(x)h(x ) V (x, x ) m.. (x, x ) dx dx X 2 + h(x)h(x ) 2 (x, x )m.. (x, x ) dx dx X 2 V : spatial or within-cluster correlation; : interference

31 What is interference? Physical/biological interference: distribution of Y (u) depends on x(u ) Sampling interference for autogenerated units m r (x) = E(λ r (x)); m rs (x, x ) = E(λ r (x)λ s (x )) Univariate distributions: π r (x) = m r (x)/m. (x) = pr(y (x) = r x SPP) Bivariate distributions: π rs (x, x ) = m rs (x, x )/m.. (x, x ) π rs (x, x ) = pr(y (x) = r, Y (x ) = s x, x SPP) Hence π r. (x, x ) = pr(y (x) = r x, x SPP) r (x, x ) = π r. (x, x ) π r (x) No second-order sampling interference if r (x, x ) = 0

32 Inference: Conventional Gaussian model Model p x (A) = N n (Xβ, Σ x = σ0 2 I n + K [x])(a) Rationalization Y (i) = x i i + η(x(i)) Stratum average: Ȳ (U x) = ave{y i i : x i = x} = x β + η(x) Conditional distribution of Y u for u U x given observation y, X Y u data N(x β + k Σ 1 x (y Xβ), Σ uu k Σ 1 x k) Ȳ (U x ) data N(x β + k Σ 1 x (y Xβ), K (x, x) k Σ 1 x k) k i = K (x, x i ), (such as e x x i or x x i 3 ) Stratum average is a random variable, not a parameter Estimate is a distribution (not a function of sufficient statistic) Likewise for GLMMs

33 Inference and predicton for the PP model For a sequential sample Observation (x, y) (x (0),..., x (k 1) ) Product density m r (x (r) ) = E( x x (r) λ r (x)) for class r Conditional distribution as a random labelled partition of x: p(y x) m 0 (x (0) ) m k 1 (x (k 1) ) For a subsequent autogenerated event p(y (x ) = r data, x SPP) m r (x (r), x )/m r (x (r) ) Use likelihood or estimating function to estimate parameters Use conditional distribution for inference/prediction

34 Brief summary of conclusions (i) Reasonable case for fixed-population model in certain areas laboratory work; field trials; veterinary trials;... (ii) Good case for autogenerated units in other areas clinical trials; marketing; crime; animal behaviour (iii) The choice matters in random-effects models (iv) π(x) = E(Y i X i = x) versus ρ(x) = E(Y i i : X i = x) conditional distribution versus stratum distribution attenuation or non-attenuation (v) What is modelled and estimated by PA? claims to estimate ρ(x) but actually estimates π(x) (vi) What does the GLMM likelihood estimate? Difficult to say; probably neither

Regression models, selection effects and sampling bias

Regression models, selection effects and sampling bias Regression models, selection effects and sampling bias Peter McCullagh Department of Statistics University of Chicago Hotelling Lecture I UNC Chapel Hill Dec 1 2008 www.stat.uchicago.edu/~pmcc/reports/bias.pdf

More information

Sampling bias in logistic models

Sampling bias in logistic models Sampling bias in logistic models Department of Statistics University of Chicago University of Wisconsin Oct 24, 2007 www.stat.uchicago.edu/~pmcc/reports/bias.pdf Outline Conventional regression models

More information

Generalized linear models III Log-linear and related models

Generalized linear models III Log-linear and related models Generalized linear models III Log-linear and related models Peter McCullagh Department of Statistics University of Chicago Polokwane, South Africa November 2013 Outline Log-linear models Binomial models

More information

Sampling bias and logistic models

Sampling bias and logistic models J. R. Statist. Soc. B (2008) 70, Part 4, pp. 643 677 Sampling bias and logistic models Peter McCullagh University of Chicago, USA [Read before The Royal Statistical Society at a meeting organized by the

More information

Sampling bias and logistic models

Sampling bias and logistic models Proofs subject to correction. Not to be reproduced without permission. Contributions to the discussion must not exceed 0 words. Contributions longer than 0 words will be cut by the editor. 1 1 1 1 1 1

More information

Partition models and cluster processes

Partition models and cluster processes and cluster processes and cluster processes With applications to classification Jie Yang Department of Statistics University of Chicago ICM, Madrid, August 26 and cluster processes utline 1 and cluster

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

More information

Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Computer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 4. Gaussian Processes - Regression Group Prof. Daniel Cremers 4. Gaussian Processes - Regression Definition (Rep.) Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

More information

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Improper mixtures and Bayes s theorem

Improper mixtures and Bayes s theorem and Bayes s theorem and Han Han Department of Statistics University of Chicago DASF-III conference Toronto, March 2010 Outline Bayes s theorem 1 Bayes s theorem 2 Bayes s theorem Non-Bayesian model: Domain

More information

Gibbs Sampling in Endogenous Variables Models

Gibbs Sampling in Endogenous Variables Models Gibbs Sampling in Endogenous Variables Models Econ 690 Purdue University Outline 1 Motivation 2 Identification Issues 3 Posterior Simulation #1 4 Posterior Simulation #2 Motivation In this lecture we take

More information

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes

More information

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks. Linear Mixed Models One-way layout Y = Xβ + Zb + ɛ where X and Z are specified design matrices, β is a vector of fixed effect coefficients, b and ɛ are random, mean zero, Gaussian if needed. Usually think

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 2. Recap: MNL. Recap: MNL

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 2. Recap: MNL. Recap: MNL Goals PSCI6000 Maximum Likelihood Estimation Multiple Response Model 2 Tetsuya Matsubayashi University of North Texas November 9, 2010 Learn multiple responses models that do not require the assumption

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Modeling Real Estate Data using Quantile Regression

Modeling Real Estate Data using Quantile Regression Modeling Real Estate Data using Semiparametric Quantile Regression Department of Statistics University of Innsbruck September 9th, 2011 Overview 1 Application: 2 3 4 Hedonic regression data for house prices

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

Lecture 1: Bayesian Framework Basics

Lecture 1: Bayesian Framework Basics Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Time Series and Dynamic Models

Time Series and Dynamic Models Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The

More information

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks Y. Xu, D. Scharfstein, P. Mueller, M. Daniels Johns Hopkins, Johns Hopkins, UT-Austin, UF JSM 2018, Vancouver 1 What are semi-competing

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

A New Generalized Gumbel Copula for Multivariate Distributions

A New Generalized Gumbel Copula for Multivariate Distributions A New Generalized Gumbel Copula for Multivariate Distributions Chandra R. Bhat* The University of Texas at Austin Department of Civil, Architectural & Environmental Engineering University Station, C76,

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Eco517 Fall 2014 C. Sims FINAL EXAM

Eco517 Fall 2014 C. Sims FINAL EXAM Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

Log Gaussian Cox Processes. Chi Group Meeting February 23, 2016

Log Gaussian Cox Processes. Chi Group Meeting February 23, 2016 Log Gaussian Cox Processes Chi Group Meeting February 23, 2016 Outline Typical motivating application Introduction to LGCP model Brief overview of inference Applications in my work just getting started

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

I. Multinomial Logit Suppose we only have individual specific covariates. Then we can model the response probability as

I. Multinomial Logit Suppose we only have individual specific covariates. Then we can model the response probability as Econ 513, USC, Fall 2005 Lecture 15 Discrete Response Models: Multinomial, Conditional and Nested Logit Models Here we focus again on models for discrete choice with more than two outcomes We assume that

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS LECTURE : EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS HANI GOODARZI AND SINA JAFARPOUR. EXPONENTIAL FAMILY. Exponential family comprises a set of flexible distribution ranging both continuous and

More information

Covariance modelling for longitudinal randomised controlled trials

Covariance modelling for longitudinal randomised controlled trials Covariance modelling for longitudinal randomised controlled trials G. MacKenzie 1,2 1 Centre of Biostatistics, University of Limerick, Ireland. www.staff.ul.ie/mackenzieg 2 CREST, ENSAI, Rennes, France.

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression

Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression Communications of the Korean Statistical Society 2009, Vol. 16, No. 4, 713 722 Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression Young-Ju Kim 1,a a Department of Information

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

More information

Hybrid Dirichlet processes for functional data

Hybrid Dirichlet processes for functional data Hybrid Dirichlet processes for functional data Sonia Petrone Università Bocconi, Milano Joint work with Michele Guindani - U.T. MD Anderson Cancer Center, Houston and Alan Gelfand - Duke University, USA

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Chapter 16. Structured Probabilistic Models for Deep Learning

Chapter 16. Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Lecture 5: GPs and Streaming regression

Lecture 5: GPs and Streaming regression Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X

More information

Random Effects Models for Network Data

Random Effects Models for Network Data Random Effects Models for Network Data Peter D. Hoff 1 Working Paper no. 28 Center for Statistics and the Social Sciences University of Washington Seattle, WA 98195-4320 January 14, 2003 1 Department of

More information

Lecture 25: Review. Statistics 104. April 23, Colin Rundel

Lecture 25: Review. Statistics 104. April 23, Colin Rundel Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April

More information

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes 1

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes 1 Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes 1 JunXuJ.ScottLong Indiana University 2005-02-03 1 General Formula The delta method is a general

More information

Lecture 6: Bayesian Inference in SDE Models

Lecture 6: Bayesian Inference in SDE Models Lecture 6: Bayesian Inference in SDE Models Bayesian Filtering and Smoothing Point of View Simo Särkkä Aalto University Simo Särkkä (Aalto) Lecture 6: Bayesian Inference in SDEs 1 / 45 Contents 1 SDEs

More information

Hierarchical Modelling for Univariate Spatial Data

Hierarchical Modelling for Univariate Spatial Data Hierarchical Modelling for Univariate Spatial Data Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department

More information

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology Group Sequential Tests for Delayed Responses Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Lisa Hampson Department of Mathematics and Statistics,

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 18 Outline 1 Logistic regression for Binary data 2 Poisson regression for Count data 2 / 18 GLM Let Y denote a binary response variable. Each observation

More information

Missing Covariate Data in Matched Case-Control Studies

Missing Covariate Data in Matched Case-Control Studies Missing Covariate Data in Matched Case-Control Studies Department of Statistics North Carolina State University Paul Rathouz Dept. of Health Studies U. of Chicago prathouz@health.bsd.uchicago.edu with

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 305 Part VII

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

Hierarchical Modeling for Univariate Spatial Data

Hierarchical Modeling for Univariate Spatial Data Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information