Penalized D-optimal design for dose finding


1/35 Penalized D-optimal design for dose finding. Luc Pronzato, Laboratoire I3S, CNRS / Université de Nice-Sophia Antipolis, France.

2/35 Outline

3/35 Bernoulli-type experiments: $Y_i \in \{0,1\}$ (success or failure), with $\eta(x,\theta) = \mathrm{Prob}(Y_i = 1 \mid x_i = x, \theta)$. $\hat\theta^n$ is the maximum-likelihood estimator:
$$\hat\theta^n = \arg\max_{\theta\in\Theta} \sum_{i=1}^n \big\{ Y_i \log[\eta(x_i,\theta)] + (1-Y_i)\log[1-\eta(x_i,\theta)] \big\}.$$
Fisher information matrix:
$$M(\xi,\theta) = \int_{\mathcal X} f_\theta(x) f_\theta^\top(x)\,\xi(dx), \qquad f_\theta(x) = \{\eta(x,\theta)[1-\eta(x,\theta)]\}^{-1/2}\,\frac{\partial \eta(x,\theta)}{\partial\theta}.$$

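As a concrete illustration of these formulas (not part of the original slides), here is a minimal numerical sketch; the two-parameter logistic response $\eta(x,\theta) = 1/(1+e^{-(\theta_1+\theta_2 x)})$, the parameter value and the design points are illustrative assumptions.

```python
import numpy as np

def eta(x, theta):
    """Two-parameter logistic response: Prob(Y = 1 | x, theta)."""
    a, b = theta
    return 1.0 / (1.0 + np.exp(-(a + b * x)))

def f_theta(x, theta):
    """f_theta(x) = {eta(1 - eta)}^{-1/2} * d eta / d theta."""
    p = eta(x, theta)
    grad = p * (1 - p) * np.array([1.0, x])   # gradient of eta for the logistic model
    return grad / np.sqrt(p * (1 - p))

def info_matrix(points, weights, theta):
    """Fisher information M(xi, theta) of a discrete design xi."""
    M = np.zeros((2, 2))
    for x, w in zip(points, weights):
        fx = f_theta(x, theta)
        M += w * np.outer(fx, fx)
    return M

theta0 = np.array([0.0, 1.0])                  # illustrative parameter value
M = info_matrix([-1.0, 0.0, 1.0], [1/3] * 3, theta0)
print(np.log(np.linalg.det(M)))                # D-optimality criterion value
```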

4/35 Maximize $\log\det M(\xi,\theta)$ under the constraint $\Phi(\xi,\theta) \le C$, with $\Phi(\xi,\theta) = \int_{\mathcal X} \phi(x,\theta)\,\xi(dx)$ ($\phi(x,\theta)$ = cost of one observation at $x$).
Necessary and sufficient condition (NSC) for the optimality of $\xi^*$: $\Phi(\xi^*,\theta) \le C$ and there exists $\lambda^* = \lambda^*(C,\theta) \ge 0$ such that
$$\lambda^*[C - \Phi(\xi^*,\theta)] = 0 \quad\text{and}\quad \forall x \in \mathcal X,\ f_\theta^\top(x)\, M^{-1}(\xi^*,\theta)\, f_\theta(x) \le p + \lambda^*[\phi(x,\theta) - \Phi(\xi^*,\theta)].$$
In practice: maximize $H_\theta(\xi,\lambda) = \log\det M(\xi,\theta) - \lambda\,\Phi(\xi,\theta)$ for an increasing sequence $\{\lambda_i\}$ of Lagrange multipliers, starting from $\lambda_0 = 0$ and stopping at the first $\lambda_i$ for which $\xi^*(\lambda_i)$ satisfies $\Phi[\xi^*(\lambda_i),\theta] \le C$ [Mikulecká 1983]: not more complicated than the determination of (a sequence of) D-optimal design(s)!

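This recipe lends itself to a simple exchange-type construction. Below is a minimal sketch, assuming a finite design space and using a standard vertex-direction (Fedorov–Wynn type) scheme to maximize $H_\theta(\xi,\lambda)$ for a fixed $\lambda$; the stopping rule is the optimality condition above. This is one possible implementation, not necessarily the one used in the talk.

```python
import numpy as np

def penalized_D_design(F, phi, lam, n_iter=5000, tol=1e-8):
    """Maximize H(xi) = log det M(xi) - lam * Phi(xi) over designs on a
    finite space by a vertex-direction (Fedorov-Wynn type) algorithm.
    F: (K, p) array whose rows are f_theta(x_k); phi: (K,) costs."""
    K, p = F.shape
    w = np.full(K, 1.0 / K)                       # start from the uniform design
    for k in range(n_iter):
        M = F.T @ (w[:, None] * F)
        d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)  # f' M^{-1} f
        # directional derivative of H at xi toward the one-point design at x_k:
        g = d - p - lam * (phi - w @ phi)
        j = int(np.argmax(g))
        if g[j] < tol:                            # optimality condition satisfied
            break
        alpha = 1.0 / (k + 2)                     # classical diminishing step
        w *= 1 - alpha
        w[j] += alpha
    return w
```

Sweeping an increasing grid of $\lambda$ values and keeping the first design with $\Phi[\xi^*(\lambda),\theta] \le C$ then reproduces the constrained construction described above.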

5/35 Dose-response problem in clinical trials: define $\phi(x,\theta)$ from the success probability (efficacy, no toxicity) [Dragalin & Fedorov 2006; Dragalin, Fedorov & Wu 2008]. The multiplier $\lambda$ in $H_\theta(\xi,\lambda) = \log\det M(\xi,\theta) - \lambda\,\Phi(\xi,\theta)$ sets a compromise between the information gain (a problem of collective ethics) and the rejection of inefficient or toxic doses (a problem of individual ethics for the enrolled patients).

6/35 $\xi^*(\lambda)$ maximizes $H_\theta(\xi,\lambda) = \log\det M(\xi,\theta) - \lambda\,\Phi(\xi,\theta)$, with $\Phi(\xi,\theta) = \int_{\mathcal X} \phi(x,\theta)\,\xi(dx)$. Three remarks:
1. If $\Phi(\xi,\theta) = \frac1p \log\Psi(\xi,\theta)$, then maximizing $\log\det\big[M(\xi,\theta)/\Psi^\lambda(\xi,\theta)\big]$ amounts to maximizing $\log\det[N\,M(\xi,\theta)]$ under the total cost constraint $N\,\Psi^\lambda(\xi,\theta) \le C$, for any $C > 0$ [Dragalin & Fedorov 2006].
2. $\Phi[\xi^*(\lambda),\theta] \le \min_x \phi(x,\theta) + \dim(\theta)/\lambda$ (a one-line derivation is sketched after remark 3 below).


7/35 3. If $\phi(x,\theta)$ has a unique minimum in $\mathcal X$ at $x^*$ and is flat enough around $x^*$, the support points of $\xi^*(\lambda)$ maximizing $\log\det M(\xi,\theta) - \lambda\,\Phi(\xi,\theta)$ tend to gather around $x^*$ when $\lambda \to \infty$ [LP, JSPI, 2009].
Suppose $\mathcal X$ finite with $\phi(x^{(1)},\theta) \le \dots \le \phi(x^{(m)},\theta) \le \dots \le \phi(x^{(K)},\theta)$. Define $\mathcal X_m = \{x^{(1)},\dots,x^{(m)}\}$, $\delta = \phi(x^{(m+1)},\theta) - \phi(x^{(m)},\theta)$, and suppose that $f_\theta(x^{(1)}),\dots,f_\theta(x^{(m)})$ span $\mathbb R^p$. Define $\bar\phi(x,\theta) = \phi(x,\theta) - \phi(x^{(1)},\theta)$ and consider $\Phi_q(\xi,\theta) = \int_{\mathcal X} \bar\phi^q(x,\theta)\,\xi(dx)$. Then $\xi^*(\lambda)$ maximizing $\log\det M(\xi,\theta) - \lambda\,\Phi_q(\xi,\theta)$ is supported on $\mathcal X_m$ when $\lambda$ and $q$ are large enough:
$$q > \frac{\log(2\hat t_m)}{\log\{1 + \delta/[\phi(x^{(m)},\theta) - \phi(x^{(1)},\theta)]\}}, \qquad \lambda > \lambda_{m,q} = \frac{p}{\Phi_q(\hat\xi_m,\theta)},$$
with $\hat t_m = \max_{x\in\mathcal X\setminus\mathcal X_m} f_\theta^\top(x)\, M^{-1}(\hat\xi_m,\theta)\, f_\theta(x)$ and $\hat\xi_m = \arg\min_{\xi \text{ supported on } \mathcal X_m}\ \max_{x\in\mathcal X\setminus\mathcal X_m} f_\theta^\top(x)\, M^{-1}(\xi,\theta)\, f_\theta(x)$.

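A one-line justification of remark 2 on slide 6/35, not spelled out on the slides but immediate from the optimality condition of slide 4/35: evaluating that condition at $x_{\min} = \arg\min_{x\in\mathcal X}\phi(x,\theta)$ gives
$$0 \le f_\theta^\top(x_{\min})\, M^{-1}(\xi^*(\lambda),\theta)\, f_\theta(x_{\min}) \le p + \lambda\,[\phi(x_{\min},\theta) - \Phi(\xi^*(\lambda),\theta)],$$
hence $\Phi[\xi^*(\lambda),\theta] \le \min_{x\in\mathcal X}\phi(x,\theta) + p/\lambda$ with $p = \dim(\theta)$: the average cost of the penalized design exceeds the minimum achievable cost by at most $p/\lambda$.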

8/35 Example 1 [Dragalin & Fedorov 2006]. Cox model for bivariate responses: $Y$ efficacy, $Z$ toxicity. 11 doses available, evenly spread in $[-3,3]$: $\mathcal X = \{x^{(1)},\dots,x^{(11)}\}$, $x^{(i)} < x^{(i+1)}$, $i = 1,\dots,10$.
$$\mathrm{Prob}\{Y = y, Z = z \mid x, \theta\} = \pi_{yz}(x,\theta), \qquad y, z \in \{0,1\},$$
$$\theta = (a_{11}, b_{11}, a_{10}, b_{10}, a_{01}, b_{01}) = (3, 3, 4, 2, 0, 2) \in \mathbb R^6,$$
$$\pi_{11}(x,\theta) = \frac{e^{a_{11}+b_{11}x}}{D(x,\theta)}, \quad \pi_{10}(x,\theta) = \frac{e^{a_{10}+b_{10}x}}{D(x,\theta)}, \quad \pi_{01}(x,\theta) = \frac{e^{a_{01}+b_{01}x}}{D(x,\theta)}, \quad \pi_{00}(x,\theta) = \frac{1}{D(x,\theta)},$$
where $D(x,\theta) = 1 + e^{a_{01}+b_{01}x} + e^{a_{10}+b_{10}x} + e^{a_{11}+b_{11}x}$.

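A short numerical sketch of this model (an illustration, not from the slides): it evaluates the four joint probabilities on the 11-dose grid and locates the dose minimizing the cost $\phi_1(x,\theta) = \pi_{10}^{-1}(x,\theta)$ introduced on the next slide.

```python
import numpy as np

# Cox bivariate-response model of Example 1, theta = (3, 3, 4, 2, 0, 2)
a11, b11, a10, b10, a01, b01 = 3, 3, 4, 2, 0, 2

def joint_probs(x):
    """Joint probabilities pi_yz(x) of (efficacy Y, toxicity Z)."""
    e11, e10, e01 = np.exp(a11 + b11 * x), np.exp(a10 + b10 * x), np.exp(a01 + b01 * x)
    D = 1 + e01 + e10 + e11
    return {'11': e11 / D, '10': e10 / D, '01': e01 / D, '00': 1 / D}

doses = np.linspace(-3, 3, 11)                    # x^(1), ..., x^(11)
phi1 = np.array([1 / joint_probs(x)['10'] for x in doses])
print(doses[np.argmin(phi1)])                     # prints -0.6, i.e. x^(5)
```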

9/35 $\phi_1(x,\theta) = \pi_{10}^{-1}(x,\theta)$ ($\pi_{10}$ = efficacy and no toxicity). Optimal dose $x^*$: minimizes $\phi_1(x,\theta)$; here $x^* = x^{(5)} = -0.6$.

10/35 [Figure: optimal design weights $\xi^*\{x^{(i)}\}$ as a function of $\lambda$, for the penalty $\phi_1$.]

11/35 Another cost function: $\phi_2(x,\theta) = \{\pi_{10}^{-1}(x,\theta) - [\max_x \pi_{10}(x,\theta)]^{-1}\}^2$ (flatter than $\phi_1(x,\theta)$ around $x^*$).


12/35 $\phi_2(x,\theta) = \{\pi_{10}^{-1}(x,\theta) - [\max_x \pi_{10}(x,\theta)]^{-1}\}^2$ (flatter than $\phi_1(x,\theta)$ around $x^*$). [Figure: $\xi^*\{x^{(i)}\}$ versus $\lambda$, shown for $\lambda \in [1,100]$ and $\lambda \in [1,1000]$.]

13/35 Yet another cost function (more importance given to toxicity): $\phi_3(x,\theta) = \pi_{10}^{-1}(x,\theta)\,[1 - \pi_{\cdot 1}(x,\theta)]^{-1}$, with $\pi_{\cdot 1}(x,\theta) = \pi_{01}(x,\theta) + \pi_{11}(x,\theta)$ = marginal probability of toxicity ($\phi_3(x,\theta)$ is minimal at $x^{(4)} = -1.2$).


14/35 $\phi_3(x,\theta) = \pi_{10}^{-1}(x,\theta)\,[1 - \pi_{\cdot 1}(x,\theta)]^{-1}$, $\pi_{\cdot 1}(x,\theta) = \pi_{01}(x,\theta) + \pi_{11}(x,\theta)$ = marginal probability of toxicity ($\phi_3$ minimal at $x^{(4)} = -1.2$). [Figure: $\xi^*\{x^{(i)}\}$ versus $\lambda$.]

15/35 Motivation: why sequential design? For a regression model $\eta(x,\theta)$ nonlinear in $\theta$, the optimal design for estimating $\theta$ depends on $\theta$! Information matrix $M(\xi,\theta) = \int_{\mathcal X} f_\theta(x) f_\theta^\top(x)\,\xi(dx)$, with $\xi$ a probability measure on $\mathcal X$ and $f_\theta(x) = \frac1\sigma\,\frac{\partial\eta(x,\theta)}{\partial\theta}$ (continuously differentiable w.r.t. $\theta$ for any $x \in \mathcal X$). $\xi_D(\theta)$ is D-optimal for $\theta$: $\xi_D(\theta)$ maximizes $\log\det[M(\xi,\theta)]$.

16/35 Examples [Box & Lucas, 1959]:
$\eta(x,\theta) = \frac{\theta_1}{\theta_1-\theta_2}\,[\exp(-\theta_2 x) - \exp(-\theta_1 x)]$; for $\theta = (0.7, 0.2)$, $\xi_D = \frac12\delta_{x^{(1)}} + \frac12\delta_{x^{(2)}}$ with $x^{(1)} \approx 1.25$ and $x^{(2)} \approx 6.60$.
Michaelis–Menten: $\eta(x,\theta) = \frac{\theta_1 x}{\theta_2 + x}$, $x \in (0,\bar x]$, $\theta_1,\theta_2 > 0$; $\xi_D = \frac12\delta_{x^{(1)}} + \frac12\delta_{x^{(2)}}$ with $x^{(1)} = \frac{\theta_2 \bar x}{2\theta_2 + \bar x}$ and $x^{(2)} = \bar x$.
Exponential decrease: $\eta(x,\theta) = \theta_1 \exp(-\theta_2 x)$, $x \ge \underline x$, $\theta_1,\theta_2 > 0$; $\xi_D = \frac12\delta_{x^{(1)}} + \frac12\delta_{x^{(2)}}$ with $x^{(1)} = \underline x$ and $x^{(2)} = \underline x + \frac1{\theta_2}$ ...

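The Michaelis–Menten case is easy to check numerically. The sketch below (illustrative values $\theta = (1, 2)$ and $\bar x = 10$, equal weights as stated above) recovers the two support points by direct optimization:

```python
import numpy as np
from scipy.optimize import minimize

# Michaelis-Menten model with illustrative values theta = (1, 2), xbar = 10
theta1, theta2, xbar = 1.0, 2.0, 10.0

def f(x):
    """Sensitivity d eta / d theta (the constant 1/sigma is immaterial here)."""
    return np.array([x / (theta2 + x), -theta1 * x / (theta2 + x) ** 2])

def neg_logdet(z):
    x1, x2 = z
    M = 0.5 * np.outer(f(x1), f(x1)) + 0.5 * np.outer(f(x2), f(x2))
    return -np.log(np.linalg.det(M))

res = minimize(neg_logdet, x0=[1.0, 9.0], bounds=[(1e-3, xbar)] * 2)
print(res.x)                                      # numerical support points
print(theta2 * xbar / (2 * theta2 + xbar), xbar)  # closed form: 1.4286..., 10
```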

17/35 Full sequential D-optimal design: choose $x_1,\dots,x_{n_0}$, estimate $\hat\theta^{n_0}$, set $k = n_0$; then choose $x_{k+1}$, observe $Y_{k+1}$, re-estimate $\hat\theta^{k+1}$, set $k \leftarrow k+1$, and repeat. D-optimality:
$$x_{k+1} = \arg\max_{x\in\mathcal X} \det\Big[\sum_{i=1}^k f_{\hat\theta^k}(x_i) f_{\hat\theta^k}^\top(x_i) + f_{\hat\theta^k}(x) f_{\hat\theta^k}^\top(x)\Big],$$
or equivalently
$$x_{k+1} = \arg\max_{x\in\mathcal X} f_{\hat\theta^k}^\top(x)\, M^{-1}(\xi_k,\hat\theta^k)\, f_{\hat\theta^k}(x),$$
with $\xi_k = \frac1k\sum_{i=1}^k \delta_{x_i}$ the empirical measure defined by $x_1,\dots,x_k$, so that $M(\xi_k,\theta) = \frac1k\sum_{i=1}^k f_\theta(x_i) f_\theta^\top(x_i)$. We hope that $\hat\theta^n \xrightarrow{a.s.} \bar\theta$ and $\sqrt n\,(\hat\theta^n - \bar\theta) \xrightarrow{d} \mathcal N(0, M^{-1}[\xi_D(\bar\theta),\bar\theta])$.

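A minimal simulation sketch of this loop (all numerical values are illustrative assumptions, and nonlinear least-squares refitting via scipy stands in for whichever estimator is used): at each step the next design point maximizes the variance function $f_{\hat\theta^k}^\top M^{-1}(\xi_k,\hat\theta^k) f_{\hat\theta^k}$ over a finite design space.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

X = np.linspace(0.5, 10.0, 20)                # finite design space (assumed)
theta_true, sigma = (1.0, 2.0), 0.1           # assumed truth and noise level

def eta(x, t1, t2):
    return t1 * x / (t2 + x)                  # Michaelis-Menten response

def f(x, th):
    t1, t2 = th
    return np.array([x / (t2 + x), -t1 * x / (t2 + x) ** 2])

xs = [1.0, 9.0]                               # initial design, n0 = 2
ys = [eta(x, *theta_true) + sigma * rng.standard_normal() for x in xs]
th = np.array([0.5, 1.0])
for k in range(2, 100):
    th, _ = curve_fit(eta, xs, ys, p0=th)     # re-estimate theta_hat^k
    M = sum(np.outer(f(x, th), f(x, th)) for x in xs) / k
    d = [f(x, th) @ np.linalg.solve(M, f(x, th)) for x in X]
    xs.append(X[int(np.argmax(d))])           # x_{k+1} = argmax f' M^{-1} f
    ys.append(eta(xs[-1], *theta_true) + sigma * rng.standard_normal())
print(th)                                     # should approach theta_true
```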

18/35 LS estimation in nonlinear regression: $Y_i = \eta(x_i,\bar\theta) + \varepsilon_i$, $x_i \in \mathcal X \subset \mathbb R^d$, $\theta \in \Theta \subset \mathbb R^p$, $\{\varepsilon_i\}$ i.i.d. with variance $\sigma^2$. LS estimator: $\hat\theta^n = \arg\min_{\theta\in\Theta} S_n(\theta)$ with $S_n(\theta) = \sum_{i=1}^n [Y_i - \eta(x_i,\theta)]^2$. Conditions for strong consistency of $\hat\theta^n$ ($\hat\theta^n \xrightarrow{a.s.} \bar\theta$) differ greatly depending on whether the $x_k$ are constants or depend on the $\varepsilon_i$, $i < k$.

19/35 Sequential D-optimal design. For $n_0 \le k < n$,
$$x_{k+1} = \arg\max_{x\in\mathcal X} f_{\hat\theta^k}^\top(x)\, M^{-1}(\xi_k,\hat\theta^k)\, f_{\hat\theta^k}(x).$$
How to ensure that $\hat\theta^n \xrightarrow{a.s.} \bar\theta$ and $\sqrt n\,(\hat\theta^n - \bar\theta) \xrightarrow{d} \mathcal N(0, M^{-1}[\xi_D(\bar\theta),\bar\theta])$ when $n \to \infty$?
❶ deterministic choice of $x_k$ when $k \in \{k_1, k_2, \dots\}$, with $k_i \sim i^\alpha$, $\alpha \in (1,2)$ [Lai 1994]
❷ let $n_0$ tend to $\infty$ when $n \to \infty$
❸ suppose that $\mathcal X$ is a finite set ⇒ replication of observations


20/35 $\mathcal X$ is a finite set: $\mathcal X = \{x^{(1)},\dots,x^{(K)}\}$. Convergence of $\hat\theta^n$ to $\bar\theta$: everything is fine if $D_n(\theta,\bar\theta) = \sum_{i=1}^n [\eta(x_i,\theta) - \eta(x_i,\bar\theta)]^2$ grows fast enough for all $\theta \ne \bar\theta$.
Theorem 1 (convergence) [LP, S&P Letters, 2009]: if $D_n(\theta,\bar\theta)$ satisfies
$$\text{for all } \delta > 0,\quad \inf_{\|\theta-\bar\theta\| \ge \delta/\tau_n} \big[D_n(\theta,\bar\theta)\big]/(\log\log n) \xrightarrow{a.s.} \infty,$$
with $\mathcal X$ finite and $\{\tau_n\}$ a non-decreasing sequence of positive constants, then $\hat\theta^n$ satisfies $\tau_n\,\|\hat\theta^n - \bar\theta\| \xrightarrow{a.s.} 0$.


21/35 Theorem 2 (asymptotic normality) [LP, S&P Letters, 2009]: if there exists a sequence $\{C_n\}$ of symmetric positive-definite matrices such that $C_n^{-1} M^{1/2}(\xi_n,\bar\theta) \xrightarrow{p} I$, with $c_n = \lambda_{\min}(C_n)$ satisfying $n^{1/4} c_n \to \infty$, and $D_n(\theta,\bar\theta)$ satisfying
$$\text{for all } \delta > 0,\quad \inf_{\|\theta-\bar\theta\| \ge c_n\delta} \big[D_n(\theta,\bar\theta)\big]/(\log\log n) \xrightarrow{a.s.} \infty,$$
then $\hat\theta^n$ satisfies $\sqrt n\, M^{1/2}(\xi_n,\hat\theta^n)(\hat\theta^n - \bar\theta) \xrightarrow{d} \mathcal N(0, I)$.
Apply this to $x_{k+1} = \arg\max_{x\in\mathcal X} f_{\hat\theta^k}^\top(x)\, M^{-1}(\xi_k,\hat\theta^k)\, f_{\hat\theta^k}(x)$ with $\mathcal X$ finite: for any sequence $\{\hat\theta^k\}$, the sampling rate of a non-singular design is $> 0$.


22/35 [Diagram: doses $x^{(1)}, x^{(2)}, \dots, x^{(j)}, \dots, x^{(l)}, \dots, x^{(K)}$, with a subset of doses sampled at rates $\liminf_n n_i/n > \alpha > 0$.] Then $\hat\theta^n \xrightarrow{a.s.} \bar\theta$, $M(\xi_n,\hat\theta^n) \xrightarrow{a.s.} M[\xi_D(\bar\theta),\bar\theta]$, and $[n M(\xi_n,\hat\theta^n)]^{1/2}(\hat\theta^n - \bar\theta) \xrightarrow{d} \mathcal N(0, I)$.


23/35 $x_{k+1} = \arg\max_{x\in\mathcal X} \big\{ f_{\hat\theta^k}^\top(x)\, M^{-1}(\xi_k,\hat\theta^k)\, f_{\hat\theta^k}(x) - \lambda_k\, \phi(x,\hat\theta^k) \big\}$. Again, the sequential construction means independence is lost. Use the assumption that $\mathcal X$ is finite: $\mathcal X = \{x^{(1)},\dots,x^{(K)}\}$.
If $\lambda_k = $ constant $= \lambda$: $\hat\theta^n \xrightarrow{a.s.} \bar\theta$ and $M(\xi_n,\bar\theta) \to M^*(\bar\theta)$ (optimal for the criterion $\log\det M(\xi,\bar\theta) - \lambda\,\Phi(\xi,\bar\theta)$), and $\sqrt n\, M^{1/2}(\xi_n,\hat\theta^n)(\hat\theta^n - \bar\theta) \xrightarrow{d} \mathcal N(0, I)$.
Also true if $\lambda_k$ is a bounded measurable function of $x_1, Y_1, \dots, x_k, Y_k$ (e.g., $\lambda_k = \lambda^*(\hat\theta^k)$, the optimal Lagrange coefficient for the maximization of $\log\det M(\xi,\hat\theta^k)$ under the constraint $\Phi(\xi,\hat\theta^k) \le C$).


24/35 If $\lambda_k \nearrow \infty$ with $(\lambda_k \log\log k)/k \to 0$, then $\hat\theta^n \xrightarrow{a.s.} \bar\theta$; moreover, convergence to the minimum-cost design:
$$\Phi(\xi_n,\bar\theta) = \frac1n\sum_{i=1}^n \phi(x_i,\bar\theta) \xrightarrow{a.s.} \bar\phi_{\bar\theta} = \min_{x\in\mathcal X} \phi(x,\bar\theta)$$
... and $\xi_n(x^{(i^*)}) \xrightarrow{a.s.} 1$ if $\phi(x,\bar\theta)$ has a unique minimum at $x^{(i^*)} \in \mathcal X = \{x^{(1)},\dots,x^{(K)}\}$.
We can thus optimize $\sum_{i=1}^n \phi(x_i,\bar\theta)$ without knowing $\bar\theta$: self-tuning optimization,
$$x_{k+1} = \arg\max_{x\in\mathcal X} \big\{ f_{\hat\theta^k}^\top(x)\, M^{-1}(\xi_k,\hat\theta^k)\, f_{\hat\theta^k}(x) - \lambda_k\, \phi(x,\hat\theta^k) \big\}.$$
Already suggested for linear regression ($\eta(x,\theta)$ linear in $\theta$) [Åström & Wittenmark, 1989]; condition on $\lambda_k$ in [LP, AS 2000]. Here, LS in nonlinear regression, or ML, but with $\mathcal X$ finite. Beware: $x_{k+1} = \arg\min_{x\in\mathcal X} \phi(x,\hat\theta^k)$ may not work!


25/35 Example 2 (self-tuning regulation). Observe $Y_i = \frac{\theta_1 x_i}{\theta_2 + x_i} + \varepsilon_i$, $\{\varepsilon_i\}$ i.i.d. $\mathcal N(0,\sigma^2)$. Find $x^*$ such that $\Psi(x^*,\bar\theta) = T$, where $\Psi(x,\theta) = \theta_1[1 - \exp(-\theta_2 x/3)] \ne \eta(x,\theta)$ (for $\bar\theta = (1,1)$, $\Psi(x,\bar\theta) = 1/2$ at $x^* = 3\log(2) \approx 2.08$), i.e., minimize $\phi(x,\theta) = [\Psi(x,\theta) - T]^2$; note that we do not observe $\Psi(x,\bar\theta)$! Take $x_1 = 1$, $x_2 = 10$, then
$$x_{k+1} = \arg\max_{x\in\mathcal X} \big\{ f_{\hat\theta^k}^\top(x)\, M^{-1}(\xi_k,\hat\theta^k)\, f_{\hat\theta^k}(x) - \lambda_k\, \phi(x,\hat\theta^k) \big\}$$
for three sequences $\{\lambda_k\}$: (a) $\lambda_k = \log^2 k$, (b) $\lambda_k = k/(1+\log^2 k)$, (c) $\lambda_k = k$.

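A simulation sketch of Example 2 under sequence (a), $\lambda_k = \log^2 k$ (illustrative: $\bar\theta = (1,1)$ and $T = 1/2$ as above, while $\sigma = 0.1$ and the 20-point design grid are assumed values):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

X = np.linspace(0.1, 10.0, 20)          # finite design grid (assumed)
theta_bar, sigma, T = (1.0, 1.0), 0.1, 0.5

def eta(x, t1, t2):
    return t1 * x / (t2 + x)            # observed regression response

def f(x, th):
    t1, t2 = th
    return np.array([x / (t2 + x), -t1 * x / (t2 + x) ** 2])

def phi(x, th):
    psi = th[0] * (1.0 - np.exp(-th[1] * x / 3.0))   # unobserved Psi(x, theta)
    return (psi - T) ** 2               # cost phi(x, theta)

xs = [1.0, 10.0]                        # x_1 = 1, x_2 = 10
ys = [eta(x, *theta_bar) + sigma * rng.standard_normal() for x in xs]
th = np.array([0.5, 2.0])
for k in range(2, 500):
    th, _ = curve_fit(eta, xs, ys, p0=th)
    lam = np.log(k) ** 2                # sequence (a): slowly increasing penalty
    M = sum(np.outer(f(x, th), f(x, th)) for x in xs) / k
    crit = [f(x, th) @ np.linalg.solve(M, f(x, th)) - lam * phi(x, th)
            for x in X]
    xs.append(X[int(np.argmax(crit))])
    ys.append(eta(xs[-1], *theta_bar) + sigma * rng.standard_normal())
print(th, xs[-1])   # the design should settle near x* = 3*log(2) ~ 2.08
```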

26/35 [Figure: $\Psi(x_k,\bar\theta)$ for $k = 1,\dots,500$, under (a) $\lambda_k = \log^2 k$, (b) $\lambda_k = k/(1+\log^2 k)$, (c) $\lambda_k = k$.]

27/35 Replace $\phi(x,\theta) = [\Psi(x,\theta) - T]^2$ by $\phi(x,\theta) = [\Psi(x,\theta) - T]^4$ and take $\lambda_k = 10^3 \log^2 k$: the support points of $\xi_k$ tend to concentrate around $x^*$. [Figure: $\Psi(x_n,\bar\theta)$ and $x_n$ versus $n$.]

28/35 Example 1 [Dragalin & Fedorov 2006] (continued). Clinical trial: 11 doses, $Y$ for efficacy, $Z$ for toxicity, 36 patients. Up-and-down method [Ivanova 2003]:
$$x_{N+1} = \begin{cases} \max\{x^{(i_N-1)}, x^{(1)}\} & (\searrow)\ \text{if } Z_N = 1,\\ x^{(i_N)} & \text{if } Y_N = 1 \text{ and } Z_N = 0,\\ \min\{x^{(i_N+1)}, x^{(11)}\} & (\nearrow)\ \text{if } Y_N = 0 \text{ and } Z_N = 0,\end{cases}$$
for the first 10 patients, then switch to the sequential penalized design at the first observed toxicity (with a maximum increase of one dose at each step) [Dragalin & Fedorov 2006].
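The allocation rule reads naturally as a three-case update; a literal sketch (doses indexed 0 to 10 here, purely for illustration):

```python
def next_dose(i, Y, Z, K=11):
    """Up-and-down rule: current dose index i (0-based), efficacy Y,
    toxicity Z in {0, 1}; returns the index of the next dose."""
    if Z == 1:
        return max(i - 1, 0)          # toxicity observed: step down
    if Y == 1:
        return i                      # efficacy without toxicity: repeat
    return min(i + 1, K - 1)          # neither: step up
```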

29/35 Penalty function $\phi_1(x,\theta) = \pi_{10}^{-1}(x,\theta)$. [Figure: allocated dose $x_N$ versus patient number $N$; markers distinguish the outcomes $(Y=0,Z=0)$, $(Y=1,Z=0)$, $(Y=1,Z=1)$ and $(Y=0,Z=1)$.]

30/35 Penalty function $\phi_3(x,\theta) = \pi_{10}^{-1}(x,\theta)[1 - \pi_{\cdot 1}(x,\theta)]^{-1}$. [Figure: allocated dose $x_N$ versus patient number $N$.]

31/35 Simulation results (36 patients, 1000 repetitions), with $\phi_1(x,\theta) = \pi_{10}^{-1}(x,\theta)$, $J(\xi,\theta) = \det^{1/6}[M(\xi,\theta)]$, OSD $= x^{(5)}$. Columns: $\Phi_1(\xi,\theta)$, $J(\xi,\theta)$, allocation proportions at doses $x^{(t)}$ with $t<4$, $t=4$, $t=5$, $t=6$, $t>6$, and count $\#x^{(11)}$ (missing entries marked –):

S1   –   –   –   38.6%   36.9%   8.6%   13.9%   0   [$\xi_{u\&d}(\bar\theta)$]
S2   –   –   –   70.5%   7.8%    1.9%   5%      –   [$\xi_D(\bar\theta)$]
S3   –   –   –   68.2%   7%      2.5%   2.3%    –   [$\xi^*_{\lambda=2}(\bar\theta)$]

S1 = up & down; S2 = up & down + sequential D-optimal design; S3 = up & down + sequential penalized design.

32/35 Simulation results (240 patients, 150 repetitions), same columns (missing entries marked –):

S1   –   –   –   –     77.3%   7.3%   1.3%   0   [$\xi_{u\&d}(\bar\theta)$]
S4   –   –   –   90%   0.7%    0.7%   0.1%   –

S4 = up & down + sequential penalized design with logarithmic increase $\lambda_N \nearrow \infty$, switching when $\sigma$(optimal estimated dose) $< \Delta x$ ($\Delta x$ = interval between two consecutive doses), with cautious increase of the doses when $\sigma$(optimal estimated dose) $> \Delta x/2$. S4 better than S1 (= up & down).

33/35 Convergence and asymptotic normality of estimators under the sequential D-optimal design $x_{k+1} = \arg\max_{x\in\mathcal X} f_{\hat\theta^k}^\top(x)\, M^{-1}(\xi_k,\hat\theta^k)\, f_{\hat\theta^k}(x)$ and the sequential penalized design $x_{k+1} = \arg\max_{x\in\mathcal X} \{ f_{\hat\theta^k}^\top(x)\, M^{-1}(\xi_k,\hat\theta^k)\, f_{\hat\theta^k}(x) - \lambda_k\, \phi(x,\hat\theta^k) \}$ with $\lambda_k$ bounded ... assuming that $\mathcal X$ is finite (only a technical assumption?). $\lambda_n \to \infty$ not too fast ⇒ self-tuning regulation/optimization.


34/35 Asymptotic normality when $\lambda_n \to \infty$ slowly enough? $\lambda_{\min}[M(\xi_n,\hat\theta^n)] \sim A/\lambda_n$? Find a sequence of symmetric positive-definite matrices $C_n$ such that $C_n^{-1} M^{1/2}(\xi_n,\bar\theta) \xrightarrow{p} I$: construct $C_n$ from $x_{k+1} = \arg\max_{x\in\mathcal X}\{ f_{\bar\theta}^\top(x)\, M^{-1}(\xi_k,\bar\theta)\, f_{\bar\theta}(x) - \lambda_k\, \phi(x,\bar\theta) \}$? Use $C_n = M^{1/2}(\xi^*(\bar\theta,\lambda_n),\bar\theta)$ with $\xi^*(\theta,\lambda)$ maximizing $\log\det M(\xi,\theta) - \lambda\,\Phi(\xi,\theta)$?

35/35 Thank you for your attention!
