Local Minimax Testing


1 Local Minimax Testing. Sivaraman Balakrishnan and Larry Wasserman, Carnegie Mellon. June 11.

2-4 Hypothesis testing beyond classical regimes

Example 1: Testing the distribution of bacteria in the gut microbiome (d ≫ n).
Example 2: Testing the distribution of the number of α-particles emitted by a radioactive source over a time window (d = ∞).

Goal: Understand fundamental limits and avoid strong assumptions:
- Uniform null
- Fixed-cells asymptotics

5-6 Hypothesis testing beyond classical regimes

Example: Fit a density, test goodness-of-fit.

Goal: Understand fundamental limits and avoid strong assumptions:
- Uniform null
- Bounded domain
- Unnecessarily strong smoothness assumptions

7-9 The basic setup: goodness-of-fit testing

Observe samples Z_1, ..., Z_n ~ P. For some fixed P_0, we want to test

$$ H_0 : P = P_0 \quad \text{versus} \quad H_1 : \mathrm{TV}(P, P_0) \ge \epsilon, \qquad \mathrm{TV}(P, Q) = \sup_A |P(A) - Q(A)|. $$

TV is a natural metric on distributions, invariant to scale.

From Le Cam (refined by Barron): there are no consistent tests without further structural assumptions.

10-12 Structural Assumptions

Multinomials: distributions under the null and alternate are multinomials on d categories,

$$ \mathcal{M} = \Big\{ p : p \in \mathbb{R}^d, \ \sum_{i=1}^{d} p_i = 1, \ p_i \ge 0 \ \forall i \in \{1, \dots, d\} \Big\}. $$

Questionable assumptions: uniform null; d fixed, or n ≫ d.

Minimally smooth densities: densities under the null and alternate are L-Lipschitz,

$$ \mathcal{L} = \Big\{ p : \int_{\mathcal{X}} p(x)\, dx = 1, \ p(x) \ge 0 \ \forall x, \ |p(x) - p(y)| \le L \|x - y\|_2 \ \forall x, y \in \mathbb{R}^d \Big\}. $$

Questionable assumptions: uniform null; bounded domain; L fixed.

13-17 Risk

A test φ : X^n → {0, 1} is level α if

$$ P_0^n(\phi = 1) \le \alpha \quad \text{for all } P_0 \in \mathcal{C}. $$

Let Φ_n denote the set of all level-α tests. The risk of a level-α test is its maximum type-II error:

$$ R_n(\phi, P_0, \epsilon, \mathcal{C}) = \sup_{P : \mathrm{TV}(P, P_0) \ge \epsilon, \ P \in \mathcal{C}} P^n(\phi = 0). $$

Local minimax rate:

$$ \epsilon_n(P_0) = \inf \Big\{ \epsilon : \inf_{\phi \in \Phi_n} R_n(\phi, P_0, \epsilon, \mathcal{C}) \le 1/2 \Big\} $$

Global minimax rate:

$$ \epsilon_n = \inf \Big\{ \epsilon : \sup_{P_0} \inf_{\phi \in \Phi_n} R_n(\phi, P_0, \epsilon, \mathcal{C}) \le 1/2 \Big\} $$

18-20 Minimax Sample Complexity

If you prefer sample complexity (CS literature):

Global minimax sample complexity:

$$ n(\epsilon, \mathcal{C}) = \sup_{P_0 \in \mathcal{C}} \inf_{\phi} n(\phi, \epsilon, P_0, \mathcal{C}). $$

Local minimax sample complexity:

$$ n(P_0, \epsilon, \mathcal{C}) = \inf_{\phi} n(\phi, \epsilon, P_0, \mathcal{C}). $$

For sufficiently homogeneous problems the two rates are identical. In testing there is vast variability in the minimax rate, and the local minimax rate provides a refined picture.

21 MULTINOMIALS

22-25 Multinomials

Classical work in statistics: Morris, Fienberg, Barron, and Read and Cressie already emphasized the importance of moving beyond fixed-cells asymptotics. Minimax rates for the uniform null: Paninski, with lots of follow-up in CS.

Let p = (p(1), ..., p(d)) with p(1) ≥ p(2) ≥ ... ≥ p(d).

Global minimax testing rate (well known):

$$ \epsilon_n \asymp \frac{d^{1/4}}{\sqrt{n}} $$

Faster than the estimation rate. But what about the local minimax rate?

26-28 Local Minimax Rate (Valiant and Valiant 2014)

Tail:

$$ Q_\sigma = \Big\{ i : \sum_{j=i}^{d} p(j) \le \sigma \Big\}. $$

Bulk:

$$ B_\sigma = \{ i : i > 1, \ i \notin Q_\sigma \}. $$

V-functional:

$$ V_\sigma(p_0) = \Big( \sum_{j \in B_\sigma} p_0^{2/3}(j) \Big)^{3/2} $$
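These definitions translate directly into code. Below is a minimal numpy sketch (our own, not from the paper; the function name is ours) that computes V_σ(p_0) for a probability vector by sorting, splitting off the σ-tail, and dropping the largest entry:

```python
import numpy as np

def v_functional(p0, sigma):
    """V_sigma(p0) = (sum over the bulk of p0(j)^(2/3))^(3/2).

    The bulk excludes the single largest entry and the sigma-tail:
    with p sorted in decreasing order, Q_sigma collects the indices i
    whose tail mass sum_{j >= i} p(j) is at most sigma.
    """
    p = np.sort(np.asarray(p0, dtype=float))[::-1]   # p(1) >= ... >= p(d)
    tail_mass = np.cumsum(p[::-1])[::-1]             # tail_mass[i] = sum_{j >= i} p(j)
    bulk = tail_mass > sigma                         # complement of Q_sigma ...
    bulk[0] = False                                  # ... minus the largest entry
    return np.sum(p[bulk] ** (2.0 / 3.0)) ** 1.5

# Sanity check: for the uniform multinomial, V_sigma is on the order of sqrt(d).
print(v_functional(np.ones(1000) / 1000.0, 0.1))     # roughly sqrt(1000)
```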

29-32 Local Minimax Rate

Valiant and Valiant (2014) showed that l_n ≤ ε_n(p_0) ≤ u_n, where l_n and u_n solve:

$$ l_n = \sqrt{\frac{V_{l_n}(p_0)}{n}}, \qquad u_n = \sqrt{\frac{V_{u_n/16}(p_0)}{n}}. $$

Roughly:

$$ \epsilon_n = \sqrt{\frac{V_{\epsilon_n}(p_0)}{n}} $$

We can have d = ∞. Across nulls, 1 ≤ V_σ ≤ √d, with sparse nulls at the lower extreme and the uniform null at the upper one.
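The fixed-point characterization can be solved numerically. A small sketch (ours), reusing v_functional from the block above: since V_σ(p_0) is non-increasing in σ, the map ε ↦ √(V_ε(p_0)/n) is monotone and bisection applies.

```python
import numpy as np  # assumes v_functional from the sketch above is in scope

def local_rate(p0, n, tol=1e-6):
    """Solve eps = sqrt(V_eps(p0) / n) for eps by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sqrt(v_functional(p0, mid) / n) > mid:
            lo = mid   # the rate map still sits above eps: eps is too small
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Uniform null on d = 1000 categories: eps_n should scale like d^(1/4)/sqrt(n).
print(local_rate(np.ones(1000) / 1000.0, n=10_000))
```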

33-36 Multinomial Examples

Uniform null: if p_0 is uniform on d categories,

$$ \epsilon_n(p_0) \asymp \frac{d^{1/4}}{\sqrt{n}} $$

This also matches the worst-case (global minimax) rate. In contrast to estimation, testing allows n ≪ d.

Sparse null: if p_0 mostly concentrates on s categories,

$$ \epsilon_n(p_0) \asymp \frac{s^{1/4}}{\sqrt{n}} $$

Infinite multinomials with tail decay, e.g. power-law multinomials and Poisson distributions: the truncated 2/3-norm is finite for such infinite multinomials.

37-38 The VV Test: Upper Bound

The locally minimax optimal test is a two-stage test:

A tail test: tests the total mass in the ε-tail of the multinomial.

A bulk modified-χ² test: let X_i denote the count of the i-th category, and use the test statistic

$$ T = \sum_{i \in B_{\epsilon/8}} \frac{(X_i - n p_0(i))^2 - X_i}{p_0^{2/3}(i)}. $$

There are two modifications to the usual χ² statistic. The analysis proceeds by studying the mean and variance of the test statistic under the null and alternate, and involves several difficult inequalities to deal with the 2/3-norm. Some deficiencies: one needs to specify ε, and the limiting distribution of the test statistic is poorly understood.
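To make the two stages concrete, here is a sketch of the tail and bulk statistics (our own coding; the calibration of the two rejection thresholds is omitted):

```python
import numpy as np

def vv_statistics(counts, p0, eps):
    """Tail and bulk statistics of the VV two-stage test (a sketch).

    Returns (tail statistic, bulk statistic); each would be compared
    against its own calibrated threshold, which we omit here.
    """
    counts, p0 = np.asarray(counts, float), np.asarray(p0, float)
    n = counts.sum()
    order = np.argsort(p0)[::-1]                 # sort categories by p0, decreasing
    p, x = p0[order], counts[order]
    tail_mass = np.cumsum(p[::-1])[::-1]
    tail = tail_mass <= eps / 8.0                # the (eps/8)-tail Q_{eps/8}
    bulk = ~tail
    bulk[0] = False                              # the bulk also drops the top entry
    # Stage 1: excess observed mass in the tail.
    t_tail = x[tail].sum() - n * p[tail].sum()
    # Stage 2: modified chi-square on the bulk, weighted by p0^(2/3).
    t_bulk = np.sum(((x[bulk] - n * p[bulk]) ** 2 - x[bulk]) / p[bulk] ** (2.0 / 3.0))
    return t_tail, t_bulk
```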

39 Two simulations (n = 200). [Figure: two panels plotting power against l1 distance, one with a uniform null and sparse alternate, one with a power-law null and sparse alternate, comparing the χ² test, the LRT, and the 2/3rd-and-tail test.]

40-41 Why do classical tests fail?

The most classical goodness-of-fit test is the χ² test:

$$ T = \sum_{i=1}^{d} \frac{(X_i - n p_0(i))^2}{n p_0(i)}. $$

Small entries of p_0 can dominate the variance. Classical p_0-fixed asymptotics mask this phenomenon. Related issues plague the likelihood-ratio, l_1, l_2 and other test statistics. Classical test statistics are not even globally minimax optimal.

Can we directly address the deficiencies of the χ² statistic?
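The variance blow-up is easy to see in a small simulation (ours). In a Poisson approximation, the i-th term of the χ² statistic has null variance about 2 + 1/(n p_0(i)), so tiny cells dominate; under a power-law null the spread of T dwarfs the √(2d) scale suggested by fixed-d asymptotics:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, reps = 2000, 200, 2000
p0 = 1.0 / np.arange(1, d + 1) ** 2      # a power-law null with many tiny cells
p0 /= p0.sum()

X = rng.multinomial(n, p0, size=reps)    # reps draws of the count vector
chisq = ((X - n * p0) ** 2 / (n * p0)).sum(axis=1)

# Each term has null variance roughly 2 + 1/(n p0(i)) (Poisson approximation),
# so the smallest cells dominate the spread of the statistic:
print(chisq.std())                       # much larger than ...
print(np.sqrt(2 * d))                    # ... the classical sqrt(2 d) scale
```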

42-43 A simple, globally minimax test

Use instead a truncated test statistic:

$$ T_{\mathrm{trunc}} = \sum_{i=1}^{d} \frac{(X_i - n p_0(i))^2 - X_i}{\max\{p_0(i), 1/d\}}. $$

If any entry is too small, clip the denominator to limit its contribution to the variance.

Theorem (BW17): The test based on T_trunc is globally minimax.

The test is simple and minimax optimal, and the analysis is straightforward. It is single-stage: no knowledge of ε is necessary.
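A sketch of the test in code (ours). Since no limiting distribution is needed, we calibrate the critical value by Monte Carlo under the null:

```python
import numpy as np

def trunc_test(counts, p0, alpha=0.05, reps=2000, rng=None):
    """Reject H0 when T_trunc exceeds a Monte Carlo null quantile (a sketch)."""
    rng = rng or np.random.default_rng()
    counts, p0 = np.asarray(counts, float), np.asarray(p0, float)
    n, d = int(counts.sum()), len(p0)
    denom = np.maximum(p0, 1.0 / d)      # clip tiny entries at 1/d

    def stat(x):
        return np.sum(((x - n * p0) ** 2 - x) / denom)

    null_stats = np.array([stat(rng.multinomial(n, p0)) for _ in range(reps)])
    threshold = np.quantile(null_stats, 1.0 - alpha)
    return stat(counts) > threshold      # True = reject H0
```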

44 Simulations revisited (n = 200). [Figure: the same two panels, power against l1 distance, now also including the truncated-variance test alongside the χ² test, the LRT, and the 2/3rd-and-tail test.]

45-46 A new (near)-locally minimax test

Inspired by a closely related test in a paper by Diakonikolas and Kane.

Basic insight: careful modifications to χ² are crucial away from uniform; at uniform, almost all test statistics are near optimal. So slice the multinomial into almost-uniform pieces and use Bonferroni.

Partition the entries in B_{ε/8} into sets S_j for j ≥ 1, where

$$ S_j = \Big\{ t : \frac{p_0(2)}{2^j} < p_0(t) \le \frac{p_0(2)}{2^{j-1}} \Big\}, \qquad T_j = \sum_{t \in S_j} \big[ (X_t - n p_0(t))^2 - X_t \big]. $$

The max test (for Bonferroni-adjusted thresholds t_j) is

$$ \phi_{\max} = \max_j \, \mathbb{I}(T_j > t_j). $$
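A sketch of the band construction (ours; the Bonferroni-adjusted thresholds t_j, e.g. per-band Monte Carlo quantiles at level α divided by the number of bands, are omitted):

```python
import numpy as np

def max_test_bands(counts, p0, eps):
    """Dyadic band statistics T_j for the max test (a sketch).

    Bulk entries are sliced into bands S_j within which p0 varies by at
    most a factor of two, so the null is nearly uniform on each band.
    Returns {j: T_j}; the test rejects if any T_j exceeds its threshold.
    """
    counts, p0 = np.asarray(counts, float), np.asarray(p0, float)
    n = counts.sum()
    order = np.argsort(p0)[::-1]
    p, x = p0[order], counts[order]
    tail_mass = np.cumsum(p[::-1])[::-1]
    bulk = tail_mass > eps / 8.0
    bulk[0] = False
    pb, xb = p[bulk], x[bulk]
    # Band index j with p0(2)/2^j < p <= p0(2)/2^(j-1); p[1] is p0(2).
    j = np.floor(np.log2(p[1] / pb)).astype(int) + 1
    return {int(b): np.sum((xb[j == b] - n * pb[j == b]) ** 2 - xb[j == b])
            for b in np.unique(j)}
```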

47-48 A new (near)-locally minimax test

Theorem (BW17): The max test is locally minimax (up to logarithmic factors).

The max test is (near)-locally minimax optimal. It is less practical than the modified χ² test, but its analysis is completely transparent and does not require difficult inequalities.

Summary: testing high-dimensional multinomials exhibits interesting local phenomena, and modifications of the χ² test are globally and locally minimax.

49 DENSITIES

50-52 Density Testing

Recall the class of Lipschitz densities: densities under the null and alternate are L-Lipschitz,

$$ \mathcal{L} = \Big\{ p : \int_{\mathcal{X}} p(x)\, dx = 1, \ p(x) \ge 0 \ \forall x, \ |p(x) - p(y)| \le L \|x - y\|_2 \ \forall x, y \in \mathbb{R}^d \Big\}. $$

Focus initially on the d = 1 case.

Theorem (Ingster 1984, 2000): Suppose L is fixed, the domain is X = [0, 1], and p_0 is uniform. Then the global minimax rate scales as

$$ \epsilon_n \asymp \Big( \frac{1}{n} \Big)^{2/5} $$

A d-dimensional extension appears in Arias-Castro et al. (2016); it only considers the uniform null and suggests a quantile transformation. The strong assumptions of fixed L and a bounded domain are analogous to the fixed-cells assumption for multinomials.

53-54 Local Minimax Rate for Testing Lipschitz Densities

For a density p_0, its bulk B_ε is the set of smallest Lebesgue measure that contains 1 − ε probability content. Define the truncated 1/2-norm:

$$ T(p_0) = \Big( \int_{B_\epsilon} \sqrt{p_0(x)}\, dx \Big)^2. $$

Theorem: For the Lipschitz class L, the local minimax rate is

$$ \epsilon_n(p_0) \asymp \Big( \frac{L_n T^2(p_0)}{n^2} \Big)^{1/5}. $$

This is a tight characterization of the local minimax rate, up to constants, with no unnecessary assumptions: L_n is not treated as fixed, and the domain is not assumed to be bounded.
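The rate is easy to evaluate numerically for a given null. A sketch (ours; the bulk is approximated by the highest-density grid cells capturing 1 − ε of the mass):

```python
import numpy as np

def density_rate(p0, grid, L, n, eps=0.01):
    """Evaluate eps_n ~ (L * T(p0)^2 / n^2)^(1/5) on a 1-d grid (a sketch)."""
    dx = grid[1] - grid[0]
    dens = p0(grid)
    order = np.argsort(dens)[::-1]                # highest-density cells first
    bulk = order[np.cumsum(dens[order] * dx) <= 1.0 - eps]
    T = (np.sqrt(dens[bulk]).sum() * dx) ** 2     # truncated 1/2-norm
    return (L * T ** 2 / n ** 2) ** 0.2

# Gaussian null: T scales like sigma, recovering eps_n ~ (L sigma^2 / n^2)^(1/5).
gauss = lambda x: np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
print(density_rate(gauss, np.linspace(-10, 10, 100001), L=1.0, n=10_000))
```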

55-56 Examples

Uniform null: if the null p_0 is uniform on [0, B],

$$ \epsilon_n \asymp \Big( \frac{L B^2}{n^2} \Big)^{1/5} $$

Spiky null: the sparsest Lipschitz density is the triangle

$$ p_0(x) = \max\Big( \sqrt{L}\, \big(1 - \sqrt{L}\, |x|\big),\ 0 \Big). $$

The minimax rate is completely independent of L and the domain:

$$ \epsilon_n \asymp \frac{1}{n^{2/5}} $$
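A quick sanity check of the L-independence, plugging the spiky null into the theorem above (our own arithmetic, not from the slides):

```latex
\int \sqrt{p_0(x)}\, dx
  = 2 L^{1/4} \int_0^{1/\sqrt{L}} \big(1 - \sqrt{L}\, x\big)^{1/2}\, dx
  = 2 L^{1/4} \cdot \frac{2}{3 \sqrt{L}}
  = \frac{4}{3}\, L^{-1/4},
\qquad
T(p_0) = \Big(\frac{4}{3}\Big)^{2} L^{-1/2},
\qquad
\epsilon_n \asymp \Big( \frac{L\, T^2(p_0)}{n^2} \Big)^{1/5}
  = \Big( \frac{(4/3)^4}{n^2} \Big)^{1/5}
  \asymp n^{-2/5}.
```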

57-59 Examples continued

Rates can be derived for other natural testing problems.

Gaussian null: if the null p_0 is Gaussian N(μ, σ²), then the minimax rate for testing is

$$ \epsilon_n \asymp \Big( \frac{L \sigma^2}{n^2} \Big)^{1/5}. $$

Cauchy null: let γ denote the shape parameter of p_0. The minimax rate is

$$ \epsilon_n \asymp \Big( \frac{L \log^4(1/\epsilon_n)}{n^2} \Big)^{1/5}. $$

Pareto null: p_0(x) ∝ x^{−α−1} for 0 < α < 1. The minimax rate is

$$ \epsilon_n \asymp \Big( \frac{L}{n^2} \Big)^{\frac{\alpha}{3\alpha+2}}. $$

The exponent here is non-standard, and the rate degrades rapidly as α → 0.

60-62 High-Level Proof Ideas: Upper Bound

The classical method of goodness-of-fit testing: bin, and test the corresponding multinomial using a (locally minimax) multinomial test. The key technical challenge: there is significant flexibility in how to bin p_0.

Idea 1: use fixed bin-widths. Choose the largest bin-width that adequately controls the approximation error, i.e. keeps p_0 apart from the alternate densities. This is the approach used by Ingster, and it achieves the global minimax rate when L is fixed and the domain is bounded. It is inadequate for the tight local minimax rate: intuitively, the number and size of the bins should be adapted to the density.

Idea 2: use adaptive bin-widths, h(x) ∝ √p_0(x), where the constants are chosen to control the approximation error. Adaptive bin-widths allow us to optimally redistribute the approximation error; a sketch of such a binning follows.
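A greedy one-dimensional sketch of adaptive binning (ours; the paper's construction is more careful, but this conveys the idea of widths scaling with the local density):

```python
import numpy as np

def adaptive_bins(p0, a, b, c):
    """Bin edges on [a, b] with local width h(x) ~ c * sqrt(p0(x)) (a sketch).

    Walk left to right, giving each bin a width proportional to the square
    root of the density at its left edge; c trades approximation error
    against the number of bins.
    """
    edges, x = [a], a
    while x < b:
        h = max(c * np.sqrt(p0(x)), 1e-12)   # guard against zero density
        x = min(x + h, b)
        edges.append(x)
    return np.array(edges)

# Example: bins narrow in the tails and widen near the mode of a Gaussian.
gauss = lambda x: np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
print(len(adaptive_bins(gauss, -5.0, 5.0, c=0.5)))
```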

63-64 High-Level Proof Ideas: Lower Bound

The classical method: create many small perturbations of the null, consider distinguishing p_0 from a uniform mixture over these perturbations, and analyze the (optimal) likelihood ratio test.

65-66 High-Level Proof Ideas: Technical Challenges

When p_0 is far from uniform, some parts of p_0 need to be perturbed much more than others, and smoothness constrains the allowed perturbations significantly.

Key idea: again use adaptive bin-widths. When the bin-width is large, a larger perturbation is possible without violating smoothness. The same adaptive bin-widths as in the upper bound result in an optimal (and matching) lower bound.

67-68 Extending to Higher Dimensions

Define γ = 2/(d + 3) and the truncated γ-norm

$$ T_\epsilon(p_0) = \Big( \int_{B_\epsilon} p_0(x)^{\gamma}\, dx \Big)^{1/\gamma}. $$

Theorem (BW17): The local minimax rate is

$$ \epsilon_n = \Big( \frac{L\, T_\epsilon^2(p_0)}{n^2} \Big)^{\frac{1}{4+d}}. $$

Again, we obtain significant variability in the minimax rate as a function of p_0. (For d = 1, γ = 1/2 and this recovers the truncated 1/2-norm and the 1/5 exponent above.)

69-70 High-Level Proof Ideas

The upper and lower bounds are again based on an adaptive partition. Roughly, we want to partition the support of p_0 into hyper-cubes of different volumes, where the volume of each hyper-cube satisfies

$$ V(x) \propto p_0(x)^{d\gamma}. $$

Unlike in the 1D case, it is not obvious that such a partition exists, or how to construct one.

We provide a proof of existence and a recursive splitting algorithm that constructs the desired partition. The existence proof utilizes smoothness in an elegant way: intuitively, since p_0 is smooth, the desired volumes inherit this smoothness, and a partition satisfying these volume requirements might exist.
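A toy version of such a recursive splitting (ours; the paper's algorithm is more careful about boundary and smoothness conditions): halve a cube's sides until its volume falls below the local target c · p_0(center)^{dγ}.

```python
import numpy as np
from itertools import product

def recursive_partition(p0, lower, side, c, gamma, depth=0, max_depth=12):
    """Split a hyper-cube until vol <= c * p0(center)^(d * gamma) (a sketch).

    Each split halves every side, producing 2^d children; returns a list
    of (lower_corner, side_length) cells covering the original cube.
    """
    d = len(lower)
    center = lower + side / 2.0
    if side ** d <= c * p0(center) ** (d * gamma) or depth >= max_depth:
        return [(lower, side)]
    cells = []
    for corner in product([0.0, 1.0], repeat=d):
        child = lower + (side / 2.0) * np.array(corner)
        cells += recursive_partition(p0, child, side / 2.0, c, gamma,
                                     depth + 1, max_depth)
    return cells

# 2-d bump on [-2, 2]^2 with gamma = 2/(d + 3) = 0.4: cells are large near the mode.
bump = lambda z: np.exp(-np.sum(z ** 2))
cells = recursive_partition(bump, np.array([-2.0, -2.0]), 4.0, c=1.0, gamma=0.4)
print(len(cells))
```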

71 High-Level Proof Ideas: the recursive partitioning algorithm. [Figure.]

72 Simulation. [Figure.]

73-79 Summary

- For testing Lipschitz densities, interesting local minimax phenomena emerge.
- Typical assumptions (bounded domain, uniform null, fixed smoothness constant) can mask these phenomena.
- We provide tight local minimax upper and lower bounds.
- The paper also provides extensions that adapt to unknown problem-specific parameters (smoothness parameters and ε).
- One needs careful, adaptive binning procedures.
- We are currently investigating many extensions: composite nulls, more smoothness, two-sample testing, etc.

THE END
