Can we do statistical inference in a non-asymptotic way?

Guang Cheng (Statistics@Purdue)
ONR Review Meeting@Duke, Oct 11

Acknowledgments: NSF, ONR and the Simons Foundation. An exploratory and ongoing work with Z. Shang and Y. Yang.

Starting from Stat 101

Given that X_1, ..., X_n iid ∼ N(θ_0, 1), we have

\[ P\Big(\theta_0 \in \Big[\bar{X} \pm \frac{1.96}{\sqrt{n}}\Big]\Big) = 95\%; \]

Without knowing the distribution of the X_i's, we obtain via the CLT

\[ P\Big(\theta_0 \in \Big[\bar{X} \pm \frac{1.96}{\sqrt{n}}\Big]\Big) \approx 95\%. \]

But the price is that the validity is only asymptotic;

What if the distribution of X is unknown and n is finite?
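
For intuition, a minimal Python sketch (our own illustration; the Exp(1) design, n, and the number of replicates are assumptions) that estimates the finite-n coverage of the nominal 95% CLT interval when the data are skewed rather than Gaussian:

import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 100_000
theta0 = 1.0                          # mean of Exp(1), which also has variance 1
x = rng.exponential(scale=1.0, size=(reps, n))
xbar = x.mean(axis=1)
half = 1.96 / np.sqrt(n)              # CLT half-width with known unit variance
coverage = np.mean(np.abs(xbar - theta0) <= half)
print(f"empirical coverage at n={n}: {coverage:.3f}")   # typically falls short of 0.95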

A first thought: concentration inequality

Concentration inequalities quantify how a random variable X deviates around its mean or median µ. They usually take the form of two-sided bounds for the tails of X − µ, such as

\[ P(|X - \mu| > t) \le \text{something small}. \]

The simplest concentration inequality is Chebyshev's inequality;

Concentration inequalities hold for any sample size n.
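
As a worked instance (our own, using nothing beyond Chebyshev's inequality and Var(X̄) = σ²/n):

\[
P\big(|\bar{X} - \theta_0| > t\big) \le \frac{\sigma^2}{n t^2},
\qquad \frac{\sigma^2}{n t^2} = 0.05 \;\Longrightarrow\; t = \sqrt{20}\,\frac{\sigma}{\sqrt{n}} \approx \frac{4.47\,\sigma}{\sqrt{n}},
\]

a valid 95% interval at every n, but far wider than the asymptotic 1.96 σ/√n; hence the search for sharper inequalities below.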

Does the concentration inequality idea really work?

For example, Hoeffding's inequality gives

\[ P\Big(\theta_0 \in \Big[\bar{X} \pm \frac{1.27\, c\, \|X\|_{\psi_2}}{\sqrt{n}}\Big]\Big) \ge 95\% \]

for iid X_i's with sub-Gaussian tails.³ Here ‖X‖_{ψ_2} is the so-called sub-Gaussian norm, e.g., ‖X‖_{ψ_2} = 1/√(log 2) for Rademacher X;

For illustration, let us examine one special case, i.e., Bernoulli, in the class of sub-Gaussian random variables;

When X_i iid ∼ Ber(θ_0), we have for any sample size n

\[ P\big(\theta_0 \in [\bar{X} \pm 1.36/\sqrt{n}]\big) \ge 95\%, \]

which is sharp in the rate but not in the constant, in comparison with the asymptotic CI [X̄ ± 0.98/√n] (set θ_0 = 0.5).

³ Relaxable to sub-exponential tails via Bernstein's inequality.
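
The Bernoulli constant follows directly from Hoeffding's bound P(|X̄ − θ_0| > t) ≤ 2 exp(−2nt²); the computation, supplied here for completeness, is

\[
2\exp(-2nt^2) = 0.05 \;\Longleftrightarrow\; t = \sqrt{\frac{\log 40}{2n}} \approx \frac{1.36}{\sqrt{n}}.
\]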

Empirical comparison for Bernoulli

[Two plots, not reproduced in this transcription: the left compares the widths of the confidence intervals from the two methods; the right shows their coverage probabilities versus (small) sample size. Simulations were repeated 1,000 times.]

Two competing effects: conservativeness vs. asymptotics.
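
A minimal Python sketch of this comparison (our own reconstruction; θ_0 = 0.5 and 1,000 replicates follow the slides, while the sample-size grid is an assumption):

import numpy as np

rng = np.random.default_rng(1)
theta0, reps = 0.5, 1000
for n in [10, 20, 50, 100]:
    xbar = rng.binomial(n, theta0, size=reps) / n
    hoeff = 1.36 / np.sqrt(n)                       # finite-sample (Hoeffding) half-width
    wald = 1.96 * np.sqrt(xbar * (1 - xbar) / n)    # asymptotic (CLT/Wald) half-width
    cov_h = np.mean(np.abs(xbar - theta0) <= hoeff)
    cov_w = np.mean(np.abs(xbar - theta0) <= wald)
    print(f"n={n:4d}  mean width ratio={hoeff / wald.mean():.2f}  "
          f"coverage: Hoeffding {cov_h:.3f}, Wald {cov_w:.3f}")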

How far can we go beyond these simple cases?

There are some recent results on more complicated parametric models obtained via various notions of concentration inequality, such as Chernozhukov et al. (2013) and Strawn et al. (2014).

Chernozhukov et al. (2013) presented non-asymptotic results on bootstrap validity for high dimensional problems such as $n^{-1/2}\sum_{i=1}^n X_i$ for X_i ∈ R^p with p ≫ n;

Strawn et al. (2014) obtained non-asymptotic bounds on the expected concentration of a posterior (around the true parameter) in high dimensional Bayesian linear models;

As far as we are aware, these bounds were mostly derived to justify (asymptotic) statistical estimation/inference in a non-asymptotic manner, rather than used to construct finite-sample inference.

Outline

1. Smoothing spline models
2. A non-asymptotic Bahadur representation
3. Statistical application: hypothesis testing
4. Simulations

Smoothing spline models

Consider a nonparametric regression model y_i = f(x_i) + ε_i. For simplicity, we assume that x ∈ I := [0, 1], and that x and ε are independent with Eε = 0 and Var(ε) = 1;

The standard smoothing spline estimate (or, more generally, kernel ridge estimate) is obtained as

\[
\widehat{f}_n := \arg\min_{f \in H} \; \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2 + \underbrace{\lambda \|f\|_H^2}_{\text{roughness penalty}},
\]

where H is a reproducing kernel Hilbert space (RKHS).
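
By the representer theorem, this minimizer has the finite-dimensional form f̂_n(·) = Σ_i α_i K(x_i, ·) with α = (K + nλI)^{-1} y; a minimal Python sketch, where the Gaussian kernel stands in for the Sobolev kernel purely as an assumption:

import numpy as np

def kernel_ridge_fit(x, y, lam, bandwidth=0.1):
    """Kernel ridge regression: alpha = (K + n*lam*I)^{-1} y."""
    n = len(x)
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * bandwidth**2))
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    def f_hat(xnew):
        Knew = np.exp(-((xnew[:, None] - x[None, :]) ** 2) / (2 * bandwidth**2))
        return Knew @ alpha
    return f_hat

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(size=200)
f_hat = kernel_ridge_fit(x, y, lam=1e-3)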

A non-asymptotic Bahadur representation

Starting from a naive concentration inequality

We first investigate the concentration property of T_n := ‖f̂_n − f_0‖.

Lemma 1. Under proper non-asymptotic conditions on λ > 0 and M > 0, it holds that

\[
\sup_{f_0 \in H^m(1)} P_{f_0}\big(T_n \ge d_n(M, \lambda)\big) \le 2\exp(-M), \tag{1}
\]

where $d_n(M, \lambda) = 2\lambda^{1/2} + c_K(\sqrt{2M} + 1)\,(n\lambda^{1/(2m)})^{-1/2}$ and $H^m(1) := \{f \in S^m(I) : \|f\|_H \le 1\}$.

Note that Lemma 1 holds uniformly over a unit ball, for any sample size.

A closer examination of the naive concentration

We notice that E{f̂_n} ≈ (I − P_λ)f_0 ≠ f_0! The term P_λ f_0 is the estimation bias caused by penalization;

If using T_n = ‖f̂_n − f_0‖ to test H_0 : f = f_0, we prove that its minimal separation rate is $n^{-m/(2m+1)}$, which is sub-optimal in comparison with the minimax rate of testing $n^{-2m/(4m+1)}$;

To make inference, we need to develop a second-order expansion, which we call a non-asymptotic Bahadur representation.
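
The separation-rate claim can be checked by balancing the two terms of d_n(M, λ) from Lemma 1 (a one-line computation):

\[
\lambda^{1/2} \asymp (n\lambda^{1/(2m)})^{-1/2}
\;\Longrightarrow\; \lambda \asymp n^{-2m/(2m+1)}
\;\Longrightarrow\; d_n \asymp n^{-m/(2m+1)},
\]

which is the estimation rate and is indeed slower than the minimax testing rate $n^{-2m/(4m+1)}$.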

Review on (asymptotic) Bahadur representation

Bahadur representations (Bahadur, 1966) are often useful to study the asymptotic properties of statistical estimators;

For example, He and Shao (1996) considered a general M-estimation framework, $(1/n)\sum_{i=1}^n \psi(x_i, \widehat{\theta}_n) = o(\delta_n)$ for some δ_n → 0, and obtained that, as n → ∞,

\[
\Big\| \widehat{\theta}_n - \theta_0 - \frac{1}{n}\sum_{i=1}^n D_n^{-1}\psi(x_i, \theta_0) \Big\| = O(R_n) \quad \text{almost surely},
\]

for some $R_n = o(n^{-1/2})$.
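
A degenerate but instructive special case (our own illustration): for the sample mean, take ψ(x, θ) = x − θ, for which D_n = 1 under the sign convention that makes the leading term $n^{-1}\sum_i (x_i - \theta_0)$; then

\[
\widehat{\theta}_n - \theta_0 - \frac{1}{n}\sum_{i=1}^n \psi(x_i, \theta_0)
= \bar{x} - \theta_0 - (\bar{x} - \theta_0) = 0,
\]

i.e., the remainder vanishes exactly; for genuinely nonlinear ψ the remainder R_n is nonzero but of smaller order than $n^{-1/2}$.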

Non-asymptotic Bahadur representation

Inspired by Shang and Cheng (2013), we define

\[
\widetilde{T}_n := \Big\| \widehat{f}_n - (I - P_\lambda)f_0 - \frac{1}{n}\sum_{i=1}^n \epsilon_i K_{X_i} \Big\|
\]

and study its concentration property.

Theorem 2. Under similar conditions as in Lemma 1, it holds that

\[
\sup_{f_0 \in H^m(1)} P_{f_0}\big(\widetilde{T}_n \ge \widetilde{d}_n(M, \lambda)\big) \le 2\exp(-M),
\]

where $\widetilde{d}_n(M, \lambda) = c_K^2 \sqrt{M}\, n^{-1/2} \lambda^{-1/(2m)} A(\lambda)\, d_n(M, \lambda)$.

Statistical application: hypothesis testing

Nonparametric hypothesis of interest

Consider the following hypothesis testing problem:

\[ H_0: f = f_0 \quad \text{vs} \quad H_1: f \ne f_0, \]

e.g., f_0 = 0.

Test statistic

Clearly, $\widetilde{T}_n = \| \widehat{f}_n - (I - P_\lambda)f_0 - n^{-1}\sum_{i=1}^n \epsilon_i K_{X_i} \|$ cannot be used as a test statistic, since the ε_i's are not observable;

Rather, we propose to use T_n as the test statistic:

\[
T_n = \big\| \widehat{f}_n - (I - P_\lambda) f_0 \big\|^2 - E\Big\| \frac{1}{n}\sum_{i=1}^n \epsilon_i K_{X_i} \Big\|^2
    = \big\| \widehat{f}_n - (I - P_\lambda) f_0 \big\|^2 - \frac{1}{n}\sum_{\nu \ge 1} \frac{1}{1 + \lambda/\rho_\nu};
\]

In practice, replace (I − P_λ)f_0 by a noiseless version of f̂_n:

\[
\widehat{f}_n^{NL} = \arg\min_{f \in S^m(I)} \; \frac{1}{n}\sum_{i=1}^n (f_0(X_i) - f(X_i))^2 + \lambda \|f\|_H^2.
\]
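
The calibration term $n^{-1}\sum_\nu (1 + \lambda/\rho_\nu)^{-1}$ can be approximated from the empirical eigenvalues of the reproducing kernel matrix (as the slides later suggest for the constants); a minimal sketch, where the Gaussian kernel and the use of eigenvalues of K/n as stand-ins for the ρ_ν are our assumptions:

import numpy as np

def trace_term(x, lam, bandwidth=0.1):
    """Approximate (1/n) * sum_nu 1/(1 + lam/rho_nu), with rho_nu
    replaced by the empirical eigenvalues of the kernel matrix K/n."""
    n = len(x)
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * bandwidth**2))
    rho = np.linalg.eigvalsh(K / n)
    rho = rho[rho > 1e-12]            # discard numerically zero eigenvalues
    return np.sum(1.0 / (1.0 + lam / rho)) / n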

Exact Type I and II error control

Based on the non-asymptotic Bahadur representation, we can control the Type I and II errors for any finite sample size as follows.

Theorem 3. Under proper non-asymptotic conditions on λ, α and β, we have

\[
\text{Type I error:}\quad P_{f_0}\big(T_n \ge \bar{d}_n(\log(15/\alpha), \lambda)\big) \le \alpha,
\]
\[
\text{Type II error:}\quad \sup_{f:\, f - f_0 \in H^m_{\alpha,\beta}(1)} P_f\big(T_n \le \bar{d}_n(\log(15/\alpha), \lambda)\big) \le \beta,
\]

for any α, β ∈ (0, 1).

The cut-off value $\bar{d}_n(M, \lambda) \approx 4\sqrt{M}\,\rho_K/(n\lambda^{1/(4m)})$. The separation set $H^m_{\alpha,\beta}(1) := \{g \in H^m(1) : \|g\| \ge \rho_n(\log(15/\alpha), \log(30/\beta), \lambda)\}$, where $\rho_n(M, L, \lambda)^2 \approx \zeta_K \lambda + 4\rho_K\sqrt{M}/(n\lambda^{1/(4m)})$.

Asymptotic minimax optimality

An important implication of Theorem 3 is that T_{n,λ} is asymptotically minimax optimal;

To see this, we minimize the separation function ρ_n(log(15/α), log(30/β), λ) over λ and obtain the minimal separation rate $n^{-2m/(4m+1)}$ when λ is chosen as

\[
\lambda^* := \left( \frac{(4\rho_K)^2 \log(15/\alpha)}{\zeta_K^2} \right)^{2m/(4m+1)} n^{-4m/(4m+1)} \asymp n^{-4m/(4m+1)}.
\]

Following similar derivations, we prove that the test statistic based on the naive concentration, i.e., T_n = ‖f̂_n − f_0‖, is actually sub-optimal. So our intuition of not using it was correct.
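
The form of λ* follows from a single calculus step applied to the separation function of Theorem 3 (constants carried loosely, with M = log(15/α)):

\[
\frac{d}{d\lambda}\left[ \zeta_K \lambda + \frac{4\rho_K\sqrt{M}}{n\,\lambda^{1/(4m)}} \right] = 0
\;\Longrightarrow\; \lambda^{(4m+1)/(4m)} \asymp \frac{\rho_K\sqrt{M}}{\zeta_K\, n}
\;\Longrightarrow\; \lambda^* \asymp \left(\frac{\rho_K\sqrt{M}}{\zeta_K}\right)^{4m/(4m+1)} n^{-4m/(4m+1)},
\]

and plugging λ* back in gives $\rho_n(\lambda^*) \asymp n^{-2m/(4m+1)}$, the minimax rate of testing.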

Extensions

Composite hypotheses, e.g., H_0 : f ∈ F_0 vs H_1 : f ∉ F_0, where F_0 = {f : f is linear on I};

Kernel ridge regression with different decaying rates of eigenvalues;

Non-Gaussian regression with a smooth log-concave density, such as logistic regression.

Simulations

Simulations

Consider the following hypothesis:

\[ H_0: f = f_0 \quad \text{vs} \quad H_1: f \ne f_0, \]

where f_0(x) = 5(x² − x);

Data were generated as Y_i = f_c(X_i) + ε_i, with f_c(x) = (1/2) c x² + f_0(x);

Set ε_i iid ∼ N(0, 1) and X_i iid ∼ Unif(0, 1);

Set the Type I and II errors as α = β = 0.05.

Remark: cutoff value

Recall that the cutoff value d̄_n(log(15/α), λ) is provided in Theorem 3. However, we found it to be quite conservative, due to the use of some loose concentration inequalities;

Fortunately, we can simulate the cutoff instead. Conditioning on X, we simulate a number of synthetic datasets {Y^{(k)}}_{k=1}^N from the null model Y_i^{(k)} = f_0(X_i) + N(0, 1), each of which yields a new test statistic T_n^{(k)}. The cutoff value is then set as the (1 − α)-th sample quantile of {T_n^{(k)}}_{k=1}^N.
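
A minimal sketch of this Monte Carlo calibration (compute_Tn stands for a routine evaluating the test statistic and is hypothetical; f0 is the null curve from the simulation setup):

import numpy as np

def mc_cutoff(x, f0, compute_Tn, alpha=0.05, N=1000, rng=None):
    """Simulate the null distribution of T_n conditionally on X and
    return its (1 - alpha)-th sample quantile as the cutoff."""
    if rng is None:
        rng = np.random.default_rng(3)
    stats = []
    for _ in range(N):
        y_null = f0(x) + rng.normal(size=len(x))   # Y^(k) drawn from the null model
        stats.append(compute_Tn(x, y_null))
    return np.quantile(stats, 1 - alpha)

f0 = lambda x: 5 * (x**2 - x)                      # null curve used in the simulations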

Remark: the choice of λ

Although the theoretical cutoff is conservative, our theoretical analysis is still useful for choosing λ;

In simulations, we choose λ by numerically minimizing the separation function ρ_n(log(15/α), log(30/β), λ) over λ. The resulting λ is denoted λ_FS;

This non-asymptotic approach is computationally much cheaper than conventional generalized cross-validation;

Note that all constants in ρ_n(M, L, λ) and d̄_n(M, λ), such as ρ_K, ζ_K and c_K, depend only on λ and the eigenvalues ρ_ν, which can be well approximated by the empirical eigenvalues of the reproducing kernel matrix.
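
A sketch of the λ_FS selection (with m = 2 as a default, ρ_K and ζ_K supplied externally, e.g., from the empirical-eigenvalue approximation above, and the grid as an assumption):

import numpy as np

def lambda_fs(n, rho_K, zeta_K, m=2, alpha=0.05):
    """Minimize the separation function
    rho_n(lam)^2 ~ zeta_K*lam + 4*rho_K*sqrt(log(15/alpha))/(n*lam^(1/(4m)))
    over a grid of candidate lambdas."""
    grid = np.logspace(-8, -1, 200)
    M = np.log(15 / alpha)
    sep = zeta_K * grid + 4 * rho_K * np.sqrt(M) / (n * grid ** (1 / (4 * m)))
    return grid[np.argmin(sep)]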

Empirical comparison with an asymptotically valid test

For the simple hypothesis, we compare four testing procedures:

(S1) The proposed T_n with the smoothing parameter λ_FS;
(S2) The asymptotically valid penalized likelihood ratio test (Shang and Cheng, 2013), denoted PLRT(f_0), with the same λ_FS;
(S3) The proposed T_n with λ selected by GCV, denoted λ_GCV;
(S4) The PLRT statistic PLRT(f_0) with the same λ_GCV.

Note that the cut-off values for the PLRT in S2 and S4 are obtained from Monte Carlo simulation and from the null limit distribution given in Shang and Cheng (2013), respectively.

Simulation results

[Table 1; the numerical entries are not recoverable from this transcription. Columns: n, c, λ_FS^{1/4}, RP_FS, RP_PLRT, λ̄_GCV^{1/4}, RP_GCV, RP*_GCV, with standard errors in parentheses.]

Table 1: Simulation results for simple hypothesis testing. λ̄_GCV is an average value over 500 replicates (and varies with c). RP_FS, RP_PLRT, RP_GCV and RP*_GCV are the average rejection proportions, over 500 replicates, of T_n with λ_FS, the PLRT with λ_FS, T_n with λ_GCV and the PLRT with λ_GCV, respectively.

Empirical observations

Overall, all four procedures have comparable Type I errors (i.e., c = 0) for any sample size;

As for power performance, we note that

(i) the tests using λ_FS are always more powerful than those using λ_GCV. This justifies the finite-sample advantage of the non-asymptotic formula in selecting λ;

(ii) T_n is always more powerful than the PLRT given the same choice of λ. In other words, S1 is always the most powerful one. This supports the need for removing the estimation bias in nonparametric testing;

The third observation is that as c increases, λ_GCV continues decreasing and becomes closer to λ_FS, but never reaches λ_FS. This is consistent with their different asymptotic orders, i.e., $\lambda_{FS} \asymp n^{-2m/(2m+1/2)}$ and $\lambda_{GCV} \asymp n^{-2m/(2m+1)}$.

Discussions

The non-asymptotic Bahadur representation seems a promising direction to pursue for finite-sample inference;

The practical usefulness of finite-sample inference procedures relies on computably sharp concentration inequalities. Can this be achieved through data re-sampling or MCMC?

Does there exist something like non-asymptotic Fisher information?

Can the non-asymptotic approach be extended to the Bayesian domain, e.g., to choose hyper-parameters in hierarchical priors?

Questions?

Thanks to ONR for support.

Some necessary notation

In smoothing spline models, we set H to be an m-th order Sobolev space, denoted S^m(I), and $\|f\|_H^2 := \int_I \{f^{(m)}(x)\}^2 dx$;

Define ⟨f, g⟩ = E{f(X)g(X)} + λ ∫_I f^{(m)}(x) g^{(m)}(x) dx. Endowed with ⟨·, ·⟩, S^m(I) is an RKHS;

Let K(x_1, x_2) be the reproducing kernel function satisfying ⟨K_x, f⟩ = f(x), with K_x(·) = K(x, ·). The underlying eigen-system is denoted $\{\rho_\nu, \varphi_\nu(\cdot)\}_{\nu=1}^\infty$, where $\rho_\nu \asymp \nu^{-2m}$;

For later use, define

\[
c_K = \lambda^{1/(4m)} \sup_{x \in I} K^{1/2}(x, x), \qquad
\rho_K = \lambda^{1/(2m)}\, E[K^2(X_1, X_2)], \quad X_1, X_2 \overset{iid}{\sim} X.
\]
