Distributed Statistical Estimation and Rates of Convergence in Normal Approximation
1 Distributed Statistical Estimation and Rates of Convergence in Normal Approximation. Stas Minsker (joint with Nate Strawn), Department of Mathematics, USC. July 3, 2017. Colloquium on Concentration Inequalities, High-dimensional Statistics, and Stein's Method.
2-6 Challenges of Contemporary Statistics
Resource limitations: massive data need computer clusters for storage and efficient processing $\Rightarrow$ requires algorithms that can be implemented in parallel.
Presence of outliers of unknown nature $\Rightarrow$ requires algorithms that are robust and do not rely on preprocessing or outlier detection.
While ad-hoc techniques exist for some problems, we would like to develop general methods.
[Diagram: worker nodes Node 1, Node 2, ..., Node k connected to a Master node.]
7-10 Parallel algorithms
The data are split into Subset 1, ..., Subset k, one per node. Two regimes:
"Embarrassingly parallel" (no communication between nodes): de-bias and take the average (very general, but not robust; de-biasing requires estimators to be asymptotically normal), or compute the spatial median?
Communication allowed: versions of gradient descent.
11-14 Example: how to estimate the mean?
Assume that $X_1,\dots,X_N$ are i.i.d. $\mathcal N(\mu,\sigma^2)$.
Problem: construct $\mathrm{CI}_{\mathrm{norm}}(\alpha)$ for $\mu$ with coverage probability $1-2\alpha$.
Solution: compute $\hat\mu := \frac{1}{N}\sum_{j=1}^N X_j$ and take
$$\mathrm{CI}_{\mathrm{norm}}(\alpha) = \left[\hat\mu - \sigma\sqrt{\frac{2\log(1/\alpha)}{N}},\ \hat\mu + \sigma\sqrt{\frac{2\log(1/\alpha)}{N}}\right].$$
To find $\hat\mu$ in a distributed fashion: set $m = N/k$, compute the block means
$$\bar\mu_1 := m^{-1}\sum_{j=1}^{m} X_j,\ \dots,\ \bar\mu_k := m^{-1}\sum_{j=N-m+1}^{N} X_j,$$
and average them: $\hat\mu = k^{-1}\sum_{j=1}^{k}\bar\mu_j$.
Averaging works in many scenarios where estimators have small bias (J. Fan, H. Liu et al.; J. Lee, J. Taylor et al.; Y. Zhang, J. Duchi, M. Wainwright).
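The splitting-and-averaging scheme above can be sketched in a few lines (a minimal illustration, not the speakers' code; the block layout and sample size are my choices):

```python
import numpy as np

def distributed_mean(x, k):
    """Split the sample into k blocks, average each block on its own
    node, then average the k block means on the master node."""
    blocks = np.array_split(x, k)             # Subset 1, ..., Subset k
    block_means = [b.mean() for b in blocks]  # computed in parallel
    return float(np.mean(block_means))

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)
# With equal block sizes, averaging the block means reproduces the
# full-sample mean (up to floating-point error).
print(abs(distributed_mean(x, k=8) - x.mean()))
```

Only the k block means travel to the master node, which is why this scheme needs no communication between workers.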
15-17 Example: how to estimate the mean?
P. J. Huber (1964): "...This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one?"
Going back to our question: what if $X, X_1,\dots,X_N$ are i.i.d. from $\Pi$ with $\mathbb E X = \mu$ and $\mathrm{Var}(X) = \sigma^2$?
Problem: construct $\mathrm{CI}(\alpha)$ for $\mu$ with coverage probability $1-\alpha$ such that for any $\alpha$
$$\mathrm{length}(\mathrm{CI}(\alpha)) \le (\text{Absolute constant})\cdot\mathrm{length}(\mathrm{CI}_{\mathrm{norm}}(\alpha)).$$
No additional assumptions on $\Pi$ are imposed.
Remark: the guarantee for the sample mean $\hat\mu_N = \frac{1}{N}\sum_{j=1}^N X_j$ is unsatisfactory, as Chebyshev's inequality only gives
$$\Pr\left(|\hat\mu_N - \mu| \ge \sigma\sqrt{\frac{1/\alpha}{N}}\right) \le \alpha.$$
Does a solution exist?
18-21 Example: how to estimate the mean?
Answer: yes!
Construction [A. Nemirovski, D. Yudin '83; N. Alon, Y. Matias, M. Szegedy '96; R. Oliveira, M. Lerasle '11]: split the sample into $k = \lfloor\log(1/\alpha)\rfloor + 1$ groups $G_1,\dots,G_k$ of size $N/k$ each, compute the group means
$$\bar\mu_j := \frac{1}{|G_j|}\sum_{i\in G_j} X_i,\quad j=1,\dots,k,$$
and take the median-of-means estimator $\hat\mu_{(k)} := \mathrm{median}(\bar\mu_1,\dots,\bar\mu_k)$.
Claim:
$$\Pr\left(|\hat\mu_{(k)} - \mu| \ge 6.5\,\sigma\sqrt{\frac{\log(1/\alpha)}{N}}\right) \le \alpha.$$
Then take
$$\mathrm{CI}(\alpha) = \left[\hat\mu_{(k)} - 6.5\,\sigma\sqrt{\frac{\log(1/\alpha)}{N}},\ \hat\mu_{(k)} + 6.5\,\sigma\sqrt{\frac{\log(1/\alpha)}{N}}\right].$$
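The construction above fits in a few lines (a hedged sketch: the heavy-tailed test distribution and sample size are my choices; the constant 6.5 is taken from the claim on the slide):

```python
import numpy as np

def median_of_means(x, alpha):
    """Median-of-means with k = floor(log(1/alpha)) + 1 groups."""
    k = int(np.log(1.0 / alpha)) + 1
    block_means = [b.mean() for b in np.array_split(x, k)]
    return float(np.median(block_means)), k

def mom_ci(x, alpha, sigma):
    """Confidence interval using the slide's constant 6.5."""
    est, _ = median_of_means(x, alpha)
    half = 6.5 * sigma * np.sqrt(np.log(1.0 / alpha) / len(x))
    return est - half, est + half

rng = np.random.default_rng(1)
# Heavy-tailed sample: Student's t with 3 degrees of freedom, mean 0.
x = rng.standard_t(df=3, size=100_000)
lo, hi = mom_ci(x, alpha=0.01, sigma=np.sqrt(3.0))  # Var = 3/(3-2) = 3
print(lo < 0.0 < hi)
```

Note that only $\mathrm{Var}(X)$ enters the interval width: no higher moments of the heavy-tailed $\Pi$ are needed, which is exactly the point of the construction.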
22-26 With $k = \lfloor\log(1/\alpha)\rfloor + 1$,
$$\Pr\left(|\hat\mu_{(k)} - \mu| \ge 6.5\,\sigma\sqrt{\frac{\log(1/\alpha)}{N}}\right) \le \alpha.$$
We would like $k$ to be large, for example $k \asymp \sqrt N$; in this case, $|\hat\mu_{(k)} - \mu| \lesssim N^{-1/4}$ with probability $\ge 1 - e^{-\sqrt N}$.
If we would like the confidence to be 95%, then $k = 5$.
Is the problem with the construction, or are the existing bounds suboptimal?
27 Simulation results. [Figure: median error of $\hat\mu_{(k)}$ as a function of $\log_N k$, for sample sizes $N = 2^{16}$, $N = 2^{18}$, and one further value of $N$.]
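The qualitative experiment behind such a plot can be reproduced with a short Monte Carlo sketch (illustrative only: the t-distribution, sample size, grid of $k$, and repetition count are my assumptions, not the talk's exact setup):

```python
import numpy as np

def mom(x, k):
    """Median-of-means with k equal-size blocks."""
    return float(np.median([b.mean() for b in np.array_split(x, k)]))

rng = np.random.default_rng(2)
N, reps = 2**16, 100
errors = {}
for k in (1, 4, 16, 64, 256):
    # Median absolute error over independent repetitions; true mean is 0.
    errs = [abs(mom(rng.standard_t(df=3, size=N), k)) for _ in range(reps)]
    errors[k] = float(np.median(errs))
print(errors)
```

Plotting `errors` against $\log_N k$ gives the kind of error-versus-$k$ curve shown on the slide.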
28 Under additional mild assumptions, existing results can be improved.
29-32 A parametric family $\{P_\theta,\ \theta\in\mathbb R\}$; $X_1,\dots,X_N$ i.i.d. from $P_{\theta_*}$. Split the sample into $k$ groups of size $m = N/k$, compute the group-wise estimators
$$\tilde\theta_1 = \tilde\theta_1(X_1,\dots,X_m),\ \dots,\ \tilde\theta_k = \tilde\theta_k(X_{N-m+1},\dots,X_N),$$
and set $\tilde\theta_{(k)} = \mathrm{median}(\tilde\theta_1,\dots,\tilde\theta_k)$.
Assumption: there exists a sequence $\{\sigma_n\}_{n\ge1}$ such that
$$g(n) := \sup_{t\in\mathbb R}\left|P\left(\frac{\tilde\theta_1 - \theta_*}{\sigma_n} \le t\right) - \Phi(t)\right| \to 0 \text{ as } n\to\infty.$$
33-36 Theorem (M., Strawn): for all $s\le k$,
$$|\tilde\theta_{(k)} - \theta_*| \le 3\sigma_n\left(g(n) + \sqrt{\frac{s}{k}}\right)$$
with probability $\ge 1 - 4e^{-2s}$.
Example: $\theta_* = \mathbb E X$, $\tilde\theta_j$ is the sample mean over the subgroup $G_j$, and $\sigma_n = \sigma/\sqrt n$ (CLT). Assume that $\mathbb E|X - \mathbb E X|^3 < \infty$. Then, by the Berry-Esseen theorem,
$$g(n) \le 0.5\,\frac{\mathbb E|X - \mathbb E X|^3}{\sigma^3\sqrt n},$$
and therefore
$$|\tilde\theta_{(k)} - \theta_*| \le 3\sigma\left(0.5\,\frac{\mathbb E|X - \theta_*|^3}{\sigma^3}\,\frac{k}{N} + \sqrt{\frac{s}{N}}\right)$$
with probability $\ge 1 - 4e^{-2s}$. This implies optimal rates whenever $k \lesssim \sqrt N$.
37-40 Example: Distributed Maximum Likelihood Estimation
$X_1,\dots,X_N \sim P_\theta$ with density $\frac{dP_\theta}{dx} = p_\theta(x)$;
regularity (smoothness) assumptions for $\{p_\theta,\ \theta\in\Theta\subseteq\mathbb R\}$ are satisfied.
Then, with $\tilde\theta_j$ the MLE over the $j$-th subgroup, for all $s\le k$,
$$|\tilde\theta_{(k)} - \theta_*| \le \frac{\mathrm{Const}}{\sqrt{I(\theta_*)}}\left(\frac{k}{N} + \sqrt{\frac{s}{N}}\right)$$
with probability $\ge 1 - e^{-s}$.
Asymptotic normality of the MLE has been treated recently by G. Reinert and A. Anastasiou (2014) and I. Pinelis (2016).
41 Proof of the theorem:
42-44 Connections to U-quantiles
The previously described estimators depend on the specific partition of the data. To avoid such dependence, consider
$$\bar\theta_{(k)} = \mathrm{med}\left(\tilde\theta_J,\ J\in A_N^{(n)}\right),\quad A_N^{(n)} := \{J\subseteq\{1,\dots,N\}:\ \mathrm{Card}(J) = n := \lfloor N/k\rfloor\},$$
where $\tilde\theta_J := \tilde\theta(X_j,\ j\in J)$ is an estimator of $\theta_*$ based on $\{X_j,\ j\in J\}$.
Guarantees for $\bar\theta_{(k)}$ are at least as good as for $\tilde\theta_{(k)}$:
Theorem (M., Strawn): for all $s\le k$,
$$|\bar\theta_{(k)} - \theta_*| \le 3\sigma_n\left(g(n) + \sqrt{\frac{s}{k}}\right)$$
with probability $\ge 1 - 4e^{-2s}$.
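For small samples the U-quantile version can be computed exactly by enumerating all subsets (a toy sketch; brute-force enumeration is feasible only for tiny $N$, since $|A_N^{(n)}| = \binom{N}{n}$):

```python
import numpy as np
from itertools import combinations

def u_mom(x, n):
    """Median of the means over ALL subsets J of size n, which removes
    the dependence on one particular partition of the data."""
    means = [np.mean([x[i] for i in J])
             for J in combinations(range(len(x)), n)]
    return float(np.median(means))

x = [1.0, 2.0, 3.0, 4.0, 100.0]   # one gross outlier
print(u_mom(x, n=2))              # median over C(5,2) = 10 pairwise means -> 3.25
```

Even though 4 of the 10 pairwise means are dragged far away by the outlier, the median over all of them stays near the bulk of the data.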
45-48 Extension to higher dimensions
Assume that $X\in\mathbb R^d$, $\mathbb E X = \theta_*$, $\mathbb E(X-\mathbb E X)(X-\mathbb E X)^T = \Sigma$.
"Naive approach": estimate each coordinate of $\theta_*$ separately, i.e. take the coordinatewise median
$$\hat x = \operatorname*{argmin}_{y\in\mathbb R^d}\sum_{j=1}^{k}\|y - x_j\|_1.$$
It follows from the previous results and the union bound that
$$\|\hat\theta_{(k)} - \theta_*\|_2 \le 3\sqrt{\mathrm{tr}\,\Sigma}\left(\max_{j=1,\dots,d}\frac{\mathbb E|X^{(j)} - \theta_*^{(j)}|^3}{\Sigma_{j,j}^{3/2}}\,\frac{k}{N} + \sqrt{\frac{s}{N}}\right)$$
with probability $\ge 1 - 4de^{-2s}$.
This estimator is not invariant with respect to orthogonal transformations.
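Since the $\ell_1$-minimization above decouples across coordinates, the naive approach is a single NumPy call (toy block means with one corrupted node, assumed for illustration):

```python
import numpy as np

# Block means reported by 4 nodes in a 2-dimensional problem;
# the last node is corrupted by outliers.
block_means = np.array([[ 0.9,   2.1],
                        [ 1.1,   1.9],
                        [ 1.0,   2.0],
                        [50.0, -40.0]])

# Minimizing sum_j ||y - x_j||_1 decouples across coordinates,
# so the solution is the coordinatewise median.
est = np.median(block_means, axis=0)
print(est)   # [1.05 1.95] -- unaffected by the corrupted node
```

Rotating the data, however, changes the coordinatewise median in a non-equivariant way, which is the lack of orthogonal invariance noted above.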
49-51 Extension to higher dimensions
Definition: the geometric median of $x_1,\dots,x_k\in\mathbb R^d$ is
$$\hat x = \mathrm{med}(x_1,\dots,x_k) := \operatorname*{argmin}_{y\in\mathbb R^d}\sum_{j=1}^{k}\|y - x_j\|_2.$$
Remarks:
1. $\hat x \in \mathrm{conv}(x_1,\dots,x_k)$, the convex hull of the points.
2. $\hat x$ can be numerically approximated using Weiszfeld's algorithm.
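Weiszfeld's algorithm mentioned in the remark is an iteratively re-weighted averaging scheme; a minimal sketch (the step count, tolerance, and toy data are my choices):

```python
import numpy as np

def geometric_median(points, n_iter=500, eps=1e-9):
    """Weiszfeld's algorithm: repeatedly replace the iterate with a
    weighted average of the points, each point weighted by the inverse
    of its distance to the current iterate."""
    pts = np.asarray(points, dtype=float)
    y = pts.mean(axis=0)                  # start from the sample mean
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(pts - y, axis=1), eps)
        w = 1.0 / d                       # inverse-distance weights
        y_new = (w[:, None] * pts).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
gm = geometric_median(pts)
# The geometric median stays near the cluster despite the outlier.
print(gm)
```

Consistent with Remark 1, the output lies in the convex hull of the inputs, and each iteration does not increase the sum of $\ell_2$ distances, so the result is at least as good as the starting mean.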
52-54 Extension to higher dimensions
Assumption: there exist a sequence $\{\sigma_n\}_{n\in\mathbb N}\subset\mathbb R_+$ and a positive-definite matrix $\Sigma$ with $\|\Sigma\|\le 1$ such that
$$g_d(n) := \sup_{S\text{ cone}}\left|P\left(\frac{\tilde\theta_1 - \theta_*}{\sigma_n}\in S\right) - \Phi_\Sigma(S)\right| \to 0 \text{ as } n\to\infty.$$
Theorem (M., Strawn): for all $1\le s\le k$,
$$\|\tilde\theta_{(k)} - \theta_*\|_2 \le \sigma_n\left(\mathrm{Const}_1(d)\sqrt{\frac{s}{k}} + \frac{\mathrm{Const}_2(d)}{\sqrt k} + g_d(n)\right)$$
with probability $\ge 1 - e^{-s}$, where
$$\mathrm{Const}_1(d) = 6\log\left(4e^{5/2}(d+4)\right)\left(\sqrt d + 2\sqrt{(d-1)\ln 4}\right),\quad \mathrm{Const}_2(d) = \sqrt d + 2\sqrt{(d-1)\ln 4}.$$
55 Example: for the mean estimation problem, the bound becomes
$$\|\hat\theta_{(k)} - \theta_*\|_2 \le 32.4\,\|\Sigma^{1/2}\|\,\mathrm{cond}(\Sigma^{1/2})\,\frac{C_1(d) + C_2(d)\sqrt s}{\sqrt{4N}} + 400\,d^{1/4}\,\frac{\mathbb E\|\Sigma^{-1/2}(X - \theta_*)\|_2^3}{\sqrt n}.$$
56-60 Further questions
What if asymptotic normality does not hold? What is the correct way to measure symmetry?
Is it possible to obtain bounds with optimal dependence on the dimension?
Applications to robust optimization techniques, such as variants of gradient descent methods.
Empirical risk minimization based on the median-of-means?
Extensions to Bayesian statistics?
61 Thank you for your attention!
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationStat 710: Mathematical Statistics Lecture 31
Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:
More informationSystem Identification, Lecture 4
System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2012 F, FRI Uppsala University, Information Technology 30 Januari 2012 SI-2012 K. Pelckmans
More informationEmpirical Likelihood
Empirical Likelihood Patrick Breheny September 20 Patrick Breheny STA 621: Nonparametric Statistics 1/15 Introduction Empirical likelihood We will discuss one final approach to constructing confidence
More informationProceedings of the 2014 Winter Simulation Conference A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller, eds.
Proceedings of the 2014 Winter Simulation Conference A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller, eds. BOOTSTRAP RANKING & SELECTION REVISITED Soonhui Lee School of Business
More informationSection 8.2. Asymptotic normality
30 Section 8.2. Asymptotic normality We assume that X n =(X 1,...,X n ), where the X i s are i.i.d. with common density p(x; θ 0 ) P= {p(x; θ) :θ Θ}. We assume that θ 0 is identified in the sense that
More informationSystem Identification, Lecture 4
System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2016 F, FRI Uppsala University, Information Technology 13 April 2016 SI-2016 K. Pelckmans
More informationStatistical Inference
Statistical Inference Classical and Bayesian Methods Class 7 AMS-UCSC Tue 31, 2012 Winter 2012. Session 1 (Class 7) AMS-132/206 Tue 31, 2012 1 / 13 Topics Topics We will talk about... 1 Hypothesis testing
More informationSTAT 461/561- Assignments, Year 2015
STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and
More informationEmpirical Risk Minimization
Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space
More informationCommunication-Efficient Distributed Statistical Inference
Communication-Efficient Distributed Statistical Inference Michael I. Jordan, Jason D. Lee, Yun Yang arxiv:1605.07689v3 [stat.ml] 6 Nov 2016 November 8, 2016 1 Abstract We present a Communication-efficient
More informationEcon 583 Final Exam Fall 2008
Econ 583 Final Exam Fall 2008 Eric Zivot December 11, 2008 Exam is due at 9:00 am in my office on Friday, December 12. 1 Maximum Likelihood Estimation and Asymptotic Theory Let X 1,...,X n be iid random
More informationStatistical Data Analysis
DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the
More informationControlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method
Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Christopher R. Genovese Department of Statistics Carnegie Mellon University joint work with Larry Wasserman
More informationLecture 32: Asymptotic confidence sets and likelihoods
Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence
More informationMIT Spring 2015
MIT 18.443 Dr. Kempthorne Spring 2015 MIT 18.443 1 Outline 1 MIT 18.443 2 Batches of data: single or multiple x 1, x 2,..., x n y 1, y 2,..., y m w 1, w 2,..., w l etc. Graphical displays Summary statistics:
More informationDivide-and-combine Strategies in Statistical Modeling for Massive Data
Divide-and-combine Strategies in Statistical Modeling for Massive Data Liqun Yu Washington University in St. Louis March 30, 2017 Liqun Yu (WUSTL) D&C Statistical Modeling for Massive Data March 30, 2017
More informationQuick Tour of Basic Probability Theory and Linear Algebra
Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions
More informationRobust Statistics, Revisited
Robust Statistics, Revisited Ankur Moitra (MIT) joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart CLASSIC PARAMETER ESTIMATION Given samples from an unknown
More informationMathematical Statistics
Mathematical Statistics MAS 713 Chapter 8 Previous lecture: 1 Bayesian Inference 2 Decision theory 3 Bayesian Vs. Frequentist 4 Loss functions 5 Conjugate priors Any questions? Mathematical Statistics
More informationMachine learning, shrinkage estimation, and economic theory
Machine learning, shrinkage estimation, and economic theory Maximilian Kasy December 14, 2018 1 / 43 Introduction Recent years saw a boom of machine learning methods. Impressive advances in domains such
More informationDistributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College
Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic
More informationarxiv: v1 [math.st] 15 Nov 2017
Submitted to the Annals of Statistics A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING arxiv:1711.05381v1 [math.st] 15 Nov 2017 By
More informationMachine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.
More informationA NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING
Submitted to the Annals of Statistics A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING By Wen-Xin Zhou, Koushiki Bose, Jianqing Fan,
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationf(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain
0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher
More informationRobust estimation of scale and covariance with P n and its application to precision matrix estimation
Robust estimation of scale and covariance with P n and its application to precision matrix estimation Garth Tarr, Samuel Müller and Neville Weber USYD 2013 School of Mathematics and Statistics THE UNIVERSITY
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Mixture Models, Density Estimation, Factor Analysis Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 2: 1 late day to hand it in now. Assignment 3: Posted,
More information