Distributed Statistical Estimation and Rates of Convergence in Normal Approximation


1 Distributed Statistical Estimation and Rates of Convergence in Normal Approximation. Stas Minsker (joint with Nate Strawn), Department of Mathematics, USC. July 3, 2017. Colloquium on Concentration Inequalities, High-dimensional Statistics, and Stein's Method

6 Challenges of Contemporary Statistics. Resource limitations: massive data need computer clusters for storage and efficient processing ⇒ requires algorithms that can be implemented in parallel. Presence of outliers of unknown nature ⇒ requires algorithms that are robust and do not rely on preprocessing or outlier detection. While ad hoc techniques exist for some problems, we would like to develop general methods. [Diagram: Node 1, Node 2, ..., Node k connected to a Master node.]

10 Parallel algorithms. Data is split into Subset 1, ..., Subset k. "Embarrassingly parallel" (no communication between nodes): de-bias and take the average (very general, but not robust, and requires the estimators to be asymptotically normal), or compute the spatial median? Communication allowed: versions of gradient descent.

13 Example: how to estimate the mean? Assume that X_1, ..., X_N are i.i.d. N(µ, σ²). Problem: construct CI_norm(α) for µ with coverage probability 1 − 2α. Solution: compute µ̂ := (1/N) Σ_{j=1}^N X_j and take CI_norm(α) = [ µ̂ − σ√(2 log(1/α)/N), µ̂ + σ√(2 log(1/α)/N) ]. To find µ̂ in parallel: set m = N/k, split the sample as X_1, ..., X_m | ... | X_{N−m+1}, ..., X_N, compute µ̂_1 := m^{−1} Σ_{j=1}^m X_j, ..., µ̂_k := m^{−1} Σ_{j=N−m+1}^N X_j, and set µ̂ = k^{−1} Σ_{j=1}^k µ̂_j.
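In code, the split-and-average step looks like this (a minimal NumPy sketch; the sample size, group count, and distribution are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 100_000, 10
X = rng.normal(loc=1.0, scale=2.0, size=N)

# Split the sample into k groups of size m = N/k, average within each
# group (one group per node), then average the subgroup means (master).
m = N // k
subgroup_means = X.reshape(k, m).mean(axis=1)
mu_hat = subgroup_means.mean()
# For the sample mean, the two-stage average equals the global mean.
```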

14 Example: how to estimate the mean? Averaging works in many scenarios where estimators have small bias (J. Fan, H. Liu et al.; J. Lee, J. Taylor et al.; Y. Zhang, J. Duchi, M. Wainwright).

17 Example: how to estimate the mean? P. J. Huber (1964): "...This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one?" Going back to our question: what if X, X_1, ..., X_N are i.i.d. from Π with EX = µ, Var(X) = σ²? Problem: construct CI(α) for µ with coverage probability 1 − α such that for any α, length(CI(α)) ≤ (absolute constant) × length(CI_norm(α)). No additional assumptions on Π are imposed. Remark: the guarantee for the sample mean µ̂_N = (1/N) Σ_{j=1}^N X_j is unsatisfactory, since Chebyshev's inequality only gives Pr( |µ̂_N − µ| ≥ σ√(1/α)/√N ) ≤ α. Does a solution exist?

21 Example: how to estimate the mean? Answer: Yes! Construction [A. Nemirovski, D. Yudin '83; N. Alon, Y. Matias, M. Szegedy '96; R. Oliveira, M. Lerasle '11]: split the sample into k = ⌊log(1/α)⌋ + 1 groups G_1, ..., G_k of size N/k each, compute µ̂_1 := |G_1|^{−1} Σ_{i∈G_1} X_i, ..., µ̂_k := |G_k|^{−1} Σ_{i∈G_k} X_i, and set µ̂_(k) := median(µ̂_1, ..., µ̂_k). Claim: Pr( |µ̂_(k) − µ| ≥ 6.5 σ√(log(1/α)/N) ) ≤ α. Then take CI(α) = [ µ̂_(k) − 6.5σ√(log(1/α)/N), µ̂_(k) + 6.5σ√(log(1/α)/N) ].
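A minimal implementation of this median-of-means construction (the heavy-tailed test distribution and the seed are my own illustrative choices, not from the talk):

```python
import numpy as np

def median_of_means(x, alpha):
    """Median-of-means estimate of the mean, targeting confidence 1 - alpha.

    Splits x into k = floor(log(1/alpha)) + 1 groups, averages within
    each group, and returns the median of the k subgroup means.
    """
    k = int(np.log(1 / alpha)) + 1
    m = len(x) // k
    means = x[: m * k].reshape(k, m).mean(axis=1)
    return np.median(means)

# Heavy-tailed data with finite variance and mean 0 (Student t, df = 2.5):
rng = np.random.default_rng(1)
X = rng.standard_t(df=2.5, size=50_000)

mu_hat = median_of_means(X, alpha=0.01)
# Slide's claim: |mu_hat - mu| <= 6.5 * sigma * sqrt(log(1/alpha) / N)
# with probability >= 1 - alpha, with no further assumptions on the law.
```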

26 Recall k = ⌊log(1/α)⌋ + 1 and Pr( |µ̂_(k) − µ| ≥ 6.5σ√(log(1/α)/N) ) ≤ α. We would like k to be large, for example k ∼ √N; in this case |µ̂_(k) − µ| ≲ σ N^{−1/4} with probability ≥ 1 − e^{−√N}. If we would like the confidence to be 95%, then k = 5. Is the problem with the construction, or are the existing bounds suboptimal?

27 Simulation results. [Plot: median error of the estimator versus log_N k, for several sample sizes including N = 2^16 and N = 2^18.]
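The experiment behind this plot can be sketched as follows (the distribution, sample size, repetition count, and grid of k values are my own guesses at a comparable setup, not the talk's actual settings):

```python
import numpy as np

rng = np.random.default_rng(2)

def median_of_means_k(x, k):
    """Median of the k subgroup means for a fixed number of groups k."""
    m = len(x) // k
    return np.median(x[: m * k].reshape(k, m).mean(axis=1))

# Median absolute error of the estimator as a function of k, over
# repeated draws, for one sample size N and a heavy-tailed law (t, df=3).
N = 2 ** 12
true_mu = 0.0
errors = {}
for k in (1, 4, 16, 64, 256):
    errs = [abs(median_of_means_k(rng.standard_t(df=3, size=N), k) - true_mu)
            for _ in range(200)]
    errors[k] = np.median(errs)
```

Plotting `errors` against log_N k would reproduce the shape of the slide's figure.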

28 Under additional mild assumptions, existing results can be improved

32 Model {P_θ, θ ∈ R}; X_1, ..., X_N i.i.d. from P_{θ*}. As before, split the sample and compute θ̂_1 = θ̂_1(X_1, ..., X_m), ..., θ̂_k = θ̂_k(X_{N−m+1}, ..., X_N); set θ̂_(k) = median(θ̂_1, ..., θ̂_k). Assumption: there exists a sequence {σ_n}_{n≥1} such that g(n) := sup_{t∈R} | P( (θ̂_1 − θ*)/σ_n ≤ t ) − Φ(t) | → 0 as n → ∞.

35 Assumption: g(n) := sup_{t∈R} | P( (θ̂_1 − θ*)/σ_n ≤ t ) − Φ(t) | → 0 as n → ∞. Theorem (M., Strawn). For all s ≤ k, |θ̂_(k) − θ*| ≤ 3σ_n ( g(n) + √(s/k) ) with probability ≥ 1 − 4e^{−2s}. Example: θ* = EX, θ̂_j is the sample mean over the subgroup G_j, σ_n = σ/√n (CLT). Assume that E|X − EX|³ < ∞. Then (by the Berry-Esseen theorem) g(n) ≤ 0.5 · E|X − EX|³/(σ³√n), and |θ̂_(k) − θ*| ≤ 3σ ( (E|X − θ*|³/σ³) · (k/N) + √(s/N) ) with probability ≥ 1 − 4e^{−2s}.

36 In particular, this bound implies optimal rates whenever k ≲ √N.

39 Example: Distributed Maximum Likelihood Estimation. X_1, ..., X_N ∼ P_{θ*}, dP_θ/dx = p_θ(x); regularity (smoothness) assumptions for {p_θ, θ ∈ Θ ⊆ R} are satisfied. Then for all s ≤ k, |θ̂_(k) − θ*| ≤ (Const/√(I(θ*))) ( k/N + √(s/N) ) with probability ≥ 1 − e^{−s}, where I(θ*) is the Fisher information.

40 Asymptotic normality of the MLE has been treated recently by G. Reinert and A. Anastasiou (2014) and I. Pinelis (2016).
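As an illustration of the split-and-median scheme applied to the MLE (the exponential model, parameter values, and node count are my own choices, picked as a standard smooth model):

```python
import numpy as np

rng = np.random.default_rng(3)

# Exponential model p_theta(x) = theta * exp(-theta * x), x > 0: the MLE
# on a sample is 1 / (sample mean).  Each node computes its local MLE,
# and the master node takes the median of the k local estimates.
theta_star = 2.0
N, k = 40_000, 20
X = rng.exponential(scale=1 / theta_star, size=N)

local_mles = 1.0 / X.reshape(k, N // k).mean(axis=1)
theta_hat = np.median(local_mles)
```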

41 Proof of the theorem:

44 Connections to U-quantiles. The previously described estimators depend on the specific partition of the data. To avoid such dependence, consider θ̃_(k) = med( θ̂_J, J ∈ A_N^(n) ), where A_N^(n) := { J ⊆ {1, ..., N} : Card(J) = n := ⌊N/k⌋ } and θ̂_J := θ̂(X_j, j ∈ J) is the estimator of θ* based on {X_j, j ∈ J}. Guarantees for θ̃_(k) are at least as good as for θ̂_(k). Theorem (M., Strawn). For all s ≤ k, |θ̃_(k) − θ*| ≤ 3σ_n ( g(n) + √(s/k) ) with probability ≥ 1 − 4e^{−2s}.
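Since |A_N^(n)| = C(N, n) is far too large to enumerate, a natural computational sketch (my own simplification, not from the talk) replaces the median over all subsets with a median over randomly drawn subsets:

```python
import numpy as np

rng = np.random.default_rng(4)

def subset_median_estimator(x, k, n_subsets=500):
    """Median of subgroup means over random subsets J of size n = N // k,
    a Monte Carlo stand-in for the U-quantile estimator tilde{theta}_(k)."""
    N = len(x)
    n = N // k
    means = [x[rng.choice(N, size=n, replace=False)].mean()
             for _ in range(n_subsets)]
    return np.median(means)

# Heavy-tailed sample with mean 0 (Student t, df = 3):
X = rng.standard_t(df=3, size=10_000)
theta_tilde = subset_median_estimator(X, k=10)
```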

48 Extension to higher dimensions. Assume that X ∈ R^d, EX = θ*, E(X − EX)(X − EX)^T = Σ. "Naive approach": estimate each coordinate of θ* separately, i.e. take the coordinatewise median x* = argmin_{y∈R^d} Σ_{j=1}^k ||y − x_j||_1. It follows from the previous results and the union bound that ||θ̂_(k) − θ*||_2 ≤ 3 √(tr Σ) ( max_{j=1,...,d} ( E|X^(j) − θ*^(j)|³ / Σ_{j,j}^{3/2} ) · (k/N) + √(s/N) ) with probability ≥ 1 − 4d e^{−2s}. However, this estimator is not invariant with respect to orthogonal transformations.
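The l1 minimization decouples across coordinates, so the naive estimator is just the coordinatewise median of the subgroup means (a sketch with my own illustrative dimensions and distribution):

```python
import numpy as np

rng = np.random.default_rng(5)

d, N, k = 5, 20_000, 10
theta_star = np.zeros(d)
X = rng.standard_t(df=3, size=(N, d))  # heavy-tailed, mean theta_star

# One d-dimensional subgroup mean per node.
means = X.reshape(k, N // k, d).mean(axis=1)

# argmin_y sum_j ||y - x_j||_1 decouples across coordinates, so it is
# exactly the coordinatewise median of the subgroup means.
theta_hat = np.median(means, axis=0)
```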

51 Extension to higher dimensions. Definition: the geometric median is x* = med(x_1, ..., x_k) := argmin_{y∈R^d} Σ_{j=1}^k ||y − x_j||_2. Remarks: (1) x* ∈ convex hull(x_1, ..., x_k); (2) x* can be numerically approximated using Weiszfeld's algorithm.
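A minimal implementation of Weiszfeld's algorithm (the tolerance, iteration cap, and the safeguard for iterates landing on a data point are my own choices):

```python
import numpy as np

def geometric_median(points, tol=1e-8, max_iter=1000):
    """Weiszfeld's algorithm: the fixed-point iteration
    y <- (sum_j x_j / ||y - x_j||) / (sum_j 1 / ||y - x_j||)."""
    y = points.mean(axis=0)
    for _ in range(max_iter):
        dist = np.linalg.norm(points - y, axis=1)
        if np.any(dist < tol):       # iterate hit a data point; stop
            break
        w = 1.0 / dist
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

# Robustness check: one wild outlier barely moves the geometric median.
pts = np.vstack([np.zeros((9, 2)), [[1000.0, 1000.0]]])
med = geometric_median(pts)
```

With nine of ten points at the origin, the geometric median stays at the origin regardless of where the single outlier sits, illustrating the robustness the talk relies on.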

54 Extension to higher dimensions. Assumption: there exists a sequence {σ_n}_{n∈N} ⊂ R_+ and a positive-definite matrix Σ with ||Σ|| ≤ 1 such that g_d(n) := sup_{S cone} | P( (θ̂_1 − θ*)/σ_n ∈ S ) − Φ_Σ(S) | → 0 as n → ∞, where the supremum is over cones S ⊆ R^d and Φ_Σ is the N(0, Σ) measure. Theorem (M., Strawn). For all 1 ≤ s ≤ k, ||θ̂_(k) − θ*||_2 ≤ σ_n ( Const_1(d) √(s/k) + Const_2(d) ( 1/√k + g_d(n) ) ) with probability ≥ 1 − e^{−s}, where Const_1(d) = 6 √(log(4e^{5/2}(d+4))) · √(d + 2(d−1) ln 4) and Const_2(d) = √(d + 2(d−1) ln 4).

55 Example. For the mean estimation problem, the bound becomes ||θ̂_(k) − θ*||_2 ≤ 32.4 ||Σ||^{1/2} cond(Σ^{1/2}) ( (C_1(d) + C_2(d)√s)/√(4N) + 400 d^{1/4} E||Σ^{−1/2}(X − θ*)||_2³ / √n ).

60 Further questions:
- What if asymptotic normality does not hold? What is the correct way to measure symmetry?
- Is it possible to obtain bounds with optimal dependence on the dimension?
- Applications to robust optimization techniques, such as variants of gradient descent methods.
- Empirical risk minimization based on the median-of-means?
- Extensions to Bayesian statistics?

61 Thank you for your attention!


More information

Robustness and duality of maximum entropy and exponential family distributions

Robustness and duality of maximum entropy and exponential family distributions Chapter 7 Robustness and duality of maximum entropy and exponential family distributions In this lecture, we continue our study of exponential families, but now we investigate their properties in somewhat

More information

Online (and Distributed) Learning with Information Constraints. Ohad Shamir

Online (and Distributed) Learning with Information Constraints. Ohad Shamir Online (and Distributed) Learning with Information Constraints Ohad Shamir Weizmann Institute of Science Online Algorithms and Learning Workshop Leiden, November 2014 Ohad Shamir Learning with Information

More information

Minimax rates for Batched Stochastic Optimization

Minimax rates for Batched Stochastic Optimization Minimax rates for Batched Stochastic Optimization John Duchi based on joint work with Feng Ruan and Chulhee Yun Stanford University Tradeoffs Major problem in theoretical statistics: how do we characterize

More information

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf Lecture 13: 2011 Bootstrap ) R n x n, θ P)) = τ n ˆθn θ P) Example: ˆθn = X n, τ n = n, θ = EX = µ P) ˆθ = min X n, τ n = n, θ P) = sup{x : F x) 0} ) Define: J n P), the distribution of τ n ˆθ n θ P) under

More information

Distirbutional robustness, regularizing variance, and adversaries

Distirbutional robustness, regularizing variance, and adversaries Distirbutional robustness, regularizing variance, and adversaries John Duchi Based on joint work with Hongseok Namkoong and Aman Sinha Stanford University November 2017 Motivation We do not want machine-learned

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

The Variational Gaussian Approximation Revisited

The Variational Gaussian Approximation Revisited The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much

More information

Empirical Bayes Quantile-Prediction aka E-B Prediction under Check-loss;

Empirical Bayes Quantile-Prediction aka E-B Prediction under Check-loss; BFF4, May 2, 2017 Empirical Bayes Quantile-Prediction aka E-B Prediction under Check-loss; Lawrence D. Brown Wharton School, Univ. of Pennsylvania Joint work with Gourab Mukherjee and Paat Rusmevichientong

More information

Composite nonlinear models at scale

Composite nonlinear models at scale Composite nonlinear models at scale Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell) C. Paquette (Lehigh), and S. Roy (UW)

More information

Qualifying Exam in Machine Learning

Qualifying Exam in Machine Learning Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts

More information

Quasi-local mass and isometric embedding

Quasi-local mass and isometric embedding Quasi-local mass and isometric embedding Mu-Tao Wang, Columbia University September 23, 2015, IHP Recent Advances in Mathematical General Relativity Joint work with Po-Ning Chen and Shing-Tung Yau. The

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

System Identification, Lecture 4

System Identification, Lecture 4 System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2012 F, FRI Uppsala University, Information Technology 30 Januari 2012 SI-2012 K. Pelckmans

More information

Empirical Likelihood

Empirical Likelihood Empirical Likelihood Patrick Breheny September 20 Patrick Breheny STA 621: Nonparametric Statistics 1/15 Introduction Empirical likelihood We will discuss one final approach to constructing confidence

More information

Proceedings of the 2014 Winter Simulation Conference A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller, eds.

Proceedings of the 2014 Winter Simulation Conference A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller, eds. Proceedings of the 2014 Winter Simulation Conference A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller, eds. BOOTSTRAP RANKING & SELECTION REVISITED Soonhui Lee School of Business

More information

Section 8.2. Asymptotic normality

Section 8.2. Asymptotic normality 30 Section 8.2. Asymptotic normality We assume that X n =(X 1,...,X n ), where the X i s are i.i.d. with common density p(x; θ 0 ) P= {p(x; θ) :θ Θ}. We assume that θ 0 is identified in the sense that

More information

System Identification, Lecture 4

System Identification, Lecture 4 System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2016 F, FRI Uppsala University, Information Technology 13 April 2016 SI-2016 K. Pelckmans

More information

Statistical Inference

Statistical Inference Statistical Inference Classical and Bayesian Methods Class 7 AMS-UCSC Tue 31, 2012 Winter 2012. Session 1 (Class 7) AMS-132/206 Tue 31, 2012 1 / 13 Topics Topics We will talk about... 1 Hypothesis testing

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

Empirical Risk Minimization

Empirical Risk Minimization Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space

More information

Communication-Efficient Distributed Statistical Inference

Communication-Efficient Distributed Statistical Inference Communication-Efficient Distributed Statistical Inference Michael I. Jordan, Jason D. Lee, Yun Yang arxiv:1605.07689v3 [stat.ml] 6 Nov 2016 November 8, 2016 1 Abstract We present a Communication-efficient

More information

Econ 583 Final Exam Fall 2008

Econ 583 Final Exam Fall 2008 Econ 583 Final Exam Fall 2008 Eric Zivot December 11, 2008 Exam is due at 9:00 am in my office on Friday, December 12. 1 Maximum Likelihood Estimation and Asymptotic Theory Let X 1,...,X n be iid random

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Christopher R. Genovese Department of Statistics Carnegie Mellon University joint work with Larry Wasserman

More information

Lecture 32: Asymptotic confidence sets and likelihoods

Lecture 32: Asymptotic confidence sets and likelihoods Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence

More information

MIT Spring 2015

MIT Spring 2015 MIT 18.443 Dr. Kempthorne Spring 2015 MIT 18.443 1 Outline 1 MIT 18.443 2 Batches of data: single or multiple x 1, x 2,..., x n y 1, y 2,..., y m w 1, w 2,..., w l etc. Graphical displays Summary statistics:

More information

Divide-and-combine Strategies in Statistical Modeling for Massive Data

Divide-and-combine Strategies in Statistical Modeling for Massive Data Divide-and-combine Strategies in Statistical Modeling for Massive Data Liqun Yu Washington University in St. Louis March 30, 2017 Liqun Yu (WUSTL) D&C Statistical Modeling for Massive Data March 30, 2017

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Robust Statistics, Revisited

Robust Statistics, Revisited Robust Statistics, Revisited Ankur Moitra (MIT) joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart CLASSIC PARAMETER ESTIMATION Given samples from an unknown

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics MAS 713 Chapter 8 Previous lecture: 1 Bayesian Inference 2 Decision theory 3 Bayesian Vs. Frequentist 4 Loss functions 5 Conjugate priors Any questions? Mathematical Statistics

More information

Machine learning, shrinkage estimation, and economic theory

Machine learning, shrinkage estimation, and economic theory Machine learning, shrinkage estimation, and economic theory Maximilian Kasy December 14, 2018 1 / 43 Introduction Recent years saw a boom of machine learning methods. Impressive advances in domains such

More information

Distributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College

Distributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic

More information

arxiv: v1 [math.st] 15 Nov 2017

arxiv: v1 [math.st] 15 Nov 2017 Submitted to the Annals of Statistics A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING arxiv:1711.05381v1 [math.st] 15 Nov 2017 By

More information

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.

More information

A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING

A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING Submitted to the Annals of Statistics A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING By Wen-Xin Zhou, Koushiki Bose, Jianqing Fan,

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain 0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher

More information

Robust estimation of scale and covariance with P n and its application to precision matrix estimation

Robust estimation of scale and covariance with P n and its application to precision matrix estimation Robust estimation of scale and covariance with P n and its application to precision matrix estimation Garth Tarr, Samuel Müller and Neville Weber USYD 2013 School of Mathematics and Statistics THE UNIVERSITY

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Mixture Models, Density Estimation, Factor Analysis Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 2: 1 late day to hand it in now. Assignment 3: Posted,

More information