STAT 593 Robust statistics: Modeling and Computing
1 STAT 593 Robust statistics: Modeling and Computing. Joseph Salmon, Télécom ParisTech, Institut Mines-Télécom & University of Washington, Department of Statistics (Visiting Assistant Professor)
2 Outline: Presentation / course organization; Prerequisites / references; Common estimators; Linear Model
3 Table of Contents: Presentation / course organization (Teaching staff, Practical aspects); Prerequisites / references; Common estimators; Linear Model
4 Presentation
Joseph Salmon (Assistant Professor). Positions: PhD student at Paris Diderot-Paris 7 (2007-2010); Post-Doc at Duke University ( ); Assistant Professor at Télécom ParisTech (2012-); Visiting Assistant Professor at UW (2018)
Research themes: high-dimensional statistics, optimization for machine learning, aggregation, image processing
joseph.salmon@telecom-paristech.fr; Website: josephsalmon.eu
5 (No) Grades / office hours
Beware: this is a Credit/No-Credit grading course
Office hours: Friday 10:30-11:30 AM; by appointment only
Office: B314 Padelford
Number of credits: 3
6 Outline of the course
Week 1: Introduction, examples, basic concepts, location, scale, equivariance
Week 2: Breakdown point, M-estimates, pseudo-observations
Week 3: L-statistics: linear combinations of order statistics
Week 4: Numerical computation of M-estimates, non-smooth convex optimization, Iteratively Reweighted Least Squares (IRLS)
Week 5: Smoothing non-smooth problems
Week 6: Gâteaux differentiability, sensitivity curve, influence function
Week 7: Robust regression and multivariate statistics
Week 8: Quantile regression, crossing
Week 9: Guest lectures
Week 10: Project presentations
7 Table of Contents: Presentation / course organization; Prerequisites / references (General advice, Reading); Common estimators; Linear Model
8 Prerequisites
Probability basics: probability, expectation, law of large numbers, Gaussian distribution, central limit theorem. Books: Murphy (2012, ch. 1 and 2)
Optimization basics: (differential) calculus, convexity, first-order conditions, gradient descent, Newton's method. Books: Boyd and Vandenberghe (2004), Bertsekas (1999)
(Bi-)linear algebra basics: vector spaces, norms, inner products, matrices, determinants, diagonalization. Reading: Horn and Johnson (1994)
Numerical linear algebra: linear system solving, Gaussian elimination, matrix factorization, conditioning, etc. Reading: Golub and Van Loan (2013), Applied Numerical Computing by L. Vandenberghe
12 Books, recommended reading
Books on robust statistics: Maronna et al. (2006), Huber and Ronchetti (2009), Hampel et al. (1986), Rousseeuw and Leroy (1987)
Book for linear models: Seber and Lee (2003)
Books for optimization, Legendre/Fenchel conjugacy: Hiriart-Urruty and Lemaréchal (1993a, 1993b), Bauschke and Combettes (2011)
Surveys on optimization: Parikh et al. (2013)
13 Algorithmic aspects: some advice
Python installation: use Conda / Anaconda
Recommended tools: Jupyter / IPython Notebook, or IPython with a text editor (e.g., Atom, Sublime Text, Visual Studio Code, etc.)
Useful packages: Python, SciPy, NumPy, Pandas, scikit-learn, Statsmodels
14 General advice
Use a version control system for your work: Git (e.g., Bitbucket, GitHub, etc.) or Mercurial
Use a clean way of writing / presenting your code, e.g., PEP8 for Python (use for instance autopep8)
Learn from good examples
15 List of interesting papers (I)
Depth:
D. L. Donoho and M. Gasko. "Breakdown properties of location estimates based on halfspace depth and projected outlyingness". In: Ann. Statist. 20.4 (1992)
K. Mosler. "Depth statistics". In: Robustness and complex data structures. Springer, 2013
Linear models / Lasso methods:
M. Avella-Medina and E. M. Ronchetti. "Robust and consistent variable selection in high-dimensional generalized linear models". In: Biometrika 105.1 (2018)
H. Xu, C. Caramanis, and S. Mannor. "Robust regression and Lasso". In: IEEE Trans. Inf. Theory 56.7 (2010)
A. Alfons, C. Croux, and S. Gelper. "Sparse least trimmed squares regression for analyzing high-dimensional large data sets". In: Ann. Appl. Stat. 7.1 (2013)
16 List of interesting papers (II)
Robust optimization point of view:
Y. Chen, C. Caramanis, and S. Mannor. "Robust sparse regression under adversarial corruption". In: ICML. 2013
D. Bertsimas, D. B. Brown, and C. Caramanis. "Theory and applications of robust optimization". In: SIAM Rev. (2011)
M. Chen, C. Gao, and Z. Ren. "A General Decision Theory for Huber's ε-contamination Model". In: Electron. J. Stat. 10.2 (2016)
Geometric median:
S. Minsker. "Geometric median and robust estimation in Banach spaces". In: Bernoulli 21.4 (2015)
Robust covariance estimation:
X. Wei and S. Minsker. "Estimation of the covariance structure of heavy-tailed distributions". In: NIPS. 2017
Smoothing non-smooth functions:
Y. Nesterov. "Smooth minimization of non-smooth functions". In: Math. Program. (2005)
A. Beck and M. Teboulle. "Smoothing and first order methods: A unified framework". In: SIAM J. Optim. (2012)
17 Table of Contents: Presentation / course organization; Prerequisites / references; Common estimators (Location estimation, Scale estimation, Masking effect); Linear Model
18 Notation / Settings
Observations: $n$ samples $x_1, \dots, x_n$, real numbers; later, real vectors, elements of $\mathbb{R}^d$
Vector notation: $n$ samples $x_1, \dots, x_n$ (or $y_1, \dots, y_n$): $x = (x_1, \dots, x_n)^\top \in \mathbb{R}^n$ (or $y = (y_1, \dots, y_n)^\top \in \mathbb{R}^n$)
Inner product: $\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i$
19 Sample mean (empirical mean)
Definition. Sample mean: $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i = \arg\min_{\mu \in \mathbb{R}} \sum_{i=1}^{n} (\mu - x_i)^2$
Rem: $\bar{x}_n = \langle x, \tfrac{1_n}{n} \rangle$, where $1_n = (1, \dots, 1)^\top \in \mathbb{R}^n$, and $\bar{x}_n 1_n$ is the (Euclidean) projection of $x$ onto $\mathrm{Span}(1_n)$
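A quick numerical check of the inner-product and projection remarks (a minimal NumPy sketch; the data vector is made up):

```python
import numpy as np

x = np.array([1.0, 3.0, 4.5, 2.5, 9.0])  # made-up data
n = x.size
ones = np.ones(n)

xbar = np.dot(x, ones / n)           # <x, 1_n / n>
print(np.isclose(xbar, x.mean()))    # True

# xbar * 1_n is the Euclidean projection of x onto Span(1_n)
proj = (np.dot(x, ones) / np.dot(ones, ones)) * ones
print(np.allclose(proj, xbar * ones))  # True
```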
20 Mean: optimization problem
[Figure: individual objectives $(\mu - x_i)^2$ and their sum]
22 Median
Definition. Median: $\mathrm{Med}_n(x) \in \arg\min_{\mu \in \mathbb{R}} \sum_{i=1}^{n} |\mu - x_i|$
Rem: often, with $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ (order statistics):
$\mathrm{Med}_n(x) = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$ if $n$ is even, and $\mathrm{Med}_n(x) = x_{((n+1)/2)}$ if $n$ is odd
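The arg min characterization can be checked numerically by scanning the L1 objective on a grid (a sketch on made-up data with one gross outlier):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one gross outlier

# Evaluate the L1 objective sum_i |mu - x_i| on a fine grid of candidates mu
grid = np.linspace(-5.0, 105.0, 10001)
l1_loss = np.abs(grid[:, None] - x[None, :]).sum(axis=1)

print(grid[np.argmin(l1_loss)])  # ~3.0: the grid minimizer
print(np.median(x))              # 3.0: the outlier barely moves the median
```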
23 Median: optimization problem
[Figure: individual objectives $|\mu - x_i|$ and their sum]
27 Trimmed mean
Definition. Trimmed mean (at level $\alpha$): $\bar{x}_{n,\alpha} = \frac{1}{n - 2m} \sum_{i=m+1}^{n-m} x_{(i)}$, where $m = \lfloor (n-1)\alpha \rfloor$ and $x_{(i)}$ denotes the order statistics
Rem: $\lfloor u \rfloor$ is the integer part of $u$
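For computation, scipy.stats.trim_mean implements this estimator; a sketch on made-up data (note that scipy's cut-off convention may differ slightly from the slide's $m = \lfloor (n-1)\alpha \rfloor$):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Trim a fraction alpha of the smallest and largest order statistics
alpha = 0.25
print(stats.trim_mean(x, alpha))  # 3.0: the value 100.0 is trimmed away
print(x.mean())                   # 22.0: the plain mean is dragged toward 100.0
```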
28 Mean vs median vs trimmed mean
[Figure: empirical mean $\bar{x}_n$, empirical median $\mathrm{Med}_n(x)$, and trimmed mean $\bar{x}_{n,0.25}$ on data with outliers]
The trimmed mean and the median are robust to outliers; the (empirical) mean is not
31 Variance / standard deviation
Definition. Variance: $\mathrm{var}_n(x) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x}_n)^2 = \frac{1}{n-1}\,\|x - \bar{x}_n 1_n\|^2$
Std: $s_n(x) = \sqrt{\mathrm{var}_n(x)}$ (where $\|z\|^2 = \sum_{i=1}^{n} z_i^2$)
Rem: the normalization can change, $1/n$ or $1/(n-1)$ (unbiased)
32 Median Absolute Deviation
Definition. Median Absolute Deviation (MAD): $\mathrm{MAD}_n(x) = \mathrm{Med}_n(|\mathrm{Med}_n(x) - x|)$
Normalized Median Absolute Deviation (MADN): $\mathrm{MADN}_n(x) = \mathrm{MAD}_n(x) / 0.6745$
Rem: $\Phi^{-1}(3/4) \approx 0.6745$ ($\Phi$: standard Gaussian CDF)
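A minimal NumPy implementation of MAD and MADN on made-up data, contrasting the robust scale with the classical standard deviation:

```python
import numpy as np

def mad(x):
    """MAD_n(x) = Med_n(|Med_n(x) - x|)."""
    med = np.median(x)
    return np.median(np.abs(x - med))

def madn(x):
    """MAD normalized by Phi^{-1}(3/4) ~ 0.6745, consistent for Gaussian scale."""
    return mad(x) / 0.6745

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(np.std(x, ddof=1))  # ~43.6: the classical scale explodes
print(madn(x))            # ~1.48: the robust scale stays moderate
```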
33 Newcomb's experiments (speed of light)
[Figure: raw observations]
34 Newcomb's experiments (speed of light)
[Figure: t-statistics vs. sample index] Standard statistical rule of thumb (3σ rule): flag a sample $x_i$ as an outlier when $|t_i| > 3$, where $t_i = \frac{x_i - \bar{x}_n}{s_n}$
35 Newcomb's experiments (speed of light)
[Figure: robust t-statistics vs. sample index] Robust counterpart of the 3σ rule of thumb: flag a sample $x_i$ as an outlier when $|t_i| > 3$, where $t_i = \frac{x_i - \mathrm{Med}_n(x)}{\mathrm{MADN}_n(x)}$. Rem: helps limit the masking effect
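A sketch of the robust flagging rule (the readings below are made up to illustrate the idea, not Newcomb's actual measurements):

```python
import numpy as np

def robust_outlier_flags(x, threshold=3.0):
    """Flag x_i when |x_i - Med_n(x)| / MADN_n(x) > threshold."""
    med = np.median(x)
    madn = np.median(np.abs(x - med)) / 0.6745
    return np.abs(x - med) / madn > threshold

# Made-up readings with one gross error
x = np.array([24.8, 25.0, 25.1, 24.9, 25.2, -44.0])
print(robust_outlier_flags(x))  # only the last entry is flagged
```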
36 Table of Contents: Presentation / course organization; Prerequisites / references; Common estimators; Linear Model (Least squares and variants, Leverage points, Multidimensional regression)
37 Ordinary Least Squares: toy example (y-corruption)
[Figures: raw data, then fits from OLS (scikit-learn) and HuberRegressor (scikit-learn)]
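The y-corruption experiment is easy to reproduce with scikit-learn's LinearRegression and HuberRegressor; a sketch on synthetic data (not the slide's exact dataset):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

idx = np.argsort(x)[-5:]   # corrupt the responses at the largest x-values
y[idx] += 30.0
X = x[:, None]

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)
print("OLS slope:  ", ols.coef_[0])    # inflated by the vertical outliers
print("Huber slope:", huber.coef_[0])  # stays close to the true slope 0.5
```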
41 Ordinary Least Squares: toy example (x-corruption)
[Figures: raw data, then fits from OLS (scikit-learn), HuberRegressor (scikit-learn), and LTS]
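scikit-learn does not ship a least trimmed squares estimator; as a stand-in illustration of resistance to x-corruption, RANSACRegressor can be used. A sketch on made-up data with leverage points:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)
x[:5], y[:5] = 25.0, 0.0           # leverage points: corrupted x-values
X = x[:, None]

ols = LinearRegression().fit(X, y)
ransac = RANSACRegressor(random_state=0).fit(X, y)  # default base: LinearRegression
print("OLS slope:   ", ols.coef_[0])                # tilted by the leverage points
print("RANSAC slope:", ransac.estimator_.coef_[0])  # fitted on consensus inliers only
```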
46 A real 2D example
Example: braking distance for cars as a function of speed (n = 50 measurements)
[Figures: raw data, then the OLS (scikit-learn) fit; dataset: cars]
48 Modeling: single feature
Observations: $(y_i, x_i)$, for $i = 1, \dots, n$
Linear model (linear regression) hypothesis: assume $y_i \approx \beta_0 + \beta_1 x_i$, with $\beta_0$ the intercept (unknown) and $\beta_1$ the slope (unknown)
Rem: both parameters are unknown to the statistician
Definitions: $y$ is an observation or a variable to explain; $x$ is a feature or a covariate
49 Modeling (II)
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Definitions: intercept: the scalar $\beta_0$; slope: the scalar $\beta_1$; noise: the vector $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^\top$
Goal: estimate the unknown $\beta_0$ and $\beta_1$ by $\hat{\beta}_0$ and $\hat{\beta}_1$, relying on the observations $(y_i, x_i)$ for $i = 1, \dots, n$
50 OLS and center of gravity
$y \approx \hat{\beta}_0 + \hat{\beta}_1 x$, with $\hat{\beta}_0$ the intercept (negative here!) and $\hat{\beta}_1$ the slope
[Figure: raw data fitted with intercept, and the center of mass (speed = 15.4, dist = )]
Physical interpretation: the center of gravity of the cloud of points belongs to the (estimated) regression line
51 Centering
Centered model: write, for any $i = 1, \dots, n$: $x_i' = x_i - \bar{x}_n$ and $y_i' = y_i - \bar{y}_n$, i.e., $x' = x - \bar{x}_n 1_n$ and $y' = y - \bar{y}_n 1_n$, with $1_n = (1, \dots, 1)^\top \in \mathbb{R}^n$; then solving the OLS problem with $(x', y')$ leads to
$\hat{\beta}_0' = 0$ and $\hat{\beta}_1' = \frac{\sum_{i=1}^{n} x_i' y_i'}{\sum_{i=1}^{n} x_i'^2}$
Rem: equivalent to choosing the center of mass of the cloud of points as the origin, i.e., $(\bar{x}_n', \bar{y}_n') = (0, 0)$
52 Centering (II)
[Figure: raw data recentered to the center of mass, with the OLS fit]
53 Centering and interpretation
Consider the coefficient $\beta_1'$ (with $\beta_0' = 0$) for the centered $y', x'$; then:
$\hat{\beta}_1' \in \arg\min_{\beta_1 \in \mathbb{R}} \sum_{i=1}^{n} (y_i' - \beta_1 x_i')^2 = \arg\min_{\beta_1 \in \mathbb{R}} \sum_{i=1}^{n} x_i'^2 \left( \frac{y_i'}{x_i'} - \beta_1 \right)^2$
Interpretation: $\hat{\beta}_1'$ is a weighted average of the slopes $y_i'/x_i'$: $\hat{\beta}_1' = \sum_{i=1}^{n} \frac{x_i'^2}{\sum_{j=1}^{n} x_j'^2} \cdot \frac{y_i'}{x_i'}$
Influence of extreme points: weights proportional to $x_i'^2$; leverage effect for far-away points
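The weighted-average identity can be verified numerically (a sketch on synthetic data):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, 20)
y = 2.0 * x + rng.normal(size=20)

xc, yc = x - x.mean(), y - y.mean()
beta1 = (xc * yc).sum() / (xc ** 2).sum()  # centered OLS slope

# Same value as a weighted average of the individual slopes y'_i / x'_i,
# with weights proportional to x'_i^2
w = xc ** 2 / (xc ** 2).sum()
print(np.isclose(beta1, (w * (yc / xc)).sum()))  # True
```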
54 Extreme points leverage effect
[Animated figures: the recentered cars data with slopes weighted by importance ($x_i'^2$), followed by a sequence of least-squares fits as the sample size n varies, illustrating how a single far-away point can tilt the whole fitted line]
106 Multidimensional regression: model / vocabulary
$y = X\beta + \varepsilon$
$y \in \mathbb{R}^n$: observation vector; $X \in \mathbb{R}^{n \times p}$: design matrix (with features as columns); $\beta \in \mathbb{R}^p$: (unknown) true parameter to be estimated; $\varepsilon \in \mathbb{R}^n$: noise vector
Observation point of view: $y_i = \langle x_i, \beta \rangle + \varepsilon_i$ for $i = 1, \dots, n$, where $\langle \cdot, \cdot \rangle$ stands for the standard inner product
Feature point of view: $y = \sum_{j=1}^{p} \beta_j x^j + \varepsilon$, where $x^j$ denotes the $j$-th column of $X$
107 (Ordinary) Least Squares, (O)LS
A least squares estimator is any solution of the following problem:
$\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2}\|y - X\beta\|_2^2 =: f(\beta)$, i.e., $\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2}\sum_{i=1}^{n} \left[ y_i - \langle x_i, \beta \rangle \right]^2$
Rem: uniqueness does not hold when features are collinear; there are then infinitely many solutions
Rem: an intercept is often added
Rem: the Gaussian negative log-likelihood leads to the squared formulation
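A minimal NumPy sketch of solving this problem on synthetic data (np.linalg.lstsq returns a minimum-norm solution even in the collinear, non-unique case):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Minimize ||y - X beta||_2 via a factorization-based least-squares routine
beta_hat, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to beta_true
```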
108 Least squares: normal equation
Theorem. Fermat's rule ensures that any LS solution $\hat{\beta}$ satisfies $\nabla f(\hat{\beta}) = X^\top X \hat{\beta} - X^\top y = X^\top (X\hat{\beta} - y) = 0$, i.e., the normal equation $X^\top X \hat{\beta} = X^\top y$
$\hat{\beta}$ is thus a solution of the linear system $A\beta = b$, with matrix $A = X^\top X$ and right-hand side $b = X^\top y$
109 Proof: gradient computation
The gradient $\nabla f$ of $f$ is defined, for any $\beta$, as the vector satisfying $f(\beta + h) = f(\beta) + \langle h, \nabla f(\beta) \rangle + o(\|h\|)$ for any $h$
For the $f$ of interest here, this reads:
$f(\beta + h) = \frac{1}{2}\|y\|^2 - \langle \beta + h, X^\top y \rangle + \frac{1}{2}(\beta + h)^\top X^\top X (\beta + h)$
$= \frac{1}{2}\|y\|^2 - \langle \beta, X^\top y \rangle - \langle h, X^\top y \rangle + \frac{1}{2}\beta^\top X^\top X \beta + \frac{1}{2} h^\top X^\top X h + \beta^\top X^\top X h$
$= f(\beta) - \langle h, X^\top y \rangle + \beta^\top X^\top X h + \frac{1}{2} h^\top X^\top X h$
$= f(\beta) + \langle h, \underbrace{X^\top X \beta - X^\top y}_{\nabla f(\beta)} \rangle + \underbrace{\tfrac{1}{2} h^\top X^\top X h}_{o(\|h\|)}$
Hence, $\nabla f(\beta) = X^\top X \beta - X^\top y = X^\top (X\beta - y)$
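The gradient formula can be sanity-checked against centered finite differences (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

f = lambda b: 0.5 * np.linalg.norm(y - X @ b) ** 2
grad = lambda b: X.T @ (X @ b - y)

b = rng.normal(size=p)
eps = 1e-6
# Centered finite-difference approximation of each partial derivative
fd = np.array([(f(b + eps * e) - f(b - eps * e)) / (2 * eps) for e in np.eye(p)])
print(np.max(np.abs(fd - grad(b))))  # tiny: the closed-form gradient checks out
```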
115 Vocabulary (and abuse of terms)
Definition. The Gram matrix is $X^\top X \in \mathbb{R}^{p \times p}$, whose general term is $[X^\top X]_{j,k} = \langle x^j, x^k \rangle$
Rem: $X^\top X$ is often referred to as the feature correlation matrix (true for standardized columns)
Rem: when the columns are scaled such that $\|x^j\|^2 = n$ for all $j \in \{1, \dots, p\}$, the diagonal of the Gram matrix is $(n, \dots, n)$
$X^\top y = (\langle x^1, y \rangle, \dots, \langle x^p, y \rangle)^\top$: observation/feature correlations
116 OLS closed-form solution (full rank case)
Theorem. If $X$ has full (column) rank (i.e., if $X^\top X$ is non-singular), then $\hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y$
Rem: if $X = 1_n$: $\hat{\beta}_{\mathrm{OLS}} = \frac{\langle 1_n, y \rangle}{\langle 1_n, 1_n \rangle} = \bar{y}_n$ (empirical mean)
Rem: single feature $X = x = (x_1, \dots, x_n)^\top$: $\hat{\beta}_{\mathrm{OLS}} = \frac{\langle x, y \rangle}{\|x\|_2^2}$
Beware: in practice, avoid inverting the matrix $X^\top X$: it is numerically time consuming, and $X^\top X$ is not even invertible if $p > n$ (e.g., in biology: $n$ patients, $p$ genes, with $p \gg n$)
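A sketch contrasting the recommended approaches (solving the normal equations, or a least-squares factorization) with the discouraged explicit inversion:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Preferred: solve the normal equations, or use a factorization via lstsq
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Discouraged: forming the explicit inverse is slower and less stable
beta_inv = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(np.allclose(beta_solve, beta_lstsq), np.allclose(beta_solve, beta_inv))
```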
117 Example
Stackloss dataset (Brownlee, 1965): 21 days of measurements from a plant's oxidation of ammonia to nitric acid; the nitric oxide pollutants are captured in an absorption tower
Number of samples: n = 21; number of features: p = 3
y (to predict): STACKLOSS, 10 times the percentage of ammonia going into the plant that escapes from the absorption tower
Features: AIRFLOW, rate of operation of the plant; WATERTEMP, cooling water temperature in the tower; ACIDCONC, concentration of circulating acid (minus 50, times 10)
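The dataset ships with statsmodels; a hedged sketch (the dataset module and column names below are assumptions to check against your statsmodels version):

```python
import statsmodels.api as sm

# Load the Brownlee stackloss data bundled with statsmodels
data = sm.datasets.stackloss.load_pandas().data
X = sm.add_constant(data[["AIRFLOW", "WATERTEMP", "ACIDCONC"]])
y = data["STACKLOSS"]

ols = sm.OLS(y, X).fit()                               # classical fit
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # robust (Huber) fit
print(ols.params)
print(rlm.params)
```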
118 3σ rule to spot outliers in a linear model
[Figure: OLS t-statistics vs. sample index] $t_i = \frac{y_i - \langle x_i, \hat{\beta}_{\mathrm{OLS}} \rangle}{\hat{\sigma}}$, with $\hat{\sigma} = \frac{\|y - X\hat{\beta}_{\mathrm{OLS}}\|}{\sqrt{n - p}}$
[Figure: robust t-statistics (Huber) vs. sample index] $t_i = \frac{y_i - \langle x_i, \hat{\beta}_{\mathrm{Huber}} \rangle}{\hat{\sigma}}$, with $\hat{\sigma} = \frac{\|y - X\hat{\beta}_{\mathrm{Huber}}\|}{\sqrt{n - p}}$
[Figure: robust t-statistics (RANSAC) vs. sample index] $t_i = \frac{y_i - \langle x_i, \hat{\beta}_{\mathrm{RANSAC}} \rangle}{\hat{\sigma}}$, with $\hat{\sigma} = \mathrm{MADN}_n(y - X\hat{\beta}_{\mathrm{RANSAC}})$
[Figure: robust t-statistics (Least Trimmed Squares) vs. sample index] $t_i = \frac{y_i - \langle x_i, \hat{\beta}_{\mathrm{LTS}} \rangle}{\hat{\sigma}}$, with $\hat{\sigma} = \mathrm{MADN}_n(y - X\hat{\beta}_{\mathrm{LTS}})$
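A sketch of the robust 3σ flagging pipeline, here using a Huber fit and a MADN residual scale on synthetic corrupted data:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(0)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.5 * rng.normal(size=n)
y[:4] += 15.0                          # a few corrupted responses

huber = HuberRegressor().fit(X, y)
resid = y - huber.predict(X)

# Robust scale from the residuals, then the 3-sigma flagging rule
madn = np.median(np.abs(resid - np.median(resid))) / 0.6745
print(np.where(np.abs(resid) / madn > 3)[0])  # flags the corrupted indices 0..3
```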
122 References I
Alfons, A., C. Croux, and S. Gelper. "Sparse least trimmed squares regression for analyzing high-dimensional large data sets". In: Ann. Appl. Stat. 7.1 (2013).
Avella-Medina, M. and E. M. Ronchetti. "Robust and consistent variable selection in high-dimensional generalized linear models". In: Biometrika 105.1 (2018).
Bauschke, H. H. and P. L. Combettes. Convex analysis and monotone operator theory in Hilbert spaces. New York: Springer, 2011, pp. xvi+468.
Beck, A. and M. Teboulle. "Smoothing and first order methods: A unified framework". In: SIAM J. Optim. (2012).
Bertsekas, D. P. Nonlinear programming. Athena Scientific, 1999.
Bertsimas, D., D. B. Brown, and C. Caramanis. "Theory and applications of robust optimization". In: SIAM Rev. (2011).
123 References II
Boyd, S. and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004, pp. xiv+716.
Chen, M., C. Gao, and Z. Ren. "A General Decision Theory for Huber's ε-contamination Model". In: Electron. J. Stat. 10.2 (2016).
Chen, Y., C. Caramanis, and S. Mannor. "Robust sparse regression under adversarial corruption". In: ICML. 2013.
Donoho, D. L. and M. Gasko. "Breakdown properties of location estimates based on halfspace depth and projected outlyingness". In: Ann. Statist. 20.4 (1992).
Golub, G. H. and C. F. Van Loan. Matrix computations. Fourth edition. Johns Hopkins University Press, Baltimore, MD, 2013, pp. xiv+756.
Hampel, F. R. et al. Robust statistics: The Approach Based on Influence Functions. Wiley Series in Probability and Statistics. Wiley, 1986.
124 References III
Hiriart-Urruty, J.-B. and C. Lemaréchal. Convex analysis and minimization algorithms. I. Vol. 305. Berlin: Springer-Verlag, 1993.
Hiriart-Urruty, J.-B. and C. Lemaréchal. Convex analysis and minimization algorithms. II. Vol. 306. Berlin: Springer-Verlag, 1993.
Horn, R. A. and C. R. Johnson. Topics in matrix analysis. Corrected reprint of the 1991 original. Cambridge: Cambridge University Press, 1994.
Huber, P. J. and E. M. Ronchetti. Robust statistics. Second edition. Wiley Series in Probability and Statistics. Wiley, 2009.
Maronna, R. A., R. D. Martin, and V. J. Yohai. Robust statistics: Theory and methods. Chichester: John Wiley & Sons, 2006.
Minsker, S. "Geometric median and robust estimation in Banach spaces". In: Bernoulli 21.4 (2015).
Mosler, K. "Depth statistics". In: Robustness and complex data structures. Springer, 2013.
125 References IV
Murphy, K. P. Machine learning: a probabilistic perspective. MIT Press, 2012.
Nesterov, Y. "Smooth minimization of non-smooth functions". In: Math. Program. (2005).
Parikh, N. et al. "Proximal algorithms". In: Foundations and Trends in Machine Learning 1.3 (2013).
Rousseeuw, P. J. and A. M. Leroy. Robust regression and outlier detection. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. New York: John Wiley & Sons, 1987, pp. xvi+329.
Seber, G. A. F. and A. J. Lee. Linear Regression Analysis. 2nd edition. Wiley Series in Probability and Statistics. Wiley, 2003.
Wei, X. and S. Minsker. "Estimation of the covariance structure of heavy-tailed distributions". In: NIPS. 2017.
126 References V
Xu, H., C. Caramanis, and S. Mannor. "Robust regression and Lasso". In: IEEE Trans. Inf. Theory 56.7 (2010).