STAT 593 Robust statistics: Modeling and Computing


1 STAT 593 Robust statistics: Modeling and Computing Joseph Salmon Télécom ParisTech, Institut Mines-Télécom & University of Washington, Department of Statistics (Visiting Assistant Professor) 1 / 49

2 Outline Presentation / course organization Prerequisite / references Common estimators Linear Model 2 / 49

3 Table of Contents Presentation / course organization Teaching staff Practical aspects Prerequisite / references Common estimators Linear Model 3 / 49

4 Presentation Joseph Salmon (Assistant Professor): Positions: PhD student at Paris Diderot-Paris 7 (2007-2010); Post-Doc at Duke University ( ); Assistant Professor at Télécom ParisTech (2012-); Visiting Assistant Professor at UW (2018). Research themes: high dimensional statistics, optimization for machine learning, aggregation, image processing. joseph.salmon@telecom-paristech.fr Website: josephsalmon.eu 4 / 49

5 (No) Grades / office hours Beware: this is a Credit/No-Credit grading course. Office hours: Friday 10:30-11:30 AM, by appointment only. Office: B314 Padelford. Number of credits: 3. 5 / 49

6 Outline of the course Week 1 Introduction, examples, basic concepts, location, scale, equivariance. Week 2 Breakdown point, M-estimates, pseudo-observations. Week 3 L-statistics: linear combinations of order statistics. Week 4 Numerical computation of M-estimates, non-smooth convex optimization, Iteratively Reweighted Least Squares (IRLS). Week 5 Smoothing non-smooth problems. Week 6 Gâteaux differentiability, sensitivity curve, influence function. Week 7 Robust regression and multivariate statistics. Week 8 Quantile regression, crossing. Week 9 Guest lectures. Week 10 Project presentations. 6 / 49

7 Table of Contents Presentation / course organization Prerequisite / references General advice Reading Common estimators Linear Model 7 / 49

8 Prerequisites Probability basics: probability, expectation, law of large numbers, Gaussian distribution, central limit theorem. Books: Murphy (2012, ch. 1 and 2). Optimization basics: (differential) calculus, convexity, first order conditions, gradient descent, Newton's method. Books: Boyd and Vandenberghe (2004), Bertsekas (1999). (Bi-)linear algebra basics: vector spaces, norms, inner products, matrices, determinants, diagonalization. Reading: Horn and Johnson (1994). Numerical linear algebra: solving linear systems, Gaussian elimination, matrix factorizations, conditioning, etc. Reading: Golub and Van Loan (2013), Applied Numerical Computing by L. Vandenberghe. 8 / 49

12 Books, recommended reading Books on robust statistics: Maronna et al. (2006), Huber and Ronchetti (2009), Hampel et al. (1986), Rousseeuw and Leroy (1987). Book for linear models: Seber and Lee (2003). Books for optimization, Legendre/Fenchel conjugacy: Hiriart-Urruty and Lemaréchal (1993a, 1993b), Bauschke and Combettes (2011). Surveys on optimization: Parikh et al. (2013). 9 / 49

13 Algorithmic aspects: some advice Python installation: use Conda / Anaconda. Recommended tools: Jupyter / IPython Notebook, or IPython with a text editor (e.g., Atom, Sublime Text, Visual Studio Code, etc.). Key libraries (see their websites): Python, Scipy, Numpy, Pandas, scikit-learn, Statsmodels. 10 / 49

14 General advice Use a version control system for your work: Git (e.g., Bitbucket, GitHub, etc.) or Mercurial. Use a clean way of writing / presenting your code, for example PEP8 for Python (use for instance AutoPEP8), learn from good examples, etc. 11 / 49

15 List of interesting papers (I) Depth: D. L. Donoho and M. Gasko. "Breakdown properties of location estimates based on halfspace depth and projected outlyingness". In: Ann. Statist. 20.4 (1992). K. Mosler. "Depth statistics". In: Robustness and complex data structures. Springer, 2013. Linear models / Lasso methods: M. Avella-Medina and E. M. Ronchetti. "Robust and consistent variable selection in high-dimensional generalized linear models". In: Biometrika 105.1 (2018). H. Xu, C. Caramanis, and S. Mannor. "Robust regression and Lasso". In: IEEE Trans. Inf. Theory 56.7 (2010). A. Alfons, C. Croux, and S. Gelper. "Sparse least trimmed squares regression for analyzing high-dimensional large data sets". In: Ann. Appl. Stat. 7.1 (2013). 12 / 49

16 List of interesting papers (II) Robust optimization point of view: Y. Chen, C. Caramanis, and S. Mannor. "Robust sparse regression under adversarial corruption". In: ICML. 2013. D. Bertsimas, D. B. Brown, and C. Caramanis. "Theory and applications of robust optimization". In: SIAM Rev. 53.3 (2011). M. Chen, C. Gao, and Z. Ren. "A General Decision Theory for Huber's ε-contamination Model". In: Electron. J. Stat. 10.2 (2016). Geometric median: S. Minsker. "Geometric median and robust estimation in Banach spaces". In: Bernoulli 21.4 (2015). Robust covariance estimation: X. Wei and S. Minsker. "Estimation of the covariance structure of heavy-tailed distributions". In: NIPS. 2017. Smoothing non-smooth functions: Y. Nesterov. "Smooth minimization of non-smooth functions". In: Math. Program. 103.1 (2005). A. Beck and M. Teboulle. "Smoothing and first order methods: A unified framework". In: SIAM J. Optim. 22.2 (2012). 13 / 49

17 Table of Contents Presentation / course organization Prerequisite / references Common estimators Location estimation Scale estimation Masking effect Linear Model 14 / 49

18 Notation / Settings Observations: $n$ samples $x_1, \dots, x_n$, real numbers; later, samples will be elements of $\mathbb{R}^d$. Vector notation: $n$ samples $x_1, \dots, x_n$ (or $y_1, \dots, y_n$): $x = (x_1, \dots, x_n)^\top \in \mathbb{R}^n$ (or $y = (y_1, \dots, y_n)^\top \in \mathbb{R}^n$). Inner product: $\langle x, y \rangle = \sum_{i=1}^n x_i y_i$. 15 / 49

19 Sample mean (empirical mean) $\bar{x}_n$: empirical mean. Definition. Sample mean: $\bar{x}_n = \frac{1}{n} \sum_{i=1}^n x_i = \arg\min_{\mu \in \mathbb{R}} \sum_{i=1}^n (\mu - x_i)^2$. Rem: $\bar{x}_n = \langle x, \mathbf{1}_n \rangle / n$, where $\mathbf{1}_n = (1, \dots, 1)^\top \in \mathbb{R}^n$, and $\bar{x}_n \mathbf{1}_n$ is the (Euclidean) projection of $x$ onto $\mathrm{Span}(\mathbf{1}_n)$. 16 / 49

20 Mean: optimization problem [Figure: individual objectives $(\mu - x_i)^2$ and their sum] 17 / 49

22 Median $\mathrm{Med}_n(x)$: empirical median. Definition. Median: $\mathrm{Med}_n(x) \in \arg\min_{\mu \in \mathbb{R}} \sum_{i=1}^n |\mu - x_i|$. Rem: often, with $x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)}$ (order statistics), $\mathrm{Med}_n(x) = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$ if $n$ is even, and $\mathrm{Med}_n(x) = x_{((n+1)/2)}$ if $n$ is odd. 18 / 49

23 Median: optimization problem [Figure: individual objectives $|\mu - x_i|$ and their sum] 19 / 49

27 Trimmed mean $\bar{x}_{n,0.25}$: trimmed mean. Definition. Trimmed mean (at level $\alpha$): $\bar{x}_{n,\alpha} = \frac{1}{n-2m} \sum_{i=m+1}^{n-m} x_{(i)}$, where $m = \lfloor (n-1)\alpha \rfloor$ and $x_{(i)}$ denotes the order statistics. Rem: $\lfloor u \rfloor$ is the integer part of $u$. 20 / 49
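
To make these three location estimators concrete, here is a minimal sketch (not from the slides; the sample and the outlier value are made up) comparing them on data containing one gross outlier. `scipy.stats.trim_mean` implements the trimmed mean above, cutting a proportion of each tail:

```python
# Sketch: empirical mean vs. median vs. trimmed mean on contaminated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=50)
x[0] = 100.0  # inject one gross outlier

print(np.mean(x))                # dragged toward the outlier
print(np.median(x))              # essentially unaffected
print(stats.trim_mean(x, 0.25))  # discards 25% of each tail, then averages
```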

28 Mean vs median vs trimmed mean $\mathrm{Med}_n(x)$: empirical median; $\bar{x}_{n,0.25}$: trimmed mean; $\bar{x}_n$: empirical mean. The trimmed mean and the median are robust to outliers; the (empirical) mean is not. 21 / 49

31 Variance / standard deviation $s_n$, $\bar{x}_n$: empirical mean. Definition. Variance: $\mathrm{var}_n(x) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x}_n)^2 = \frac{1}{n-1} \|x - \bar{x}_n \mathbf{1}_n\|^2$. Std: $s_n(x) = \sqrt{\mathrm{var}_n(x)}$ (where $\|z\|^2 = \sum_{i=1}^n z_i^2$). Rem: the normalization can change, $1/n$ or $1/(n-1)$ (unbiased). 22 / 49

32 Median Absolute Deviation (MAD) $\mathrm{Med}_n(x)$: empirical median. Definition. Median Absolute Deviation (MAD): $\mathrm{MAD}_n(x) = \mathrm{Med}_n(|\mathrm{Med}_n(x) - x|)$. Normalized Median Absolute Deviation (MADN): $\mathrm{MADN}_n(x) = \mathrm{MAD}_n(x) / 0.6745$. Rem: $\Phi^{-1}(3/4) \approx 0.6745$ ($\Phi$: standard Gaussian CDF). 23 / 49
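
These definitions translate almost verbatim into numpy; the sketch below is an illustration, not course code (`scipy.stats.median_abs_deviation` with `scale='normal'` computes the same MADN):

```python
# Sketch: MAD and MADN from the definitions above.
import numpy as np

def mad(x):
    """Median absolute deviation: Med_n(|Med_n(x) - x|)."""
    x = np.asarray(x)
    return np.median(np.abs(x - np.median(x)))

def madn(x):
    """Normalized MAD, consistent for the std under Gaussian data."""
    return mad(x) / 0.6745  # 0.6745 is approximately Phi^{-1}(3/4)

rng = np.random.default_rng(1)
x = rng.normal(scale=2.0, size=10_000)
print(madn(x))  # close to 2.0, the true standard deviation
```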

33 Newcomb's experiments (speed of light) [Figure: raw observations] 24 / 49

34 Newcomb's experiments (speed of light) [Figure: t-statistics vs. index] Standard statistical "3σ" rule of thumb: flag a sample $x_i$ as an outlier when $|t_i| > 3$, where $t_i = \frac{x_i - \bar{x}_n}{s_n}$. 24 / 49

35 Newcomb's experiments (speed of light) [Figure: robust t-statistics vs. index] Robust counterpart of the 3σ rule of thumb: flag a sample $x_i$ as an outlier when $|t_i| > 3$, where $t_i = \frac{x_i - \mathrm{Med}_n(x)}{\mathrm{MADN}_n(x)}$. Rem: helps limit the masking effect. 24 / 49
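
A quick sketch (simulated data, not Newcomb's) contrasting the two rules: a cluster of outliers inflates $\bar{x}_n$ and $s_n$, so the classical rule may miss some of them, while the robust rule keeps flagging them:

```python
# Sketch: classical vs. robust 3-sigma outlier flags on contaminated data.
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(size=95), np.full(5, 8.0)])  # 5 outliers at 8

t_classic = (x - x.mean()) / x.std(ddof=1)
madn = np.median(np.abs(x - np.median(x))) / 0.6745
t_robust = (x - np.median(x)) / madn

print(np.sum(np.abs(t_classic) > 3))  # classical rule: possibly masked
print(np.sum(np.abs(t_robust) > 3))   # robust rule: flags the cluster
```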

36 Table of Contents Presentation / course organization Prerequisite / references Common estimators Linear Model Least squares and variants Leverage points Multidimensional regression 25 / 49

37 Ordinary Least Squares: toy example (y-corruption) [Figures over several slides: raw data, then the OLS-sklearn fit, then the HuberRegressor-sklearn fit] 26 / 49

41 Ordinary Least Squares: toy example (x-corruption) [Figures over several slides: raw data, then fits by OLS-sklearn, HuberRegressor-sklearn, and LTS] 27 / 49
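
The slides' figures can be reproduced in spirit with scikit-learn; the data-generating process below is an assumption (the slides do not specify theirs), and only the estimator names (`LinearRegression`, `HuberRegressor`) are taken from the figure legends:

```python
# Sketch: OLS vs. a Huber M-estimator under y-corruption.
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(3)
n = 100
X = rng.uniform(0, 10, size=(n, 1))
y = 1.0 + 0.5 * X.ravel() + rng.normal(scale=0.3, size=n)
y[:5] += 30.0  # y-corruption: five grossly shifted responses

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)

print(ols.coef_, ols.intercept_)      # pulled toward the corrupted points
print(huber.coef_, huber.intercept_)  # close to the true (0.5, 1.0)
```

Huber regression resists y-corruption, but as the x-corruption slides illustrate, high-leverage points call for estimators such as LTS or RANSAC (scikit-learn ships `RANSACRegressor`; LTS is not part of scikit-learn).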

46 A real 2D example Example: braking distance for cars as a function of speed ($n = 50$ measurements) [Figure: raw data, Distance vs. Speed] Dataset cars: 28 / 49

47 A real 2D example Example: braking distance for cars as a function of speed ($n = 50$ measurements) [Figure: raw data with OLS-sklearn fit, Distance vs. Speed] Dataset cars: 28 / 49

48 Modeling: single feature Observations: $(y_i, x_i)$, for $i = 1, \dots, n$. Linear model or linear regression hypothesis, assume: $y_i \approx \beta_0 + \beta_1 x_i$, with $\beta_0$: intercept (unknown) and $\beta_1$: slope (unknown). Rem: both parameters are unknown to the statistician. Definitions: $y$ is an observation or a variable to explain; $x$ is a feature or a covariate. 29 / 49

49 Modeling (II) $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. Definitions: intercept: the scalar $\beta_0$; slope: the scalar $\beta_1$; noise: the vector $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^\top$. Goal: estimate $\beta_0$ and $\beta_1$ (unknown) by $\hat{\beta}_0$ and $\hat{\beta}_1$, relying on the observations $(y_i, x_i)$ for $i = 1, \dots, n$. 30 / 49

50 OLS and center of gravity $y \approx \hat{\beta}_0 + \hat{\beta}_1 x$. [Figure: raw data fitted with intercept, and center of mass (speed = 15.4, dist = …); Data, OLS-sklearn] $\hat{\beta}_0$ = intercept (negative!), $\hat{\beta}_1$ = slope. Physical interpretation: the cloud of points' center of gravity belongs to the (estimated) regression line. 31 / 49

51 Centering Centered model: write, for any $i = 1, \dots, n$: $x'_i = x_i - \bar{x}_n$ and $y'_i = y_i - \bar{y}_n$, i.e., $x' = x - \bar{x}_n \mathbf{1}_n$ and $y' = y - \bar{y}_n \mathbf{1}_n$, with $\mathbf{1}_n = (1, \dots, 1)^\top \in \mathbb{R}^n$; then solving the OLS problem with $(x', y')$ leads to $\hat{\beta}'_0 = 0$ and $\hat{\beta}'_1 = \frac{\sum_{i=1}^n x'_i y'_i}{\sum_{i=1}^n x'^2_i}$. Rem: equivalent to choosing the cloud of points' center of mass as the origin, i.e., $(\bar{x}'_n, \bar{y}'_n) = (0, 0)$. 32 / 49

52 Centering (II) [Figure: raw data recentered to the center of mass, with OLS fit] 33 / 49

53 Centering and interpretation Consider the coefficient $\hat{\beta}_1$ ($\hat{\beta}_0 = 0$) for centered $y', x'$; then: $\hat{\beta}_1 \in \arg\min_{\beta_1 \in \mathbb{R}} \sum_{i=1}^n (y'_i - \beta_1 x'_i)^2 = \arg\min_{\beta_1 \in \mathbb{R}} \sum_{i=1}^n x'^2_i \left( \frac{y'_i}{x'_i} - \beta_1 \right)^2$. Interpretation: $\hat{\beta}_1$ is a weighted average of the slopes $y'_i / x'_i$: $\hat{\beta}_1 = \sum_{i=1}^n \frac{x'^2_i}{\sum_{j=1}^n x'^2_j} \cdot \frac{y'_i}{x'_i}$. Influence of extreme points: weights proportional to $x'^2_i$; leverage effect for far-away points. 34 / 49
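
The weighted-average identity is easy to check numerically; this is an illustrative sketch on synthetic data:

```python
# Sketch: after centering, the OLS slope equals the weighted average of the
# individual slopes y'_i / x'_i with weights proportional to x'^2_i.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(1, 20, size=50)
y = 3.0 * x + rng.normal(size=50)

xc, yc = x - x.mean(), y - y.mean()           # centered variables
slope_ols = xc @ yc / (xc @ xc)               # closed-form centered OLS slope
weights = xc**2 / np.sum(xc**2)
slope_weighted = np.sum(weights * (yc / xc))  # weighted average of slopes

print(np.allclose(slope_ols, slope_weighted))  # True
```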

54 Extreme points leverage effect [Figure: recentered data; average of slopes, weighted by importance] 35 / 49

56 Extreme points leverage effect [Animation over many slides: least-squares fit as the sample size n grows, illustrating how far-away points pull the fitted line] 36 / 49

106 Multidimensional regression: model / vocabulary $y = X\beta + \varepsilon$, where $y \in \mathbb{R}^n$: observations vector; $X \in \mathbb{R}^{n \times p}$: design matrix (with features as columns); $\beta \in \mathbb{R}^p$: (unknown) true parameter to be estimated; $\varepsilon \in \mathbb{R}^n$: noise vector. Observations point of view: $y_i = \langle x_i, \beta \rangle + \varepsilon_i$ for $i = 1, \dots, n$, where $\langle \cdot, \cdot \rangle$ stands for the standard inner product. Features point of view: $y = \sum_{j=1}^p \beta_j x^j + \varepsilon$. 37 / 49

107 (Ordinary) Least Squares, (O)LS A least squares estimator is any solution of the following problem: $\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2} \|y - X\beta\|_2^2 =: f(\beta)$, i.e., $\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2} \sum_{i=1}^n \left[ y_i - \langle x_i, \beta \rangle \right]^2$. Rem: uniqueness does not hold when features are collinear, and then there are infinitely many solutions. Rem: an intercept is often added. Rem: the Gaussian (negative log-)likelihood leads to the squared formulation. 38 / 49

108 Least squares: normal equation $\nabla f(\beta) = X^\top X \beta - X^\top y = X^\top (X\beta - y) = 0$. Theorem. Fermat's rule ensures that any LS solution $\hat{\beta}$ satisfies the normal equation: $X^\top X \hat{\beta} = X^\top y$, i.e., $\hat{\beta}$ is a solution of the linear system $A\beta = b$ for the matrix $A = X^\top X$ and right-hand side $b = X^\top y$. 39 / 49

109 Proof: gradient computation The gradient $\nabla f$ of $f$ is defined, for any $\beta$, as the vector satisfying $f(\beta + h) = f(\beta) + \langle h, \nabla f(\beta) \rangle + o(\|h\|)$ for any $h$. For the $f$ of interest here, this reads: $f(\beta + h) = \frac{1}{2} \|y\|^2 - \langle \beta + h, X^\top y \rangle + \frac{1}{2} (\beta + h)^\top X^\top X (\beta + h) = \frac{1}{2} \|y\|^2 - \langle \beta, X^\top y \rangle - \langle h, X^\top y \rangle + \frac{1}{2} \beta^\top X^\top X \beta + \frac{1}{2} h^\top X^\top X h + \beta^\top X^\top X h = f(\beta) + \langle h, X^\top X \beta - X^\top y \rangle + \frac{1}{2} h^\top X^\top X h$, where the last term is $o(\|h\|)$. Hence, $\nabla f(\beta) = X^\top X \beta - X^\top y = X^\top (X\beta - y)$. 40 / 49
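
The formula $\nabla f(\beta) = X^\top(X\beta - y)$ can be validated with a finite-difference check; a small sketch on random data:

```python
# Sketch: finite-difference check of grad f(beta) = X^T (X beta - y)
# for f(beta) = 0.5 * ||y - X beta||^2.
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 4
X, y = rng.normal(size=(n, p)), rng.normal(size=n)
beta = rng.normal(size=p)

f = lambda b: 0.5 * np.sum((y - X @ b) ** 2)
grad = X.T @ (X @ beta - y)

eps = 1e-6
grad_fd = np.array([
    (f(beta + eps * np.eye(p)[j]) - f(beta - eps * np.eye(p)[j])) / (2 * eps)
    for j in range(p)
])
print(np.max(np.abs(grad - grad_fd)))  # tiny: the two gradients agree
```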

115 Vocabulary (and abuse of terms) Definition. We call Gramian matrix the matrix $X^\top X \in \mathbb{R}^{p \times p}$, whose general term is $[X^\top X]_{i,j} = \langle x^i, x^j \rangle$. Rem: $X^\top X$ is often referred to as the feature correlation matrix (true for standardized columns). Rem: when the columns are scaled such that $\|x^j\|^2 = n$ for all $j \in \{1, \dots, p\}$, the Gramian diagonal is $(n, \dots, n)$. $X^\top y = (\langle x^1, y \rangle, \dots, \langle x^p, y \rangle)^\top$: observations/features correlations. 41 / 49

116 OLS closed-form solution (full rank case) Theorem. If $X$ has full (column) rank (i.e., if $X^\top X$ is non-singular), then $\hat{\beta}^{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y$. Rem: if $X = \mathbf{1}_n$: $\hat{\beta}^{\mathrm{OLS}} = \frac{\langle \mathbf{1}_n, y \rangle}{\langle \mathbf{1}_n, \mathbf{1}_n \rangle} = \bar{y}_n$ (empirical mean). Rem: single feature $X = x = (x_1, \dots, x_n)^\top$: $\hat{\beta}^{\mathrm{OLS}} = \frac{\langle x, y \rangle}{\|x\|^2}$. Beware: in practice, avoid inverting the matrix $X^\top X$ numerically (time consuming); moreover, the matrix $X^\top X$ is not even invertible if $p \geq n$, e.g., in biology: $n$ patients, $p$ genes, with $p \gg n$. 42 / 49
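
In code, the warning above means: solve a least-squares problem (or at least a linear system) rather than forming $(X^\top X)^{-1}$. A sketch with numpy:

```python
# Sketch: computing the OLS solution without an explicit matrix inverse.
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ np.arange(1.0, p + 1) + rng.normal(scale=0.1, size=n)

# Preferred: least-squares solver based on an orthogonal factorization.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Normal equations via a linear solve; avoid np.linalg.inv(X.T @ X),
# which is slower and numerically less stable.
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(beta_lstsq, beta_solve))  # True on this well-posed problem
```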

117 Example Stackloss dataset: stack loss plant data, Brownlee (1965), contains 21 days of measurements from a plant's oxidation of ammonia to nitric acid; the nitric oxide pollutants are captured in an absorption tower. Number of samples: $n = 21$; number of features: $p = 3$. $y$ (to predict): STACKLOSS, 10 times the percentage of ammonia going into the plant that escapes from the absorption tower. Features: AIRFLOW, rate of operation of the plant; WATERTEMP, cooling water temperature in the tower; ACIDCONC, acid concentration of circulating acid, minus 50, times 10. 43 / 49

118 3σ rule to spot outliers in a linear model [Figure: OLS t-statistics vs. index] $t_i = \frac{y_i - \langle x_i, \hat{\beta}^{\mathrm{OLS}} \rangle}{\hat{\sigma}}$ with $\hat{\sigma} = \frac{\|y - X\hat{\beta}^{\mathrm{OLS}}\|}{\sqrt{n - p}}$. 44 / 49

119 3σ rule to spot outliers in a linear model [Figure: robust t-statistics (Huber) vs. index] $t_i = \frac{y_i - \langle x_i, \hat{\beta}^{\mathrm{Huber}} \rangle}{\hat{\sigma}}$ with $\hat{\sigma} = \frac{\|y - X\hat{\beta}^{\mathrm{Huber}}\|}{\sqrt{n - p}}$. 44 / 49

120 3σ rule to spot outliers in a linear model [Figure: robust t-statistics (RANSAC) vs. index] $t_i = \frac{y_i - \langle x_i, \hat{\beta}^{\mathrm{RANSAC}} \rangle}{\hat{\sigma}}$ with $\hat{\sigma} = \mathrm{MAD}_n(y - X\hat{\beta}^{\mathrm{RANSAC}}) / 0.6745$. 44 / 49

121 3σ rule to spot outliers in a linear model [Figure: robust t-statistics (Least Trimmed Squares) vs. index] $t_i = \frac{y_i - \langle x_i, \hat{\beta}^{\mathrm{LTS}} \rangle}{\hat{\sigma}}$ with $\hat{\sigma} = \mathrm{MAD}_n(y - X\hat{\beta}^{\mathrm{LTS}}) / 0.6745$. 44 / 49
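
A sketch of these flagging rules on synthetic data; `HuberRegressor` stands in here for the RANSAC/LTS fits of the last two slides (LTS is not available in scikit-learn), so the setup is illustrative only:

```python
# Sketch: residual t-statistics with a classical scale (OLS, ||r||/sqrt(n-p))
# vs. a robust scale (robust fit + MADN of the residuals).
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(7)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=n)
y[:4] += 10.0  # four corrupted responses

ols = LinearRegression().fit(X, y)
r_ols = y - ols.predict(X)
t_ols = r_ols / (np.linalg.norm(r_ols) / np.sqrt(n - p))

huber = HuberRegressor().fit(X, y)
r_hub = y - huber.predict(X)
madn = np.median(np.abs(r_hub - np.median(r_hub))) / 0.6745
t_hub = r_hub / madn

print(np.where(np.abs(t_ols) > 3)[0])  # classical flags (masking possible)
print(np.where(np.abs(t_hub) > 3)[0])  # robust flags (should include 0-3)
```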

122 References I Alfons, A., C. Croux, and S. Gelper. "Sparse least trimmed squares regression for analyzing high-dimensional large data sets". In: Ann. Appl. Stat. 7.1 (2013). Avella-Medina, M. and E. M. Ronchetti. "Robust and consistent variable selection in high-dimensional generalized linear models". In: Biometrika 105.1 (2018). Bauschke, H. H. and P. L. Combettes. Convex analysis and monotone operator theory in Hilbert spaces. New York: Springer, 2011. Beck, A. and M. Teboulle. "Smoothing and first order methods: A unified framework". In: SIAM J. Optim. 22.2 (2012). Bertsekas, D. P. Nonlinear programming. Athena Scientific, 1999. Bertsimas, D., D. B. Brown, and C. Caramanis. "Theory and applications of robust optimization". In: SIAM Rev. 53.3 (2011). 45 / 49

123 References II Boyd, S. and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004. Chen, M., C. Gao, and Z. Ren. "A General Decision Theory for Huber's ε-contamination Model". In: Electron. J. Stat. 10.2 (2016). Chen, Y., C. Caramanis, and S. Mannor. "Robust sparse regression under adversarial corruption". In: ICML. 2013. Donoho, D. L. and M. Gasko. "Breakdown properties of location estimates based on halfspace depth and projected outlyingness". In: Ann. Statist. 20.4 (1992). Golub, G. H. and C. F. Van Loan. Matrix computations. Fourth edition. Baltimore, MD: Johns Hopkins University Press, 2013. Hampel, F. R. et al. Robust statistics: The Approach Based on Influence Functions. Wiley Series in Probability and Statistics. Wiley, 1986. 46 / 49

124 References III Hiriart-Urruty, J.-B. and C. Lemaréchal. Convex analysis and minimization algorithms. I. Vol. 305. Berlin: Springer-Verlag, 1993. Hiriart-Urruty, J.-B. and C. Lemaréchal. Convex analysis and minimization algorithms. II. Vol. 306. Berlin: Springer-Verlag, 1993. Horn, R. A. and C. R. Johnson. Topics in matrix analysis. Corrected reprint of the 1991 original. Cambridge: Cambridge University Press, 1994. Huber, P. J. and E. M. Ronchetti. Robust statistics. Second edition. Wiley Series in Probability and Statistics. Wiley, 2009. Maronna, R. A., R. D. Martin, and V. J. Yohai. Robust statistics: Theory and methods. Chichester: John Wiley & Sons, 2006. Minsker, S. "Geometric median and robust estimation in Banach spaces". In: Bernoulli 21.4 (2015). Mosler, K. "Depth statistics". In: Robustness and complex data structures. Springer, 2013. 47 / 49

125 References IV Murphy, K. P. Machine learning: a probabilistic perspective. MIT Press, 2012. Nesterov, Y. "Smooth minimization of non-smooth functions". In: Math. Program. 103.1 (2005). Parikh, N. et al. "Proximal algorithms". In: Foundations and Trends in Optimization 1.3 (2013). Rousseeuw, P. J. and A. M. Leroy. Robust regression and outlier detection. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons, 1987. Seber, G. A. F. and A. J. Lee. Linear Regression Analysis. Second edition. Wiley Series in Probability and Statistics. Wiley, 2003. Wei, X. and S. Minsker. "Estimation of the covariance structure of heavy-tailed distributions". In: NIPS. 2017. 48 / 49

126 References V Xu, H., C. Caramanis, and S. Mannor. "Robust regression and Lasso". In: IEEE Trans. Inf. Theory 56.7 (2010). 49 / 49
