Fast Algorithms for Segmented Regression


Jayadev Acharya (MIT), Ilias Diakonikolas (USC), Jerry Li (MIT), Ludwig Schmidt (MIT). June 21.

Statistical vs computational tradeoffs?

General Motivating Question: When is it worth it to trade statistical efficiency for runtime?

Given two estimators:
- Estimator A: great statistical rate, but slow to compute
- Estimator B: worse statistical rate, but faster to compute
When is it better to use estimator B vs estimator A?

"As data grows, it may be beneficial to consider faster inferential algorithms, because the increasing statistical strength of the data can compensate for the poor algorithmic quality." [Jor13]

Outline: Introduction, The exact algorithm, Our algorithm, Experiments

Introduction

Linear regression

We are given a labelled data set $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \in \mathbb{R}^d \times \mathbb{R}$ so that
$y^{(i)} = l^*(x^{(i)}) + \epsilon^{(i)}$,
where $l^*(x) = \langle \theta^*, x \rangle$ is an unknown linear function that we want to recover.

Assume that the $\epsilon^{(i)}$ are independent noise variables.

Goal: Find a linear $\hat{l}(x)$ minimizing
$\mathrm{MSE}(\hat{l}) = \frac{1}{n} \sum_{i=1}^{n} \big(\hat{l}(x^{(i)}) - l^*(x^{(i)})\big)^2$.

We consider fixed design regression: we assume the $x^{(i)}$ are fixed, and the only randomness is over the $\epsilon^{(i)}$.

The least squares estimator

Definition (Least squares estimator). The least squares estimator, denoted $\hat{l}^{LS}$, is given by
$\hat{l}^{LS} \stackrel{\mathrm{def}}{=} \arg\min_{l \text{ linear}} \frac{1}{n} \sum_{i=1}^{n} \big(y^{(i)} - l(x^{(i)})\big)^2$.

The least squares fit is simply the best-fit linear function to the data.
How well does it recover the ground truth $l^*$?

The least squares estimator

Theorem. Let $\hat{l}^{LS}$ be as above. Suppose that $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$. Then with high probability,
$\mathrm{MSE}(\hat{l}^{LS}) = O\!\left(\sigma^2 \frac{d}{n}\right)$.
Moreover, $\hat{l}^{LS}$ can be computed in time $O(nd^2)$.

More recent work (see e.g. [CW13]) gets even faster theoretical runtimes.
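As a concrete illustration of the fixed-design setup and the least squares estimator above, here is a minimal numpy sketch (not from the talk; the sizes, noise level, and random ground truth below are made-up example values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example sizes and ground truth (assumptions for illustration only).
n, d = 1000, 5
sigma = 0.5
X = rng.normal(size=(n, d))                        # fixed design points x^(i)
theta_star = rng.normal(size=d)                    # unknown l*(x) = <theta*, x>
y = X @ theta_star + sigma * rng.normal(size=n)    # y^(i) = l*(x^(i)) + eps^(i)

# Least squares estimator: argmin over linear l of (1/n) sum_i (y^(i) - l(x^(i)))^2.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fixed-design MSE against the ground truth, (1/n) sum_i (l_hat(x^(i)) - l*(x^(i)))^2.
mse = np.mean((X @ theta_hat - X @ theta_star) ** 2)
print(f"MSE = {mse:.5f}, sigma^2 * d / n = {sigma**2 * d / n:.5f}")
```

The printed MSE should be on the order of $\sigma^2 d / n$, matching the rate in the theorem.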

Dealing with change

What if linear regression is insufficient?

[Figure: Dow Jones index data]

Q: What if your model changes as a function of one of your variables?
A: Model it with a piecewise linear fit!

Segmented Regression

Definition (Piecewise linearity). A function $f : \mathbb{R}^d \to \mathbb{R}$ is $k$-piecewise linear if there exists a partition of $\mathbb{R}$ into $k$ intervals $I_1, \ldots, I_k$ so that for all $j$, the function $f$ is linear when restricted to the set of $x \in \mathbb{R}^d$ with $x_1 \in I_j$.

Segmented Regression [BP98, YP13]. Given a data set $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})$ so that $y^{(i)} = f^*(x^{(i)}) + \epsilon^{(i)}$, where the $\epsilon^{(i)}$ are independent noise and $f^*$ is $k$-piecewise linear, recover $f^*$ in MSE.
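To make the definition concrete, here is a small sketch (my illustration, not from the talk) of evaluating a $k$-piecewise linear function whose pieces are determined by the first coordinate $x_1$; the breakpoints and coefficient values are made-up examples:

```python
import numpy as np

def piecewise_linear(x, breakpoints, thetas):
    """Evaluate a k-piecewise linear f: R^d -> R at the rows of x.

    The partition of R is given by the k-1 sorted interior breakpoints;
    thetas has shape (k, d), one linear function <theta_j, .> per interval.
    Which piece applies is decided by the first coordinate x_1 only.
    """
    x = np.atleast_2d(x)
    piece = np.searchsorted(breakpoints, x[:, 0], side="right")
    return np.einsum("ij,ij->i", x, thetas[piece])

# Made-up example: k = 3 pieces in d = 2 dimensions.
breakpoints = np.array([-1.0, 1.0])
thetas = np.array([[2.0, 0.5], [-1.0, 0.5], [3.0, 0.5]])
x = np.array([[-2.0, 1.0], [0.0, 1.0], [2.0, 1.0]])
print(piecewise_linear(x, breakpoints, thetas))   # one value per row of x
```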

The exact algorithm

The k-piecewise linear LS estimator

Definition (Least squares estimator). The $k$-piecewise linear least squares estimator, denoted $\hat{f}_k^{LS}$, is given by
$\hat{f}_k^{LS} \stackrel{\mathrm{def}}{=} \arg\min_{f \; k\text{-piecewise linear}} \frac{1}{n} \sum_{i=1}^{n} \big(y^{(i)} - f(x^{(i)})\big)^2$.

Theorem. Let $\hat{f}_k^{LS}$ be as above. Suppose that $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$. Then with high probability,
$\mathrm{MSE}(\hat{f}_k^{LS}) = O\!\left(\sigma^2 \frac{kd}{n}\right)$.
Moreover, this rate is optimal.

The k-piecewise linear LS estimator, computationally

How fast can you compute this estimator?

Theorem (BP98). There is a dynamic program for computing the $k$-piecewise linear LS estimator on $n$ samples in $d$ dimensions which runs in time $O(n^2(d^2 + k))$.

So poly time... but quite slow as $n$ gets large: the algorithm took $\Theta(1)$ minutes to run for $10^4$ samples, and $\Theta(1)$ hours for $10^5$ samples.
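To show the structure of this dynamic program, here is a simplified Python sketch (my illustration, not the authors' implementation). It recomputes every candidate segment's least squares fit from scratch, so it does not achieve the $O(n^2(d^2+k))$ bound of the theorem, which reuses work across segments, but the recurrence over breakpoints is the same:

```python
import numpy as np

def segment_error(X, y, i, j):
    """Sum of squared residuals of the least squares fit on samples i..j-1."""
    Xs, ys = X[i:j], y[i:j]
    theta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return float(np.sum((Xs @ theta - ys) ** 2))

def exact_dp(X, y, k):
    """k-piecewise linear least squares via dynamic programming (simplified sketch).

    Assumes the rows of X are sorted by their first coordinate.
    Returns the optimal total squared error and the chosen segment boundaries.
    """
    n = X.shape[0]
    # Error of fitting one linear piece to each contiguous block of samples.
    err = {(i, j): segment_error(X, y, i, j)
           for i in range(n) for j in range(i + 1, n + 1)}
    INF = float("inf")
    # dp[p][j] = best error for covering the first j samples with p pieces.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    parent = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(j):
                if dp[p - 1][i] + err[(i, j)] < dp[p][j]:
                    dp[p][j] = dp[p - 1][i] + err[(i, j)]
                    parent[p][j] = i
    # Backtrack to recover the segment boundaries.
    bounds, j = [n], n
    for p in range(k, 0, -1):
        j = parent[p][j]
        bounds.append(j)
    return dp[k][n], bounds[::-1]
```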

Our algorithm

Our Results

Main Result (informal). An algorithm for segmented regression which runs in time linear in $n$... but has a worse theoretical statistical guarantee.

Formally:

Theorem. There is a $4k$-piecewise linear estimator $\hat{f}$ which can be computed in time $O(n(d^2 + k))$ so that w.h.p.
$\mathrm{MSE}(\hat{f}) \le \tilde{O}\!\left(\sigma^2 \frac{kd}{n} + \sigma \sqrt{\frac{k}{n}}\right)$.

Compare and contrast

Algorithm    | Statistical rate                                                            | Runtime
DP           | $O\!\left(\sigma^2 \frac{kd}{n}\right)$                                     | $O(n^2(d^2 + k))$
Our results  | $\tilde{O}\!\left(\sigma^2 \frac{kd}{n} + \sigma\sqrt{\frac{k}{n}}\right)$  | $O(n(d^2 + k))$

Given enough data, how much time does it take to get some target accuracy $\epsilon$?

DP: $O\!\left(\sigma^2 \frac{k^2 d^2}{\epsilon^2}(d^2 + k)\right)$
Our results: $\tilde{O}\!\left(\frac{\sigma^2 k}{\epsilon} \max\!\left(d, \frac{1}{\epsilon}\right)(d^2 + k)\right)$
Speedup: $\tilde{O}\!\left(\min\!\left(\frac{kd}{\epsilon}, k d^2\right)\right)$
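One way to read the "Our results" runtime (a back-of-the-envelope sketch, dropping constants and log factors): invert the statistical rate to find the sample size $n$ needed for target accuracy $\epsilon$, then substitute into the runtime. Requiring $\sigma^2 kd/n \le \epsilon$ gives $n \gtrsim \sigma^2 kd/\epsilon$, and requiring $\sigma\sqrt{k/n} \le \epsilon$ gives $n \gtrsim \sigma^2 k/\epsilon^2$; hence $n \approx \frac{\sigma^2 k}{\epsilon}\max\!\left(d, \frac{1}{\epsilon}\right)$ suffices, and the runtime $O(n(d^2 + k))$ becomes $\tilde{O}\!\left(\frac{\sigma^2 k}{\epsilon}\max\!\left(d, \frac{1}{\epsilon}\right)(d^2 + k)\right)$, matching the bound above.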

The Greedy Merging Algorithm

Input: a labelled data set $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})$.

Sort the samples so that $x_1^{(1)} \le x_1^{(2)} \le \ldots \le x_1^{(n)}$.
Let $\mathcal{I}_0 \leftarrow \{\{x_1^{(1)}\}, \{x_1^{(2)}\}, \ldots, \{x_1^{(n)}\}\}$.
While $|\mathcal{I}_j| > 4k$:
  Let $\mathcal{I}_j = I_1, \ldots, I_s$.
  Pair up consecutive intervals: $J_u = I_{2u-1} \cup I_{2u}$, $u = 1, \ldots, s/2$.
  For each $J_u$, compute the least squares fit for all data points in $J_u$, and an error quantity $e_u$.
  Let $L$ be the set of the $2k$ indices $u$ with largest $e_u$.
  For $u = 1, \ldots, s/2$:
    If $u \notin L$, include the merged interval $I_{2u-1} \cup I_{2u}$ in $\mathcal{I}_{j+1}$.
    Else if $u \in L$, include $I_{2u-1}$ and $I_{2u}$ separately in $\mathcal{I}_{j+1}$.

Output: the linear least squares fit over each interval in $\mathcal{I}_j$.
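Here is a minimal Python sketch of this merging procedure (my illustration, not the authors' code). For the error quantity $e_u$ it simply uses the sum of squared residuals of each candidate's least squares fit; the talk's actual error rule uses knowledge of $\sigma^2$ (see the remarks below), so treat this as a simplified stand-in:

```python
import numpy as np

def ls_fit(X, y, idx):
    """Least squares fit on the samples in idx; returns (theta, sum of squared residuals)."""
    Xs, ys = X[idx], y[idx]
    theta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return theta, float(np.sum((Xs @ theta - ys) ** 2))

def greedy_merging(X, y, k):
    """Greedy merging sketch: returns the final intervals (index arrays) and one fit per interval.

    Assumes the rows of X are sorted by their first coordinate. The error e_u is the
    plain sum of squared residuals of the merged candidate, a simplified stand-in
    for the sigma^2-aware rule used in the talk.
    """
    intervals = [np.array([i]) for i in range(X.shape[0])]   # I_0: one singleton per sample
    while len(intervals) > 4 * k:
        # Pair up consecutive intervals; a trailing odd interval is carried over unchanged.
        pairs = [(intervals[2 * u], intervals[2 * u + 1]) for u in range(len(intervals) // 2)]
        leftover = [intervals[-1]] if len(intervals) % 2 else []
        errors = [ls_fit(X, y, np.concatenate(pair))[1] for pair in pairs]
        # Keep the 2k candidates with largest error un-merged (always merging at least
        # one pair per round so the loop terminates).
        num_keep = min(2 * k, len(pairs) - 1)
        keep_split = set(np.argsort(errors)[len(pairs) - num_keep:]) if num_keep > 0 else set()
        nxt = []
        for u, (left, right) in enumerate(pairs):
            if u in keep_split:
                nxt.extend([left, right])
            else:
                nxt.append(np.concatenate([left, right]))
        intervals = nxt + leftover
    fits = [ls_fit(X, y, idx)[0] for idx in intervals]
    return intervals, fits
```

Calling greedy_merging(X, y, k) on data sorted by the first coordinate returns at most $4k$ intervals and one linear fit per interval, which together define the piecewise linear estimate $\hat{f}$.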

Example: k = 2

[Figure: step-by-step illustration of greedy merging on one-dimensional data (x-axis: $x_1$) with $k = 2$.]

Remarks

- Algorithmically similar to an algorithm due to [ADHLS15] for histogram approximation; however, the analysis here is quite different and more involved.
- Can get a smooth tradeoff between runtime and number of pieces (see the paper for details).
- The error rule we use requires knowledge of $\sigma^2$; we also give a similar algorithm which (up to log factors) matches the same guarantees and requires no such knowledge.
- Can show that our algorithms are robust to model misspecification.

Experiments

Experiments: piecewise constant

[Figure: MSE, relative MSE ratio vs. $n$, running time (s) vs. $n$, and speed-up vs. $n$; curves for Merging with $k$, $2k$, and $4k$ pieces and for the exact DP.]

Experiments: piecewise linear

[Figure: MSE, relative MSE ratio vs. $n$, running time (s) vs. $n$, and speed-up vs. $n$; curves for Merging with $k$, $2k$, and $4k$ pieces and for the exact DP.]

Experiments: time vs. error trade-off

[Figure: MSE vs. time (s), log scales, for piecewise constant and piecewise linear data; curves for Merging with $k$, $2k$, and $4k$ pieces and for the exact DP.]

Experiments: real data

[Figure: Dow Jones index value over time, together with the exact DP fit and the merging fit.]

Experiments: summary

- Our algorithm runs 1000× faster with $n = 10^4$.
- Our algorithm's MSE on synthetic data was 2 to 4 times worse than the DP's.
- Given enough data, we get the same MSE 100× faster.

Conclusions

- When is it worth it to trade statistical effectiveness for algorithmic efficiency?
- We give an algorithm for segmented regression that gets a worse theoretical MSE, but a much faster runtime.
- Experimentally, our algorithm has slightly worse MSE, but runs 1000× faster.
- Open Question: Is this tradeoff necessary?

Thank you!
