Fast Algorithms for Segmented Regression
Fast Algorithms for Segmented Regression
Jayadev Acharya (MIT), Ilias Diakonikolas (USC), Jerry Li (MIT), Ludwig Schmidt (MIT)
June 21, 2016
Statistical vs. computational tradeoffs?

General Motivating Question: When is it worth it to trade statistical efficiency for runtime?

Given two estimators:
- Estimator A: great statistical rate, but slow to compute.
- Estimator B: worse statistical rate, but faster to compute.

When is it better to use estimator B instead of estimator A?

"As data grows, it may be beneficial to consider faster inferential algorithms, because the increasing statistical strength of the data can compensate for the poor algorithmic quality." [Jor13]
Outline
- Introduction
- The exact algorithm
- Our algorithm
- Experiments
Linear regression

We are given a labelled data set (x^(1), y^(1)), ..., (x^(n), y^(n)) ∈ R^d × R so that

    y^(i) = l*(x^(i)) + ε^(i),

where l*(x) = ⟨θ*, x⟩ is an unknown linear function that we want to recover, and the ε^(i) are independent noise variables.

Goal: find a linear l(x) minimizing

    MSE(l) = (1/n) Σ_{i=1}^n (l(x^(i)) − l*(x^(i)))².

We consider fixed design regression: we assume the x^(i) are fixed, and the only randomness is over the ε^(i).
The least squares estimator

Definition (Least squares estimator). The least squares estimator, denoted l̂^LS, is given by

    l̂^LS := argmin_{l linear} (1/n) Σ_{i=1}^n (y^(i) − l(x^(i)))².

The least squares fit is simply the best-fit linear function to the data. How well does it recover the ground truth l*?
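As a concrete sketch (function and variable names here are ours, not from the talk), the least squares estimator can be computed with NumPy; solving the least squares problem via the normal equations costs O(nd²) time, matching the classical bound:

```python
import numpy as np

def least_squares_fit(X, y):
    """Compute argmin over linear l of (1/n) sum_i (y^(i) - l(x^(i)))^2.

    np.linalg.lstsq solves the least squares problem, which takes O(n d^2) time.
    """
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

# Fixed-design example: recover a known linear function from noisy labels.
rng = np.random.default_rng(0)
n, d = 2000, 3
X = rng.normal(size=(n, d))                    # fixed design points x^(i)
theta_star = np.array([1.0, -2.0, 0.5])        # unknown theta* to recover
y = X @ theta_star + 0.1 * rng.normal(size=n)  # y^(i) = <theta*, x^(i)> + eps^(i)

theta_hat = least_squares_fit(X, y)
mse = np.mean((X @ theta_hat - X @ theta_star) ** 2)  # MSE against ground truth
```

With n much larger than d, the recovered coefficients land very close to θ*, as the σ²d/n rate on the next slide predicts.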
Theorem. Let l̂^LS be as above, and suppose that ε^(i) ~ N(0, σ²). Then with high probability,

    MSE(l̂^LS) = O(σ² d / n).

Moreover, l̂^LS can be computed in time O(nd²). More recent work (see e.g. [CW13]) achieves even faster theoretical runtimes.
Dealing with change

What if linear regression is insufficient? (Example: the Dow Jones index over time.)

Q: What if your model changes as a function of one of your variables?
A: Model it with a piecewise linear fit!
Segmented regression

Definition (Piecewise linearity). A function f: R^d → R is k-piecewise linear if there exists a partition of R into k intervals I_1, ..., I_k so that for all j, the function f is linear when restricted to the set of x ∈ R^d with x_1 ∈ I_j.

Segmented regression [BP98, YP13]. Given a data set (x^(1), y^(1)), ..., (x^(n), y^(n)) so that y^(i) = f*(x^(i)) + ε^(i), where the ε^(i) are independent noise and f* is k-piecewise linear, recover f* in MSE.
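To make the model concrete, here is a small sketch (our own illustrative names and constants, not from the talk) that generates fixed-design segmented regression data with a 4-piecewise linear ground truth:

```python
import numpy as np

def k_piecewise_linear(x, breakpoints, slopes, intercepts):
    """Evaluate a k-piecewise linear f* at the points x.

    breakpoints holds the k - 1 interior endpoints of the intervals
    I_1, ..., I_k; on interval j, f*(x) = slopes[j] * x + intercepts[j].
    """
    seg = np.searchsorted(breakpoints, x)   # interval index for each x
    return slopes[seg] * x + intercepts[seg]

rng = np.random.default_rng(1)
n, sigma = 500, 0.2
x = np.sort(rng.uniform(0.0, 4.0, size=n))       # fixed design points
breakpoints = np.array([1.0, 2.0, 3.0])          # k = 4 pieces
slopes = np.array([1.0, -1.0, 2.0, 0.0])
intercepts = np.array([0.0, 4.0, -2.0, 4.0])

f_star = k_piecewise_linear(x, breakpoints, slopes, intercepts)
y = f_star + sigma * rng.normal(size=n)          # y^(i) = f*(x^(i)) + eps^(i)
```

Note that a k-piecewise linear function need not be continuous at the breakpoints; the definition only requires linearity on each interval.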
The exact algorithm
The k-piecewise linear LS estimator

Definition (Least squares estimator). The k-piecewise linear least squares estimator, denoted f̂_k^LS, is given by

    f̂_k^LS := argmin_{f k-piecewise linear} (1/n) Σ_{i=1}^n (y^(i) − f(x^(i)))².

Theorem. Let f̂_k^LS be as above, and suppose that ε^(i) ~ N(0, σ²). Then with high probability,

    MSE(f̂_k^LS) = O(σ² kd / n).

Moreover, this rate is optimal.
The k-piecewise linear LS estimator, computationally

How fast can you compute this estimator?

Theorem (BP98). There is a dynamic program for computing the k-piecewise linear LS estimator on n samples in d dimensions which runs in time O(n²(d² + k)).

So polynomial time... but quite slow as n gets large: the algorithm took minutes to run for 10⁴ samples, and hours for 10⁵ samples.
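The dynamic program is easiest to see in one dimension. The sketch below (our own simplified implementation, not the authors' code) precomputes prefix sums so that the least squares error of any candidate segment costs O(1), giving an O(n²k) exact algorithm, the d = 1 specialization of the O(n²(d² + k)) bound:

```python
import numpy as np

def piecewise_linear_dp(x, y, k):
    """Exact k-piecewise linear least squares in one dimension.

    dp[m][j] = minimum sum of squared errors for fitting points 0..j-1
    with m line segments; each segment error comes from closed-form
    prefix-sum statistics, so the total cost is O(n^2 k).
    Returns (optimal SSE, list of (start, end) index ranges).
    """
    n = len(x)
    # Prefix sums for O(1) segment statistics.
    cx = np.concatenate(([0.0], np.cumsum(x)))
    cy = np.concatenate(([0.0], np.cumsum(y)))
    cxx = np.concatenate(([0.0], np.cumsum(x * x)))
    cxy = np.concatenate(([0.0], np.cumsum(x * y)))
    cyy = np.concatenate(([0.0], np.cumsum(y * y)))

    def seg_err(i, j):  # SSE of the best single line on points i..j-1
        m = j - i
        sx, sy = cx[j] - cx[i], cy[j] - cy[i]
        sxx, sxy, syy = cxx[j] - cxx[i], cxy[j] - cxy[i], cyy[j] - cyy[i]
        denom = m * sxx - sx * sx
        if denom <= 1e-12:                  # degenerate segment: fit a constant
            return max(syy - sy * sy / m, 0.0)
        b = (m * sxy - sx * sy) / denom     # slope of the LS line
        a = (sy - b * sx) / m               # intercept of the LS line
        return max(syy - a * sy - b * sxy, 0.0)

    INF = float("inf")
    dp = np.full((k + 1, n + 1), INF)
    back = np.zeros((k + 1, n + 1), dtype=int)
    dp[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(m - 1, j):
                c = dp[m - 1][i] + seg_err(i, j)
                if c < dp[m][j]:
                    dp[m][j], back[m][j] = c, i
    # Recover the segment boundaries by walking the backpointers.
    segs, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        segs.append((i, j))
        j = i
    return dp[k][n], segs[::-1]
```

The O(n²) factor comes from trying every (i, j) split; this is exactly the term that becomes painful for large n.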
Our algorithm
Our results

Main Result (informal). An algorithm for segmented regression which runs in time linear in n... but has a worse theoretical statistical guarantee.

Formally:

Theorem. There is a 4k-piecewise linear estimator f̂ which can be computed in time O(n(d² + k)) so that with high probability,

    MSE(f̂) ≤ Õ(σ² kd / n + σ √(k / n)).
Compare and contrast

Algorithm     Statistical rate                 Runtime
DP            O(σ² kd / n)                     O(n²(d² + k))
Our results   Õ(σ² kd / n + σ √(k / n))        O(n(d² + k))

Given enough data, how much time does it take to get some target accuracy ε?

- DP: O(σ⁴ k² d² (d² + k) / ε²)
- Our results: Õ((σ² k / ε) · max(d, 1/ε) · (d² + k))
- Speedup: Õ(min(kd / ε, kd²))
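The time-to-accuracy comparison can be sanity-checked numerically. The toy functions below (our own illustrative names, with all hidden constants set to 1) encode the two bounds and reproduce the stated speedup:

```python
def dp_time(sigma, k, d, eps):
    """DP runtime to reach MSE eps: n^2 (d^2 + k) with n = sigma^2 k d / eps."""
    n = sigma**2 * k * d / eps
    return n**2 * (d**2 + k)

def merging_time(sigma, k, d, eps):
    """Merging runtime: n (d^2 + k) with n = (sigma^2 k / eps) * max(d, 1/eps)."""
    n = sigma**2 * k / eps * max(d, 1.0 / eps)
    return n * (d**2 + k)

# Example: sigma = 1, k = 2, d = 4, eps = 0.1 gives a speedup of
# min(kd/eps, kd^2) = min(80, 32) = 32.
speedup = dp_time(1.0, 2, 4, 0.1) / merging_time(1.0, 2, 4, 0.1)
```

For larger ε the binding term flips: with ε = 0.5 the same parameters give a speedup of kd/ε = 16, matching the min(kd/ε, kd²) expression.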
The Greedy Merging Algorithm

Input: a labelled data set (x^(1), y^(1)), ..., (x^(n), y^(n)).

- Sort the points so that x^(1)_1 ≤ x^(2)_1 ≤ ... ≤ x^(n)_1.
- Let I^0 ← {{x^(1)}, {x^(2)}, ..., {x^(n)}}, the partition into singleton intervals.
- While |I^j| > 4k:
  - Let I^j = I_1, ..., I_s.
  - Pair up consecutive intervals: J_u = I_{2u−1} ∪ I_{2u}, for u = 1, ..., s/2.
  - For each J_u, compute the least squares fit for all data points in J_u, and an error quantity e_u.
  - Let L be the set of the 2k indices u with the largest e_u.
  - For u = 1, ..., s/2: if u ∉ L, include the merged interval I_{2u−1} ∪ I_{2u} in I^{j+1}; if u ∈ L, include I_{2u−1} and I_{2u} separately in I^{j+1}.

Output: the linear least squares fit over each interval in I^j.
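Below is a simplified one-dimensional sketch of greedy merging (our own code and helper names; in the paper the error quantity e_u is more refined than the raw sum of squared residuals used here, and odd interval counts are handled slightly differently):

```python
import numpy as np

def line_fit_sse(x, y):
    """Least squares line on points (x, y); returns (slope, intercept, SSE)."""
    if len(x) < 2 or np.ptp(x) == 0.0:
        a = float(np.mean(y))                     # degenerate: constant fit
        return 0.0, a, float(np.sum((y - a) ** 2))
    b, a = np.polyfit(x, y, 1)
    r = y - (b * x + a)
    return float(b), float(a), float(np.sum(r * r))

def greedy_merge(x, y, k):
    """Greedy merging for 1-d segmented regression (simplified sketch).

    Start from singleton intervals over the sorted points; each round, pair
    up consecutive intervals, merge every pair except (up to) the 2k pairs
    with the largest least squares error, and stop at <= 4k intervals.
    Returns a list of (start, end, slope, intercept) tuples.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    intervals = [(i, i + 1) for i in range(len(x))]
    while len(intervals) > 4 * k:
        pairs, errors = [], []
        for u in range(len(intervals) // 2):
            lo, hi = intervals[2 * u][0], intervals[2 * u + 1][1]
            pairs.append((lo, hi))
            errors.append(line_fit_sse(x[lo:hi], y[lo:hi])[2])
        # Keep the highest-error pairs unmerged (at least one pair is always
        # merged so that each round makes progress).
        keep_count = min(2 * k, len(pairs) - 1)
        keep = set(np.argsort(errors)[len(errors) - keep_count:])
        nxt = []
        for u, (lo, hi) in enumerate(pairs):
            if u in keep:
                nxt.append(intervals[2 * u])      # keep the pair split
                nxt.append(intervals[2 * u + 1])
            else:
                nxt.append((lo, hi))              # merge the pair
        if len(intervals) % 2 == 1:               # odd leftover interval
            nxt.append(intervals[-1])
        intervals = nxt
    return [(lo, hi) + line_fit_sse(x[lo:hi], y[lo:hi])[:2] for lo, hi in intervals]
```

The intuition: a pair with large fitting error likely straddles a true breakpoint of f*, so it is left unmerged, while low-error pairs are safely collapsed. Each round halves the interval count (up to the 2k exceptions), which is what yields the O(n(d² + k)) total runtime.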
Example: k = 2

[Figure: successive rounds of the merging algorithm on example data along the x_1 axis.]
Remarks

- Algorithmically similar to an algorithm due to [ADHLS15] for histogram approximation; however, the analysis here is quite different and more involved.
- Can get a smooth tradeoff between runtime and number of pieces; see the paper for details.
- The error rule we use requires knowledge of σ²; we also give a similar algorithm which (up to log factors) matches the same guarantees and requires no such knowledge.
- Can show that our algorithms are robust to model misspecification.
Experiments
[Figure: piecewise constant experiments: relative MSE ratio, running time (s), and speed-up as functions of n, for Merging with k, 2k, and 4k pieces against the exact DP.]

[Figure: piecewise linear experiments: relative MSE ratio, running time (s), and speed-up as functions of n, for Merging with k, 2k, and 4k pieces against the exact DP.]

[Figure: time vs. error trade-off: MSE against running time (s) on piecewise constant and piecewise linear data, for Merging with k, 2k, and 4k pieces against the exact DP.]

[Figure: real data: index value over time for the Dow Jones data, with the exact DP and Merging fits.]
Experiments

- Our algorithm runs 1000× faster with n = 10⁴.
- Our algorithm's MSE on synthetic data was 2 to 4 times worse than the DP's.
- Given enough data, we get the same MSE 100× faster.
Conclusions

- When is it worth it to trade statistical effectiveness for algorithmic efficiency?
- We give an algorithm for segmented regression that gets a worse theoretical MSE, but a much faster runtime.
- Experimentally, our algorithm has slightly worse MSE, but runs 1000× faster.
- Open question: is this tradeoff necessary?

Thank you!
More informationLinear Discrimination Functions
Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach
More informationMachine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression
Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,
More informationReinforcement Learning
Reinforcement Learning Function approximation Mario Martin CS-UPC May 18, 2018 Mario Martin (CS-UPC) Reinforcement Learning May 18, 2018 / 65 Recap Algorithms: MonteCarlo methods for Policy Evaluation
More informationDivide and Conquer Algorithms. CSE 101: Design and Analysis of Algorithms Lecture 14
Divide and Conquer Algorithms CSE 101: Design and Analysis of Algorithms Lecture 14 CSE 101: Design and analysis of algorithms Divide and conquer algorithms Reading: Sections 2.3 and 2.4 Homework 6 will
More informationAn Analytic Solution to Discrete Bayesian Reinforcement Learning
An Analytic Solution to Discrete Bayesian Reinforcement Learning Pascal Poupart (U of Waterloo) Nikos Vlassis (U of Amsterdam) Jesse Hoey (U of Toronto) Kevin Regan (U of Waterloo) 1 Motivation Automated
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More information16.4 Multiattribute Utility Functions
285 Normalized utilities The scale of utilities reaches from the best possible prize u to the worst possible catastrophe u Normalized utilities use a scale with u = 0 and u = 1 Utilities of intermediate
More informationDesign and Analysis of Algorithms
CSE 0, Winter 08 Design and Analysis of Algorithms Lecture 8: Consolidation # (DP, Greed, NP-C, Flow) Class URL: http://vlsicad.ucsd.edu/courses/cse0-w8/ Followup on IGO, Annealing Iterative Global Optimization
More informationCSE 417: Algorithms and Computational Complexity
CSE 417: Algorithms and Computational Complexity Lecture 2: Analysis Larry Ruzzo 1 Why big-o: measuring algorithm efficiency outline What s big-o: definition and related concepts Reasoning with big-o:
More informationTruncated Max-of-Convex Models Technical Report
Truncated Max-of-Convex Models Technical Report Pankaj Pansari University of Oxford The Alan Turing Institute pankaj@robots.ox.ac.uk M. Pawan Kumar University of Oxford The Alan Turing Institute pawan@robots.ox.ac.uk
More informationNonlinear Least Squares
Nonlinear Least Squares Stephen Boyd EE103 Stanford University December 6, 2016 Outline Nonlinear equations and least squares Examples Levenberg-Marquardt algorithm Nonlinear least squares classification
More informationChristopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015
Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)
More informationManifold Coarse Graining for Online Semi-supervised Learning
for Online Semi-supervised Learning Mehrdad Farajtabar, Amirreza Shaban, Hamid R. Rabiee, Mohammad H. Rohban Digital Media Lab, Department of Computer Engineering, Sharif University of Technology, Tehran,
More informationConstructing Approximation Kernels for Non-Harmonic Fourier Data
Constructing Approximation Kernels for Non-Harmonic Fourier Data Aditya Viswanathan aditya.v@caltech.edu California Institute of Technology SIAM Annual Meeting 2013 July 10 2013 0 / 19 Joint work with
More informationOutline lecture 2 2(30)
Outline lecture 2 2(3), Lecture 2 Linear Regression it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic Control
More informationDevelopment of an algorithm for solving mixed integer and nonconvex problems arising in electrical supply networks
Development of an algorithm for solving mixed integer and nonconvex problems arising in electrical supply networks E. Wanufelle 1 S. Leyffer 2 A. Sartenaer 1 Ph. Toint 1 1 FUNDP, University of Namur 2
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationPenalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms
university-logo Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms Andrew Barron Cong Huang Xi Luo Department of Statistics Yale University 2008 Workshop on Sparsity in High Dimensional
More informationQuiz 1 Solutions. (a) f 1 (n) = 8 n, f 2 (n) = , f 3 (n) = ( 3) lg n. f 2 (n), f 1 (n), f 3 (n) Solution: (b)
Introduction to Algorithms October 14, 2009 Massachusetts Institute of Technology 6.006 Spring 2009 Professors Srini Devadas and Constantinos (Costis) Daskalakis Quiz 1 Solutions Quiz 1 Solutions Problem
More informationThis means that we can assume each list ) is
This means that we can assume each list ) is of the form ),, ( )with < and Since the sizes of the items are integers, there are at most +1pairs in each list Furthermore, if we let = be the maximum possible
More informationcxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c
Warm up D cai.yo.ie p IExrL9CxsYD Sglx.Ddl f E Luo fhlexi.si dbll Fix any a, b, c > 0. 1. What is the x 2 R that minimizes ax 2 + bx + c x a b Ta OH 2 ax 16 0 x 1 Za fhkxiiso3ii draulx.h dp.d 2. What is
More informationOn Computational Thinking, Inferential Thinking and Data Science
On Computational Thinking, Inferential Thinking and Data Science Michael I. Jordan University of California, Berkeley December 17, 2016 A Job Description, circa 2015 Your Boss: I need a Big Data system
More informationTowards stability and optimality in stochastic gradient descent
Towards stability and optimality in stochastic gradient descent Panos Toulis, Dustin Tran and Edoardo M. Airoldi August 26, 2016 Discussion by Ikenna Odinaka Duke University Outline Introduction 1 Introduction
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationOn Bayesian Computation
On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints
More informationBayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems
Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems John Bardsley, University of Montana Collaborators: H. Haario, J. Kaipio, M. Laine, Y. Marzouk, A. Seppänen, A. Solonen, Z.
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 2 Luca Trevisan August 29, 2017
U.C. Berkeley CS94: Beyond Worst-Case Analysis Handout Luca Trevisan August 9, 07 Scribe: Mahshid Montazer Lecture In this lecture, we study the Max Cut problem in random graphs. We compute the probable
More informationConstructing comprehensive summaries of large event sequences
Constructing comprehensive summaries of large event sequences JERRY KIERNAN IBM Silicon Valley Lab and EVIMARIA TERZI IBM Almaden Research Center Event sequences capture system and user activity over time.
More informationASYMPTOTIC COMPLEXITY SEARCHING/SORTING
Quotes about loops O! Thou hast damnable iteration and art, indeed, able to corrupt a saint. Shakespeare, Henry IV, Pt I, 1 ii Use not vain repetition, as the heathen do. Matthew V, 48 Your if is the only
More informationPerformance and Scalability. Lars Karlsson
Performance and Scalability Lars Karlsson Outline Complexity analysis Runtime, speedup, efficiency Amdahl s Law and scalability Cost and overhead Cost optimality Iso-efficiency function Case study: matrix
More informationEECS 477: Introduction to algorithms. Lecture 12
EECS 477: Introduction to algorithms. Lecture 12 Prof. Igor Guskov guskov@eecs.umich.edu October 17, 2002 1 Lecture outline: greedy algorithms II Knapsack Scheduling minimizing time with deadlines 2 Greedy
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationFailures of Gradient-Based Deep Learning
Failures of Gradient-Based Deep Learning Shai Shalev-Shwartz, Shaked Shammah, Ohad Shamir The Hebrew University and Mobileye Representation Learning Workshop Simons Institute, Berkeley, 2017 Shai Shalev-Shwartz
More informationComputational statistics
Computational statistics Combinatorial optimization Thierry Denœux February 2017 Thierry Denœux Computational statistics February 2017 1 / 37 Combinatorial optimization Assume we seek the maximum of f
More informationRegression Clustering
Regression Clustering In regression clustering, we assume a model of the form y = f g (x, θ g ) + ɛ g for observations y and x in the g th group. Usually, of course, we assume linear models of the form
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationScalable machine learning for massive datasets: Fast summation algorithms
Scalable machine learning for massive datasets: Fast summation algorithms Getting good enough solutions as fast as possible Vikas Chandrakant Raykar vikas@cs.umd.edu University of Maryland, CollegePark
More informationHierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More informationKernels to detect abrupt changes in time series
1 UMR 8524 CNRS - Université Lille 1 2 Modal INRIA team-project 3 SSB group Paris joint work with S. Arlot, Z. Harchaoui, G. Rigaill, and G. Marot Computational and statistical trade-offs in learning IHES
More informationConvex Optimization in Computer Vision:
Convex Optimization in Computer Vision: Segmentation and Multiview 3D Reconstruction Yiyong Feng and Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) ELEC 5470 - Convex Optimization
More informationOptimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai
Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment
More informationGreat Theoretical Ideas in Computer Science. Lecture 9: Introduction to Computational Complexity
15-251 Great Theoretical Ideas in Computer Science Lecture 9: Introduction to Computational Complexity February 14th, 2017 Poll What is the running time of this algorithm? Choose the tightest bound. def
More informationConvex Optimization M2
Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial
More informationProbabilistic Graphical Models & Applications
Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with
More informationJournal Club. A Universal Catalyst for First-Order Optimization (H. Lin, J. Mairal and Z. Harchaoui) March 8th, CMAP, Ecole Polytechnique 1/19
Journal Club A Universal Catalyst for First-Order Optimization (H. Lin, J. Mairal and Z. Harchaoui) CMAP, Ecole Polytechnique March 8th, 2018 1/19 Plan 1 Motivations 2 Existing Acceleration Methods 3 Universal
More information