MINIMUM PRECISION REQUIREMENTS FOR THE SVM-SGD LEARNING ALGORITHM. Charbel Sakr, Ameya Patil, Sai Zhang, Yongjune Kim, and Naresh Shanbhag

Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

ABSTRACT

It is well known that the precision of the data, weight vector, and internal representations employed in learning systems directly impacts their energy, throughput, and latency. The precision requirements for the training algorithm are also important for systems that learn on-the-fly. In this paper, we present analytical lower bounds on the precision requirements for the commonly employed stochastic gradient descent (SGD) on-line learning algorithm in the specific context of a support vector machine (SVM). These bounds are obtained subject to desired system performance. The bounds are validated using the UCI Breast Cancer dataset. Additionally, the impact of these precisions on the energy consumption of a fixed-point SVM with on-line training is studied. Simulation results in a 45 nm CMOS process show that operating at the minimum precision dictated by our bounds improves energy consumption by a factor of 5.3× compared to conventional precision assignments, with no observable loss in accuracy.

Index Terms — machine learning, fixed point, precision, energy, accuracy

1. INTRODUCTION

The precision of the data, weight vector, and internal signal representations in a machine learning implementation has a deep impact on its energy consumption and throughput. Recent works have empirically studied the effect of moderate [1] and heavy [2, 3, 4, 5] quantization on the performance of learning systems. A more analytical approach has recently emerged in which the signal-to-quantization-noise ratio (SQNR) is used as a metric to establish precision-accuracy trade-offs for the feedforward path [6] and the training [7] of deep neural networks.

Our work addresses the problem of systematically assigning precision to a fixed-point learning system while providing accuracy guarantees. We study the specific case of a support vector machine (SVM) [8] trained using the stochastic gradient descent (SGD) algorithm, i.e., the SVM-SGD algorithm. Our approach is analytical, in contrast to the trial-and-error approach employed today. Energy consumption is also brought in as a metric for consideration in the design of learning systems.

(This work was supported in part by Systems on Nanoscale Information fabrics (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.)

1.1. Contributions

Our contributions in this paper are the following: we provide analytical lower bounds on the precision of the data, weight vector, and training parameters for the SVM-SGD algorithm subject to requirements on classification accuracy. We also study the impact of precision reduction on the energy consumption of a baseline SVM-SGD architecture by employing architectural-level energy and delay models in a 45 nm CMOS process. We show that up to a 5.3× reduction in energy is achieved when precisions are assigned based on the lower bounds, as compared to a 16 b baseline implementation.

The rest of this paper is organized as follows. Section 2 presents the necessary background on which we build our work. Section 3 derives our analytical bounds on precision for fixed-point implementations. Experimental results are included in Section 4. We conclude the paper in Section 5.

2. BACKGROUND

2.1. Online Learning SVM (SVM-SGD)

SGD is an efficient training method for machine learning algorithms [9].
At each iteration, the weight vector of the algorithm is updated as follows:

    w_{n+1} = w_n − γ ∇_w Q(z_n, w_n)    (1)

where Q(·) is the loss function to be minimized, γ is the step-size, w_n ∈ R^D is the weight vector to be learned, z_n is the streamed sample, which usually consists of an input data vector x_n and a corresponding label y_n, and D is the dimensionality of the input data.

The SVM [8] is a simple and popular supervised learning method for classification. The SVM operates by determining a maximum-margin separating hyperplane in the feature space. For regularization, some feature vectors may be allowed to lie inside the margin, making it a soft margin. The SVM predicts the label ŷ_n ∈ {±1} given a feature vector (i.e., input data vector) x_n as follows:

    ŷ_n = 1 if w^T x_n + b ≥ 0, and ŷ_n = −1 otherwise    (2)

where w is the weight vector and b is the bias term. The classification error of the SVM is defined as p_e = Pr{Y ≠ Ŷ}. In the rest of this paper, we employ capital letters to denote random variables. The optimum weight vector w in (2) maximizes the margin, or equivalently minimizes the average of the following loss function:

    Q(z_n, w) = λ(||w||² + b²) + max{0, 1 − y_n(w^T x_n + b)}    (3)

It can be shown that, when applied to the SVM algorithm, specifically to the loss function defined in (3), the SGD update equation (1) becomes [9]:

    w_{n+1} = (1 − γλ) w_n,              if y_n(w_n^T x_n + b_n) > 1,
            = (1 − γλ) w_n + γ y_n x_n,  otherwise;
                                                                     (4)
    b_{n+1} = (1 − γλ) b_n,              if y_n(w_n^T x_n + b_n) > 1,
            = (1 − γλ) b_n + γ y_n,      otherwise.
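As a concrete illustration of the update (4), here is a minimal Python sketch of one SVM-SGD training pass. It is our own illustrative code (the names svm_sgd_epoch and svm_predict are ours), assuming features normalized per dimension to [−1, 1] and labels in {−1, +1}; it is not the authors' implementation.

```python
import numpy as np

def svm_sgd_epoch(X, y, w, b, gamma=2**-5, lam=0.0):
    """One pass of the SVM-SGD update (4) over streamed samples (x_n, y_n).

    Illustrative sketch only: X is an (N, D) array of features in [-1, 1],
    y is an (N,) array of labels in {-1, +1}, gamma is the step-size and
    lam is the regularization parameter (lambda).
    """
    for x_n, y_n in zip(X, y):
        margin = y_n * (w @ x_n + b)   # functional margin y_n(w^T x_n + b)
        w = (1.0 - gamma * lam) * w    # shrinkage term common to both cases
        b = (1.0 - gamma * lam) * b
        if margin <= 1.0:              # hinge loss is active: apply the update
            w = w + gamma * y_n * x_n
            b = b + gamma * y_n
    return w, b

def svm_predict(X, w, b):
    """Decision rule (2): label +1 if w^T x + b >= 0, else -1."""
    return np.where(X @ w + b >= 0.0, 1, -1)
```

With λ = 0 and γ a negative power of two, this matches the floating-point setup assumed later in Section 3.2.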

[Fig. 1: The SVM-SGD algorithm's architecture, showing the precision per dimension: input (B_X), classifier datapath (B_F), and weight update block (B_W).]

2.2. Architectural Energy and Delay Models

Fig. 1 shows the architecture of the SVM-SGD algorithm and indicates the precision assignments (per dimension) for the input B_X, the classifier B_F, and the weight update B_W. The critical path of the architecture is any path between two latch nodes having the maximum delay, assuming inputs and outputs are also latched. The main building block of such a computational system is the 1 b full-adder (FA). Hence, the critical path is the one having the maximum number of FAs. The maximum throughput at which the architecture can operate is the reciprocal of the critical path delay [10]:

    f_max = I_ON / (β L_FA C_FA V_dd)    (5)

where L_FA is the number of FAs in the critical path of the architecture, C_FA is the load capacitance of one FA, β is an empirical fitting parameter, I_ON is the ON current of a transistor, and V_dd is the supply voltage at which the architecture is operated. The energy consumption of the architecture operating at f_max is given by [10]:

    E = α N_FA C_FA V_dd² + β N_FA L_FA C_FA V_dd² · 10^{−V_dd/S}    (6)

where N_FA is the total number of FAs in the system, α is the activity factor, and S is the subthreshold slope. It is found [11] that both N_FA and L_FA are linear in the precisions B_X, B_F, and B_W. Since E depends on the product N_FA L_FA (the second term in (6)), E is a quadratic function of these precisions. This clearly indicates the importance of minimizing B_X, B_F, and B_W. Note that this analysis assumes a standard implementation using ripple-carry adders and Baugh-Wooley multipliers.
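To make the precision dependence of (5) and (6) concrete, the sketch below sweeps the supply voltage and locates the minimum-energy operating point. Every numerical parameter (C_FA, I_ON, α, β, S) and the linear N_FA/L_FA models are hypothetical placeholders for illustration only; the calibrated 45 nm values and the exact architectural counts are given in [11].

```python
import numpy as np

# Hypothetical placeholder parameters (NOT the calibrated 45 nm values from [11]).
C_FA  = 1e-15      # full-adder load capacitance [F]
I_ON  = 20e-6      # transistor ON current [A]
ALPHA = 0.1        # activity factor
BETA  = 1.0        # empirical fitting parameter
S     = 0.1        # subthreshold slope [V/decade]

def fa_counts(B_X, B_F, B_W, D=30):
    """Toy linear models of the total (N_FA) and critical-path (L_FA) full-adder
    counts as functions of precision; the true coefficients are derived in [11]."""
    N_FA = D * (B_X * B_F + B_W)   # assumed: the multipliers dominate the FA count
    L_FA = B_X + B_F + B_W         # assumed: one adder chain per operand
    return N_FA, L_FA

def f_max(L_FA, V_dd):
    """Maximum throughput, Eq. (5)."""
    return I_ON / (BETA * L_FA * C_FA * V_dd)

def energy(N_FA, L_FA, V_dd):
    """Energy per operation at f_max, Eq. (6): dynamic term + leakage term."""
    dyn  = ALPHA * N_FA * C_FA * V_dd**2
    leak = BETA * N_FA * L_FA * C_FA * V_dd**2 * 10.0**(-V_dd / S)
    return dyn + leak

# Sweep the supply voltage and locate the minimum-energy operating point (MEOP).
V = np.linspace(0.2, 1.0, 81)
N_FA, L_FA = fa_counts(B_X=5, B_F=6, B_W=10)
E = energy(N_FA, L_FA, V)
V_meop = V[np.argmin(E)]
print("MEOP: V_dd = %.2f V, E = %.3g J, f_max = %.3g Hz"
      % (V_meop, E.min(), f_max(L_FA, V_meop)))
```

The sweep illustrates the leakage-versus-dynamic trade-off discussed in Section 4.2: the dynamic term shrinks with V_dd while the leakage term grows, producing an interior minimum.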
3. ANALYTICAL BOUNDS ON PRECISION

The impact of the precisions of the data (B_X), weight vector (B_F), and weight update block (B_W) on the overall SQNR is well established for finite-impulse response (FIR) filters and least-mean-square (LMS) adaptive filters [12]. In this section, we analytically predict the accuracy of the SVM-SGD algorithm as a function of B_X, B_F, and B_W. All details concerning the derivations and proofs of the upcoming results can be found in [11].

3.1. Classification Block

We assume that the input data is quantized to B_X bits and is hence corrupted by quantization noise that is independently and uniformly distributed on [−Δ_x/2, Δ_x/2] per dimension [13], where Δ_x = 2^{−(B_X−1)} is the input quantization step. Similarly, the weights are quantized to B_F bits and are hence corrupted by quantization noise that is independently and uniformly distributed on [−Δ_f/2, Δ_f/2] per dimension, where Δ_f = 2^{−(B_F−1)}.

Our first result is a lower bound derived from the geometric properties of the SVM. This lower bound on B_X ensures that any datapoint lying outside the margin is classified correctly even after being perturbed by quantization noise.

Theorem 3.1 (Geometric Bound). Given D, B_F, and w, a feature vector x lying outside the margin will be classified correctly if

    B_X > log_2( √D ||w|| / (1 − √D ||x|| 2^{−B_F}) ).    (7)

Proof. The main idea is to ensure that the worst-case quantization noise is smaller than the functional margin of the classifier. For details, please see [11].
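Here is a minimal sketch of how the geometric bound (7) might be evaluated in practice, assuming features normalized per dimension to [−1, 1] and Euclidean norms; the function names are ours. Taking the maximum of ||x|| over the dataset gives the dataset-wide version stated in the corollary that follows.

```python
import math
import numpy as np

def geometric_bound_bits(w, x, B_F):
    """Smallest integer B_X satisfying the geometric bound (7).

    Illustrative sketch: w is the trained weight vector, x a feature vector
    normalized per dimension to [-1, 1], and B_F the weight precision.
    Returns None if the denominator is non-positive, i.e., if B_F itself is
    too small for the bound to hold for any B_X.
    """
    D = len(w)
    denom = 1.0 - math.sqrt(D) * np.linalg.norm(x) * 2.0 ** (-B_F)
    if denom <= 0.0:
        return None
    return math.floor(math.log2(math.sqrt(D) * np.linalg.norm(w) / denom)) + 1

def geometric_bound_bits_dataset(w, X, B_F):
    """Dataset-wide bound: use the feature vector with the largest norm,
    as in the corollary that follows (Eq. (8))."""
    x_worst = X[np.argmax(np.linalg.norm(X, axis=1))]
    return geometric_bound_bits(w, x_worst, B_F)
```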

The geometric bound reveals the following: a larger SVM margin (i.e., a smaller ||w||) allows a greater reduction of B_X; there is a trade-off between B_X and B_F; and a higher dimension D and a larger ||x|| require more input precision B_X. Note that Theorem 3.1 is specific to a single feature vector x. The following simple corollary applies to all datapoints in the dataset lying outside the margin.

Corollary 3.1.1. Given D, B_F, and w, any feature vector in the dataset X lying outside the margin will be classified correctly if

    B_X > log_2( √D ||w|| / (1 − √D max_{x∈X} ||x|| 2^{−B_F}) ).    (8)

It is worth noting that max_{x∈X} ||x|| ≤ √D because of normalization.

Let Ŷ_fx be the output of the fixed-point classifier and Ŷ_fl be the output of the floating-point classifier. Our next result is an upper bound on the mismatch probability p_m = Pr{Ŷ_fx ≠ Ŷ_fl} as a function of precision.

Theorem 3.2 (Probabilistic Bound). Given B_X, B_F, w, and b, the upper bound on the mismatch probability p_m is given by:

    p_m ≤ (1/24) ( Δ_x² ||w||² E[ 1 / (w^T X + b)² ] + Δ_f² E[ (||X||² + 1) / (w^T X + b)² ] )    (9)

where X denotes the random variable describing the dataset. Note that the statistics (i.e., the expected values in (9)) are computed empirically.

Proof. The main idea is to consider the mismatch event for one datapoint and upper bound its probability using Chebyshev's inequality. The law of total probability is then used to obtain the upper bound over the whole dataset. For details, please see [11].

Theorem 3.2 can be reformulated to obtain the lower bound on precision needed to achieve a target worst-case mismatch rate. It can also be extended to obtain an upper bound on the actual classification error of the fixed-point realization, as follows:

Theorem 3.3. The fixed-point probability of error is upper bounded as follows:

    p_e ≤ 1 + min(p_fl, p_m) − max(p_fl, p_m)    (10)

where p_fl = Pr{Ŷ_fl = Y} denotes the probability of correct detection in floating-point.

Proof. Basic laws of probability theory were used. For details, please see [11].

3.2. Weight Update Block Analysis

In the preceding discussion, w is the converged weight vector of an infinite-precision algorithm. In practice, it is standard to first implement a floating-point algorithm with desirable convergence properties and then quantize so as to keep the convergence behavior unaltered. The upcoming setup assumes, without loss of generality, that floating-point convergence of SGD in the context of the SVM (4) is obtained for λ = 0 and some small value of γ (typically a negative power of 2).

Our next result ensures that the non-zero update term in (4) is represented correctly during training in spite of weight update quantization.

Theorem 3.4. The lower bound on the weight update precision B_W needed to ensure full convergence is given by:

    B_W ≥ B_X − log_2(γ).    (11)

Proof. The idea is to prevent the updates from introducing additional quantization. For details, please see [11].

The setup of Theorem 3.4 is quite conservative. In fact, it is sometimes possible to break the bound in (11) and still obtain satisfactory convergence behavior [11]. Two particular cases are to be noted. If B_W = 1 − log_2(γ), then we obtain a sign-SGD behavior for negative updates only. This is because only the sign bit of the input data is retrieved after truncation at the end of the update. Similarly, if B_W = −log_2(γ), then we still obtain a sign-SGD behavior, but the effective step-size doubles due to truncation.
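The following sketch shows one way the probabilistic bound (9) and the weight-update bound (11) could be combined to select precisions for a target mismatch probability. The expectations in (9) are replaced by empirical averages, as noted above; the search strategy and function names are our own assumptions, not the authors' procedure.

```python
import numpy as np

def mismatch_bound(X, w, b, B_X, B_F):
    """Empirical evaluation of the right-hand side of Eq. (9).

    X: (N, D) array of normalized feature vectors; w, b: floating-point SVM
    parameters; B_X / B_F: candidate input / weight precisions.
    """
    dx = 2.0 ** (-(B_X - 1))               # input quantization step
    df = 2.0 ** (-(B_F - 1))               # weight quantization step
    s2 = (X @ w + b) ** 2                  # (w^T x + b)^2 per sample
    term_x = dx**2 * np.dot(w, w) * np.mean(1.0 / s2)
    term_f = df**2 * np.mean((np.sum(X**2, axis=1) + 1.0) / s2)
    return (term_x + term_f) / 24.0

def minimum_precisions(X, w, b, p_m_target, gamma=2**-5, B_max=16):
    """Smallest (B_X, B_F) meeting the target mismatch probability, plus the
    corresponding B_W from Eq. (11). A simple joint sweep, for illustration."""
    for B in range(2, B_max + 1):          # grow both precisions together
        for B_X, B_F in [(B, B), (B, B + 1), (B + 1, B)]:
            if mismatch_bound(X, w, b, B_X, B_F) < p_m_target:
                B_W = int(np.ceil(B_X - np.log2(gamma)))
                return B_X, B_F, B_W
    return None
```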
4. EXPERIMENTAL RESULTS

4.1. Classification Accuracy and Convergence

In this section, we validate our analytical results on the UCI Breast Cancer dataset [14]. For fixed-point classification with B_F = 6, Fig. 2(a) shows that B_X ≥ 5 is sufficient to meet the target classification error probability. This is consistent with the analytical values determined by the geometric bound (8) and the probabilistic bound (9): B_X = 4 and B_X = 5, respectively. Fig. 2(b) shows the convergence of the average SVM loss function (3) for various weight update block precisions B_W. The minimum precision given by (11) is B_W = 11 when γ = 2^{−5} and B_X = 6, the latter chosen to be consistent with the bounds shown in Fig. 2(a). We note that the fixed-point convergence curve tracks the floating-point curve very closely. We also show convergence curves for the lower precisions B_W = 10, 7, 6, and 5. These curves are similar to the floating-point curve, though with an observable loss in accuracy at the lower precisions. Further analysis explaining this loss in accuracy can be found in [11]. Note that the initially faster convergence but eventual loss in accuracy when B_W = 5 occurs because this value corresponds to B_W = −log_2(γ), i.e., a sign-SGD behavior with double the step-size, as discussed in Section 3.2.

[Fig. 2: Experimental results for the Breast Cancer dataset: (a) fixed-point classifier simulation (FX sim), probabilistic upper bound (UB), and geometric bound (GB) for B_F = 6, and (b) convergence curves for fixed-point (B_F = 6 and B_X = 6) and floating-point (FL sim) SGD training (γ = 2^{−5} and λ = 0).]
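For readers who want to reproduce the spirit of the Fig. 2(b) experiment, here is a hedged sketch of fixed-point SVM-SGD training with quantized inputs (B_X), weights (B_F), and weight-update accumulators (B_W). The truncation-based quantizer and the loop structure are our own assumptions rather than the authors' exact fixed-point simulator.

```python
import numpy as np

def quantize(v, bits):
    """Truncate v to a uniform grid with step 2^-(bits-1), i.e., 'bits' bits
    covering roughly [-1, 1) per dimension (an assumed quantizer, for illustration)."""
    step = 2.0 ** (-(bits - 1))
    return np.floor(v / step) * step

def fixed_point_svm_sgd(X, y, B_X, B_F, B_W, gamma=2**-5, lam=0.0, epochs=10):
    """Fixed-point SVM-SGD training loop mimicking the architecture of Fig. 1:
    inputs use B_X bits, the classifier sees B_F-bit weights, and the weight
    update block accumulates with B_W bits (illustrative sketch only)."""
    D = X.shape[1]
    w_acc, b_acc = np.zeros(D), 0.0            # weight-update accumulators (B_W bits)
    for _ in range(epochs):
        for x_n, y_n in zip(X, y):
            x_q = quantize(x_n, B_X)           # quantized input sample
            w_q = quantize(w_acc, B_F)         # quantized weights used by the classifier
            b_q = quantize(b_acc, B_F)
            if y_n * (w_q @ x_q + b_q) <= 1.0:  # hinge loss active: apply update (4)
                w_acc = quantize((1 - gamma * lam) * w_acc + gamma * y_n * x_q, B_W)
                b_acc = quantize((1 - gamma * lam) * b_acc + gamma * y_n, B_W)
            else:
                w_acc = quantize((1 - gamma * lam) * w_acc, B_W)
                b_acc = quantize((1 - gamma * lam) * b_acc, B_W)
    return quantize(w_acc, B_F), quantize(b_acc, B_F)
```

Sweeping B_W in this loop (e.g., 11, 10, 7, 6, and 5 with γ = 2^{−5} and B_X = B_F = 6) yields convergence curves analogous in spirit to those in Fig. 2(b).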

4.2. Energy Consumption

We employed the energy and delay models of Section 2.2 to estimate the energy consumption of the SVM-SGD architecture (Fig. 1) when processing the Breast Cancer dataset. Our methodology is the same as the one in [11], but uses 45 nm semiconductor process parameters. This methodology is useful for estimating the benefits of precision reduction without a time-consuming and laborious ASIC design process. The model parameters, obtained through circuit simulations of a full adder, and their values can be found in [11]. Based on these parameters, we evaluate the energy consumption in (6).

Fig. 3 shows the energy consumption of the architecture as a function of supply voltage for different precision assignments meeting the target mismatch probability p_m. An architecture that employs 16 b for all variables is chosen as the reference, as recent works on fixed-point learning have employed commercial 16 b digital signal processors [1]. As discussed in [10], the trade-off between leakage and dynamic energy (the two terms in (6)) leads to the well-known minimum energy operating point (MEOP) for each curve. As shown in Fig. 3, the precision assignment obtained using (9)-(11) results in 5.3× energy savings at the MEOP over the 16 b fixed-point implementation.

[Fig. 3: Energy savings using the minimum precision assignment (B_F = 6, B_X = 5, B_W = 10) meeting the target mismatch probability, compared to a 2 b higher precision assignment (B_F = 8, B_X = 7, B_W = 12) and a uniform precision assignment of 16 b (B_F = 16, B_X = 16, B_W = 16).]

5. CONCLUSION

In this paper, we presented analytical bounds on the precisions of the data, weight vector, and weight update block for the SVM-SGD algorithm. These bounds are closely related to performance metrics such as the classification accuracy and the mismatch probability. Moreover, we quantified the impact of precision on the energy cost of inference in a commercial semiconductor process technology using circuit analysis and models. We observed significant energy savings when assigning precisions at the lower bounds as compared to prior work. Using our results, designers of fixed-point realizations can efficiently determine precision requirements analytically. Future work includes developing a similar fixed-point analysis for the popular deep neural networks (DNNs). Such networks tend to be highly complex, and a systematic study of their precision requirements will be extremely valuable.

REFERENCES

[1] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep learning with limited numerical precision," in Proceedings of the 32nd International Conference on Machine Learning, 2015, pp. 1737-1746.
[2] M. Kim and P. Smaragdis, "Bitwise neural networks," arXiv preprint arXiv:1601.06071, 2016.
[3] M. Courbariaux, Y. Bengio, and J.-P. David, "BinaryConnect: Training deep neural networks with binary weights during propagations," in Advances in Neural Information Processing Systems, 2015.
[4] M. Courbariaux and Y. Bengio, "BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1," arXiv preprint arXiv:1602.02830, 2016.
[5] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," arXiv preprint arXiv:1603.05279, 2016.
[6] D. D. Lin, S. S. Talathi, and V. S. Annapureddy, "Fixed point quantization of deep convolutional networks," arXiv preprint arXiv:1511.06393, 2015.
[7] D. D. Lin and S. S. Talathi, "Overcoming challenges in fixed point training of deep convolutional networks," arXiv preprint arXiv:1607.02241, 2016.
[8] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[9] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of COMPSTAT'2010, pp. 177-186, Springer, 2010.
[10] R. Abdallah and N. Shanbhag, "Reducing energy at the minimum energy operating point via statistical error compensation," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 6, 2014.
[11] C. Sakr, A. Patil, S. Zhang, and N. Shanbhag, "Understanding the energy and precision requirements for online learning," arXiv preprint arXiv:1607.00669, 2016.
[12] M. Goel and N. Shanbhag, "Finite-precision analysis of the pipelined strength-reduced adaptive filter," IEEE Transactions on Signal Processing, vol. 46, no. 6, 1998.
[13] K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, 2007.
[14] A. Asuncion and D. Newman, "UCI machine learning repository," http://archive.ics.uci.edu/ml/, 2007. [Online].
