Scikit-learn. scikit. Machine learning for the small and the many Gaël Varoquaux. machine learning in Python
|
|
- Janel Eaton
- 5 years ago
- Views:
Transcription
1 Scikit-learn Machine learning for the small and the many Gaël Varoquaux scikit machine learning in Python In this meeting, I represent low performance computing
2 Scikit-learn Machine learning for the small and the many Gae l Varoquaux What I do: bridging psychology to neuroscience via machine learning on brain images scikit machine learning in Python In this meeting, I represent low performance computing
3 G Varoquaux 2 1 Scikit-learn 2 Statistical algorithms 3 Scaling up / scaling out?
4 G Varoquaux 3 1 Scikit-learn Goals and tradeoff
5 G Varoquaux 4 Scikit-learn s vision: Machine learning for everyone Outreach across scientific fields, applications, communities Enabling foster innovation
6 Minimal prerequisites & assumptions G Varoquaux 4 Scikit-learn s vision: Machine learning for everyone Outreach across scientific fields, applications, communities Enabling foster innovation
7 1 scikit-learn user base G Varoquaux returning users citations OS Employer 50% 30% 20% 63% 34% 3% Windows Mac Linux industry academia other
8 1 A Python library G Varoquaux 6 Python High-level language, for users and developers General-purpose: suitable for any application Excellent interactive use
9 1 A Python library Python High-level language, for users and developers General-purpose: suitable for any application Excellent interactive use Slow compiled code as a backend Python s primitive virtual machine makes it easy G Varoquaux 6
10 1 A Python library G Varoquaux 6 Python High-level language, for users and developers General-purpose: suitable for any application Excellent interactive use Scipy Vibrant scientific stack numpy arrays = wrappers on C pointers pandas for columnar data scikit-image for images
11 1 A Python library Users like Python Web searches: Google trends G Varoquaux 7
12 1 A Python library And developpers like Python 50 Number of contributors active in a week G Varoquaux 8
13 1 A Python library And developpers like Python 50 Number of contributors active in a week Huge set of features ( 160 different statistical models) G Varoquaux 8
14 1 API: simplify, but do not dumb down G Varoquaux 9 Universal estimator interface from s k l e a r n import svm c l a s s i f i e r = svm.svc() c l a s s i f i e r. f i t ( X t r a i n, Y t r a i n ) Y t e s t = c l a s s i f i e r. p r e d i c t ( X t e s t ) # or X r e d = c l a s s i f i e r. t r a n s f o r m ( X t e s t )
15 1 API: simplify, but do not dumb down G Varoquaux 9 Universal estimator interface from s k l e a r n import svm c l a s s i f i e r = svm.svc() c l a s s i f i e r. f i t ( X t r a i n, Y t r a i n ) Y t e s t = c l a s s i f i e r. p r e d i c t ( X t e s t ) # or X r e d = c l a s s i f i e r. t r a n s f o r m ( X t e s t ) classifier often has hyperparameters Finding good defaults is crucial, and hard
16 1 API: simplify, but do not dumb down G Varoquaux 9 Universal estimator interface from s k l e a r n import svm c l a s s i f i e r = svm.svc() c l a s s i f i e r. f i t ( X t r a i n, Y t r a i n ) Y t e s t = c l a s s i f i e r. p r e d i c t ( X t e s t ) # or X r e d = c l a s s i f i e r. t r a n s f o r m ( X t e s t ) classifier often has hyperparameters Finding good defaults is crucial, and hard A lot of effort on the documentation Example-driven development
17 1 Tradeoffs G Varoquaux 10 Algorithms and models with good failure mode Avoid parameters hard to set or fragile convergence Statistical computing = ill-posed & data-dependent Little or no dependencies Easy build everywhere All compiled code generated from Cython High-level languages give features (Spark) Low-level gives speed (eg cache-friendly code)
18 2 Statistical algorithms Fast algorithms accept statistical error G Varoquaux 11
19 2 Statistical algorithms Fast algorithms accept statistical error Models most used in scikit-learn: 1. Logistic regression, SVM 4. Kmeans 2. Random forests 5. Naive Bayes 3. PCA 6. Nearest neighbor G Varoquaux 11
20 G Varoquaux 12 Big data Many samples samples features or Many features samples features Web behavior data Cheap sensors (cameras) Medical patients Scientific experiments
21 G Varoquaux 13 2 Linear models min w i l(y i, x i w) Many features Coordinate descent Iteratively optimize w.r.t. w j separately It works because: Features are redundant Sparse models can guess which w j are zero Progress = better selection of features
22 2 Linear models min w i l(y i, x i w) Many features Coordinate descent Iteratively optimize w.r.t. w j separately Many samples Gradient descent: Stochastic gradient descent min E[l(y, x w)] w w w + α w l Stochastic gradient descent w w + αe[ w l] Use a cheap estimate of E[ w l] (e.g. subsampling) Progress = second order schemes G Varoquaux 13
23 2 Linear models min w i l(y i, x i w) Many features Coordinate descent Iteratively optimize w.r.t. w j separately Many samples Gradient descent: Stochastic gradient descent min E[l(y, x w)] w w w + α w l Stochastic gradient descent w w + αe[ w l] Use a cheap estimate of E[ w l] (e.g. subsampling) Data-access locality G Varoquaux 13
24 2 Linear models min l(y i, x i w) w i Deep learning Many features Composition Coordinate of linear descent models optimized Iteratively jointly optimize (non-convex) w.r.t. w j separately with stochastic gradient descent Many samples Stochastic gradient descent Gradient descent: min E[l(y, x w)] w w w + α w l Stochastic gradient descent w w + αe[ w l] Use a cheap estimate of E[ w l] (e.g. subsampling) Data-access locality G Varoquaux 13
25 2 Trees & (random) forests (on subsets of the data) Compute simple bi-variate statistics Split data accordingly... Speed ups Share computing between trees or precompute Cache friendly access optimize traversal order Approximate histograms / statistics LightGBM, XGBoost G Varoquaux 14
26 G Varoquaux 15 2 PCA: principal component analysis Truncated SVD (singular value decomposition) X = U s V T
27 2 PCA: principal component analysis Truncated SVD (singular value decomposition) X = U s V T Randomized linear algebra for i in [1,... k]: X =random projection(x) Ũ i, s i, ṼT i = SVD( X) 20x speed ups # e.g. subsampling V red, R = QR([Ṽ1,..., Ṽk]) X red = V T redx U s V T = SVD(X red ) V T = V T V T red G Varoquaux 15
28 2 PCA: principal component analysis Truncated SVD (singular value decomposition) X = U s V T Randomized linear algebra for i in [1,... k]: X =random projection(x) Ũ i, s i, ṼT i = SVD( X) 20x speed ups # e.g. subsampling V red, R = QR([Ṽ1,..., Ṽk]) X red = V T redx U s V T = SVD(X red ) V T = V T V T red X summarize well the data Each SVD is on local data [Halko ] G Varoquaux 15
29 2 Stochastic factorization of huge matrices G Varoquaux 16 Factorization of dense matrices Data matrix X U V min U,V X UVT 2 + V 1
30 2 Stochastic factorization of huge matrices G Varoquaux 16 Factorization of dense matrices Data matrix - Data access - Code computation Stream columns min X U,V UVT 2 + V 1 - Dictionary update
31 2 Stochastic factorization of huge matrices Factorization of dense matrices Data matrix Alternating minimization Online matrix factorization - Data access - Code computation Stream columns - Dictionary update Seen at t Seen at t+1 Unseen at t [Mairal ] out of core, huge speed ups G Varoquaux 16
32 2 Stochastic factorization of huge matrices Factorization of dense matrices Data matrix Alternating minimization Online matrix factorization New subsampling algorithm - Data access - Code computation - Dictionary update Stream columns Subsample rows Seen at t Seen at t+1 Unseen at t [Mensch ] 10X speed ups, or more G Varoquaux 16
33 3 Scaling up / scaling out? G Varoquaux 17
34 3 Dataflow is key to scale Data parallel Array computing Streaming CPU Parallel computing Data + code transfer Out-of-memory persistence These patterns can yield horrible code G Varoquaux 18
35 3 Parallel-computing engine: joblib sklearn.estimator(n jobs=2) Under the hood: joblib Parallel for loops concurrency is hard Queues are the central abstraction G Varoquaux 19
36 3 Parallel-computing engine: joblib sklearn.estimator(n jobs=2) Under the hood: joblib Parallel for loops concurrency is hard G Varoquaux 19
37 3 Parallel-computing engine: joblib sklearn.estimator(n jobs=2) Under the hood: joblib Parallel for loops concurrency is hard New: distributed computing backends: Yarn, dask.distributed, IPython.parallel import distributed.joblib from joblib import Parallel, parallel backend with parallel backend( dask.distributed, scheduler host= HOST:PORT ): # normal Joblib code G Varoquaux 19
38 3 Parallel-computing engine: joblib sklearn.estimator(n jobs=2) Under the hood: joblib Parallel for loops concurrency is hard New: distributed computing backends: Yarn, dask.distributed, IPython.parallel import distributed.joblib from joblib import Parallel, parallel backend with parallel backend( dask.distributed, scheduler host= HOST:PORT ): # normal Joblib code Middleware to plug in distributed infrastructures G Varoquaux 19
39 3 Distributed data flow and storage G Varoquaux Moving data around is costly
40 G Varoquaux 20 3 Distributed data flow and storage Read-only data store Moving data around is costly Parameter database Node's memory
41 G Varoquaux 20 3 Distributed data flow and storage Read-only Why databases and not files? Maintain integrity themselves Moving data around Know how to do data replication & distribution is costly Fast lookup via indexes Not bound by POSIX FS specs Very big data calls for coupling a database to a computing engine Parameter database data store Node's memory
42 3 joblib.memory as a storage pool A caching / function memoizing system Stores results of function executions G Varoquaux 21
43 3 joblib.memory as a storage pool A caching / function memoizing system Stores results of function executions G Varoquaux 21 Out-of-memory computing >>> result = mem.cache(g).call and shelve(a) >>> result MemorizedResult(cachedir=..., func= g, argument hash=... ) >>> c = result.get()
44 3 joblib.memory as a storage pool A caching / function memoizing system Stores results of function executions G Varoquaux 21 Out-of-memory computing >>> result = mem.cache(g).call and shelve(a) >>> result MemorizedResult(cachedir=..., func= g, argument hash=... ) >>> c = result.get() S3/HDFS/cloud backend: joblib.memory( uri, backend= s3 )
45 Challenges and dreams Read-only data store High-level constructs for distributed computation & data exchange MPI feels too low level and without data concepts Parameter database Node's memory Goal: reusable algorithms from laptops to datacenters Capturing data access patterns is the missing piece G Varoquaux 22
46 Challenges and dreams Read-only data store High-level constructs for distributed computation & data exchange MPI feels too low level and without data concepts Parameter database Node's memory Goal: reusable algorithms from laptops to datacenters Capturing data access patterns is the missing piece Dask project: Limit to purely-functional code Lazy computation / compilation Build a data flow + execution graph Also: deep-learning engines, for GPUs G Varoquaux 22
47 @GaelVaroquaux Lessons from scikit-learn Small-computer machine-learning trying to scale Python gets us very far Enables focusing on algorithmic optimization Great to grow a community Can easily drop to compiled code
48 @GaelVaroquaux Lessons from scikit-learn Small-computer machine-learning trying to scale Python gets us very far Statistical algorithmics Algorithms operate on expectancies Stochastic Gradient Descent Random projections Can bring data locality
49 Lessons from scikit-learn Small-computer machine-learning trying to scale Python gets us very far Statistical algorithmics Distributed data computing Data acces is central Must be optimized for algorithm File system and memory no longer suffice
50 Lessons from scikit-learn Small-computer machine-learning trying to scale Python gets us very far Statistical algorithmics Distributed data computing If you know what your doing, you can scale scikit-learn The challenge is to make this easy and
51 4 References I N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53, ISSN doi: / URL J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19, A. Mensch, J. Mairal, B. Thirion, and G. Varoquaux. Stochastic subsampling for factorizing huge matrices. Arxiv preprint, 2017.
ECE521 W17 Tutorial 1. Renjie Liao & Min Bai
ECE521 W17 Tutorial 1 Renjie Liao & Min Bai Schedule Linear Algebra Review Matrices, vectors Basic operations Introduction to TensorFlow NumPy Computational Graphs Basic Examples Linear Algebra Review
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationBehavioral Data Mining. Lecture 7 Linear and Logistic Regression
Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2016
CPSC 340: Machine Learning and Data Mining More PCA Fall 2016 A2/Midterm: Admin Grades/solutions posted. Midterms can be viewed during office hours. Assignment 4: Due Monday. Extra office hours: Thursdays
More informationCase Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!
Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 4, 017 1 Announcements:
More informationCONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE
CONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE Copyright 2013, SAS Institute Inc. All rights reserved. Agenda (Optional) History Lesson 2015 Buzzwords Machine Learning for X Citizen Data
More informationScalable Asynchronous Gradient Descent Optimization for Out-of-Core Models
Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models Chengjie Qin 1, Martin Torres 2, and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced August 31, 2017 Machine
More informationCoordinate Update Algorithm Short Course The Package TMAC
Coordinate Update Algorithm Short Course The Package TMAC Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 16 TMAC: A Toolbox of Async-Parallel, Coordinate, Splitting, and Stochastic Methods C++11 multi-threading
More informationTales from fmri Learning from limited labeled data. Gae l Varoquaux
Tales from fmri Learning from limited labeled data Gae l Varoquaux fmri data p 100 000 voxels per map Heavily correlated + structured noise Low SNR: 5% 13 db Brain response maps (activation) n Hundreds,
More informationApplications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices
Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.
More informationINF 5860 Machine learning for image classification. Lecture 5 : Introduction to TensorFlow Tollef Jahren February 14, 2018
INF 5860 Machine learning for image classification Lecture 5 : Introduction to TensorFlow Tollef Jahren February 14, 2018 OUTLINE Deep learning frameworks TensorFlow TensorFlow graphs TensorFlow session
More informationGenerative Models for Discrete Data
Generative Models for Discrete Data ddebarr@uw.edu 2016-04-21 Agenda Bayesian Concept Learning Beta-Binomial Model Dirichlet-Multinomial Model Naïve Bayes Classifiers Bayesian Concept Learning Numbers
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux INRIA 8 Nov 2011 Nicolas Le Roux (INRIA) Neural networks and optimization 8 Nov 2011 1 / 80 1 Introduction 2 Linear classifier 3 Convolutional neural networks
More informationQuick Introduction to Nonnegative Matrix Factorization
Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More informationStochastic Gradient Descent. CS 584: Big Data Analytics
Stochastic Gradient Descent CS 584: Big Data Analytics Gradient Descent Recap Simplest and extremely popular Main Idea: take a step proportional to the negative of the gradient Easy to implement Each iteration
More informationRecommendation Systems
Recommendation Systems Popularity Recommendation Systems Predicting user responses to options Offering news articles based on users interests Offering suggestions on what the user might like to buy/consume
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationPostgraduate Course Signal Processing for Big Data (MSc)
Postgraduate Course Signal Processing for Big Data (MSc) Jesús Gustavo Cuevas del Río E-mail: gustavo.cuevas@upm.es Work Phone: +34 91 549 57 00 Ext: 4039 Course Description Instructor Information Course
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationModel Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University
Model Order Reduction via Matlab Parallel Computing Toolbox E. Fatih Yetkin & Hasan Dağ Istanbul Technical University Computational Science & Engineering Department September 21, 2009 E. Fatih Yetkin (Istanbul
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 27, 2015 Outline Linear regression Ridge regression and Lasso Time complexity (closed form solution) Iterative Solvers Regression Input: training
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationMini-project in scientific computing
Mini-project in scientific computing Eran Treister Computer Science Department, Ben-Gurion University of the Negev, Israel. March 7, 2018 1 / 30 Scientific computing Involves the solution of large computational
More informationDot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics
Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics Chengjie Qin 1 and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced June 29, 2017 Machine Learning (ML) Is
More informationLecture Notes 10: Matrix Factorization
Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To
More informationOptimization for Machine Learning
Optimization for Machine Learning Elman Mansimov 1 September 24, 2015 1 Modified based on Shenlong Wang s and Jake Snell s tutorials, with additional contents borrowed from Kevin Swersky and Jasper Snoek
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right
More informationHOMEWORK #4: LOGISTIC REGRESSION
HOMEWORK #4: LOGISTIC REGRESSION Probabilistic Learning: Theory and Algorithms CS 274A, Winter 2019 Due: 11am Monday, February 25th, 2019 Submit scan of plots/written responses to Gradebook; submit your
More informationMATH36061 Convex Optimization
M\cr NA Manchester Numerical Analysis MATH36061 Convex Optimization Martin Lotz School of Mathematics The University of Manchester Manchester, September 26, 2017 Outline General information What is optimization?
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More information2.3. Clustering or vector quantization 57
Multivariate Statistics non-negative matrix factorisation and sparse dictionary learning The PCA decomposition is by construction optimal solution to argmin A R n q,h R q p X AH 2 2 under constraint :
More informationTensorFlow. Dan Evans
TensorFlow Presentation references material from https://www.tensorflow.org/get_started/get_started and Data Science From Scratch by Joel Grus, 25, O Reilly, Ch. 8 Dan Evans TensorFlow www.tensorflow.org
More informationData Exploration and Unsupervised Learning with Clustering
Data Exploration and Unsupervised Learning with Clustering Paul F Rodriguez,PhD San Diego Supercomputer Center Predictive Analytic Center of Excellence Clustering Idea Given a set of data can we find a
More informationCS425: Algorithms for Web Scale Data
CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org Challenges
More informationRandomized Algorithms
Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationCollaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2017 Notes on Lecture the most technical lecture of the course includes some scary looking math, but typically with intuitive interpretation use of standard machine
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27 Quiz question For
More informationIntroduction to Machine Learning. Introduction to ML - TAU 2016/7 1
Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger
More informationCompressing Tabular Data via Pairwise Dependencies
Compressing Tabular Data via Pairwise Dependencies Amir Ingber, Yahoo! Research TCE Conference, June 22, 2017 Joint work with Dmitri Pavlichin, Tsachy Weissman (Stanford) Huge datasets: everywhere - Internet
More informationSingular Value Decompsition
Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost
More informationFaster Machine Learning via Low-Precision Communication & Computation. Dan Alistarh (IST Austria & ETH Zurich), Hantian Zhang (ETH Zurich)
Faster Machine Learning via Low-Precision Communication & Computation Dan Alistarh (IST Austria & ETH Zurich), Hantian Zhang (ETH Zurich) 2 How many bits do you need to represent a single number in machine
More informationNeural Networks Teaser
1/11 Neural Networks Teaser February 27, 2017 Deep Learning in the News 2/11 Go falls to computers. Learning 3/11 How to teach a robot to be able to recognize images as either a cat or a non-cat? This
More informationWhat s Cooking? Predicting Cuisines from Recipe Ingredients
What s Cooking? Predicting Cuisines from Recipe Ingredients Kevin K. Do Department of Computer Science Duke University Durham, NC 27708 kevin.kydat.do@gmail.com Abstract Kaggle is an online platform for
More information7 Principal Component Analysis
7 Principal Component Analysis This topic will build a series of techniques to deal with high-dimensional data. Unlike regression problems, our goal is not to predict a value (the y-coordinate), it is
More informationA tutorial on sparse modeling. Outline:
A tutorial on sparse modeling. Outline: 1. Why? 2. What? 3. How. 4. no really, why? Sparse modeling is a component in many state of the art signal processing and machine learning tasks. image processing
More informationMatrix Factorization and Factorization Machines for Recommender Systems
Talk at SDM workshop on Machine Learning Methods on Recommender Systems, May 2, 215 Chih-Jen Lin (National Taiwan Univ.) 1 / 54 Matrix Factorization and Factorization Machines for Recommender Systems Chih-Jen
More informationQuantum Recommendation Systems
Quantum Recommendation Systems Iordanis Kerenidis 1 Anupam Prakash 2 1 CNRS, Université Paris Diderot, Paris, France, EU. 2 Nanyang Technological University, Singapore. April 4, 2017 The HHL algorithm
More informationCPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017
CPSC 340: Machine Learning and Data Mining Stochastic Gradient Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation code
More informationScripting Languages Fast development, extensible programs
Scripting Languages Fast development, extensible programs Devert Alexandre School of Software Engineering of USTC November 30, 2012 Slide 1/60 Table of Contents 1 Introduction 2 Dynamic languages A Python
More informationDeep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?
More informationAd Placement Strategies
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January
More informationBinary Principal Component Analysis in the Netflix Collaborative Filtering Task
Binary Principal Component Analysis in the Netflix Collaborative Filtering Task László Kozma, Alexander Ilin, Tapani Raiko first.last@tkk.fi Helsinki University of Technology Adaptive Informatics Research
More informationHOMEWORK #4: LOGISTIC REGRESSION
HOMEWORK #4: LOGISTIC REGRESSION Probabilistic Learning: Theory and Algorithms CS 274A, Winter 2018 Due: Friday, February 23rd, 2018, 11:55 PM Submit code and report via EEE Dropbox You should submit a
More informationCS 542G: Conditioning, BLAS, LU Factorization
CS 542G: Conditioning, BLAS, LU Factorization Robert Bridson September 22, 2008 1 Why some RBF Kernel Functions Fail We derived some sensible RBF kernel functions, like φ(r) = r 2 log r, from basic principles
More informationGradient Coding: Mitigating Stragglers in Distributed Gradient Descent.
Gradient Coding: Mitigating Stragglers in Distributed Gradient Descent. Alex Dimakis joint work with R. Tandon, Q. Lei, UT Austin N. Karampatziakis, Microsoft Setup Problem: Large-scale learning. Given
More informationFitting. PHY 688: Numerical Methods for (Astro)Physics
Fitting Fitting Data We get experimental/observational data as a sequence of times (or positions) and associate values N points: (x i, y i ) Often we have errors in our measurements at each of these values:
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Lecture 4: Curse of Dimensionality, High Dimensional Feature Spaces, Linear Classifiers, Linear Regression, Python, and Jupyter Notebooks Peter Belhumeur Computer Science
More informationApplied Machine Learning Lecture 5: Linear classifiers, continued. Richard Johansson
Applied Machine Learning Lecture 5: Linear classifiers, continued Richard Johansson overview preliminaries logistic regression training a logistic regression classifier side note: multiclass linear classifiers
More informationAlgorithms for NLP. Language Modeling III. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Language Modeling III Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Announcements Office hours on website but no OH for Taylor until next week. Efficient Hashing Closed address
More informationAccelerating SVRG via second-order information
Accelerating via second-order information Ritesh Kolte Department of Electrical Engineering rkolte@stanford.edu Murat Erdogdu Department of Statistics erdogdu@stanford.edu Ayfer Özgür Department of Electrical
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationTall-and-skinny! QRs and SVDs in MapReduce
A 1 A 2 A 3 Tall-and-skinny! QRs and SVDs in MapReduce Yangyang Hou " Purdue, CS Austin Benson " Stanford University Paul G. Constantine Col. School. Mines " Joe Nichols U. of Minn James Demmel " UC Berkeley
More informationCS60021: Scalable Data Mining. Large Scale Machine Learning
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Large Scale Machine Learning Sourangshu Bhattacharya Example: Spam filtering Instance
More information15-388/688 - Practical Data Science: Decision trees and interpretable models. J. Zico Kolter Carnegie Mellon University Spring 2018
15-388/688 - Practical Data Science: Decision trees and interpretable models J. Zico Kolter Carnegie Mellon University Spring 2018 1 Outline Decision trees Training (classification) decision trees Interpreting
More informationCSC 411 Lecture 6: Linear Regression
CSC 411 Lecture 6: Linear Regression Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 06-Linear Regression 1 / 37 A Timely XKCD UofT CSC 411: 06-Linear Regression
More informationIntroduction to Logistic Regression
Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the
More informationContinuous Machine Learning
Continuous Machine Learning Kostiantyn Bokhan, PhD Project Lead at Samsung R&D Ukraine Kharkiv, October 2016 Agenda ML dev. workflows ML dev. issues ML dev. solutions Continuous machine learning (CML)
More informationDreem Challenge report (team Bussanati)
Wavelet course, MVA 04-05 Simon Bussy, simon.bussy@gmail.com Antoine Recanati, arecanat@ens-cachan.fr Dreem Challenge report (team Bussanati) Description and specifics of the challenge We worked on the
More informationThis ensures that we walk downhill. For fixed λ not even this may be the case.
Gradient Descent Objective Function Some differentiable function f : R n R. Gradient Descent Start with some x 0, i = 0 and learning rate λ repeat x i+1 = x i λ f(x i ) until f(x i+1 ) ɛ Line Search Variant
More informationChenhan D. Yu The 3rd BLIS Retreat Sep 28, 2015
GSKS GSKNN BLIS-Based High Performance Computing Kernels in N-body Problems Chenhan D. Yu The 3rd BLIS Retreat Sep 28, 2015 N-body Problems Hellstorm Astronomy and 3D https://youtu.be/bllwkx_mrfk 2 N-body
More informationCSC321 Lecture 2: Linear Regression
CSC32 Lecture 2: Linear Regression Roger Grosse Roger Grosse CSC32 Lecture 2: Linear Regression / 26 Overview First learning algorithm of the course: linear regression Task: predict scalar-valued targets,
More informationLinear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationThree right directions and three wrong directions for tensor research
Three right directions and three wrong directions for tensor research Michael W. Mahoney Stanford University ( For more info, see: http:// cs.stanford.edu/people/mmahoney/ or Google on Michael Mahoney
More informationLecture 2: Linear Algebra Review
CS 4980/6980: Introduction to Data Science c Spring 2018 Lecture 2: Linear Algebra Review Instructor: Daniel L. Pimentel-Alarcón Scribed by: Anh Nguyen and Kira Jordan This is preliminary work and has
More informationCyclops Tensor Framework
Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationNumerical Optimization Techniques
Numerical Optimization Techniques Léon Bottou NEC Labs America COS 424 3/2/2010 Today s Agenda Goals Representation Capacity Control Operational Considerations Computational Considerations Classification,
More informationMidterm for CSC321, Intro to Neural Networks Winter 2018, night section Tuesday, March 6, 6:10-7pm
Midterm for CSC321, Intro to Neural Networks Winter 2018, night section Tuesday, March 6, 6:10-7pm Name: Student number: This is a closed-book test. It is marked out of 15 marks. Please answer ALL of the
More informationCourse in Data Science
Course in Data Science About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst. The course gives an
More informationParallel Numerics. Scope: Revise standard numerical methods considering parallel computations!
Parallel Numerics Scope: Revise standard numerical methods considering parallel computations! Required knowledge: Numerics Parallel Programming Graphs Literature: Dongarra, Du, Sorensen, van der Vorst:
More informationArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde
ArcGIS Enterprise: What s New Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde ArcGIS Enterprise is the new name for ArcGIS for Server ArcGIS Enterprise Software Components ArcGIS Server Portal
More informationClassification with Perceptrons. Reading:
Classification with Perceptrons Reading: Chapters 1-3 of Michael Nielsen's online book on neural networks covers the basics of perceptrons and multilayer neural networks We will cover material in Chapters
More informationSparse Approximation and Variable Selection
Sparse Approximation and Variable Selection Lorenzo Rosasco 9.520 Class 07 February 26, 2007 About this class Goal To introduce the problem of variable selection, discuss its connection to sparse approximation
More informationApplied Machine Learning Lecture 2, part 1: Basic linear algebra; linear algebra in NumPy and SciPy. Richard Johansson
Applied Machine Learning Lecture 2, part 1: Basic linear algebra; linear algebra in NumPy and SciPy Richard Johansson linear algebra linear algebra is the branch of mathematics that deals with relationships
More informationSTCE. Adjoint Code Design Patterns. Uwe Naumann. RWTH Aachen University, Germany. QuanTech Conference, London, April 2016
Adjoint Code Design Patterns Uwe Naumann RWTH Aachen University, Germany QuanTech Conference, London, April 2016 Outline Why Adjoints? What Are Adjoints? Software Tool Support: dco/c++ Adjoint Code Design
More informationB629 project - StreamIt MPI Backend. Nilesh Mahajan
B629 project - StreamIt MPI Backend Nilesh Mahajan March 26, 2013 Abstract StreamIt is a language based on the dataflow model of computation. StreamIt consists of computation units called filters connected
More informationProcessing Big Data Matrix Sketching
Processing Big Data Matrix Sketching Dimensionality reduction Linear Principal Component Analysis: SVD-based Compressed sensing Matrix sketching Non-linear Kernel PCA Isometric mapping Matrix sketching
More informationCourse Announcements. Bacon is due next Monday. Next lab is about drawing UIs. Today s lecture will help thinking about your DB interface.
Course Announcements Bacon is due next Monday. Today s lecture will help thinking about your DB interface. Next lab is about drawing UIs. John Jannotti (cs32) ORMs Mar 9, 2017 1 / 24 ORMs John Jannotti
More informationCS246 Final Exam. March 16, :30AM - 11:30AM
CS246 Final Exam March 16, 2016 8:30AM - 11:30AM Name : SUID : I acknowledge and accept the Stanford Honor Code. I have neither given nor received unpermitted help on this examination. (signed) Directions
More information