Continuous Machine Learning. Kostiantyn Bokhan, PhD, Project Lead at Samsung R&D Ukraine. Kharkiv, October 2016
Agenda: ML dev. workflows; ML dev. issues; ML dev. solutions; Continuous machine learning (CML); Aspects of CML; CML infrastructure; CML delivery
ML dev. workflows [diagram]. Train framework (Python/R/Matlab): Gathering Datasets, Feature design, Model Training, Model Validation, Model Testing. Application (C++/Java): Requirements, QA Planning, Tests, FE tools, Model Classifier, UI, Market
ML dev. workflows [diagram]. Stages: Define the Problem; Gather Datasets; Clean Datasets; Visualize, Explore; Hypothesize, Model; Measure, Evaluate; Deploy. Further labels: Data selection, Feature design, Data transformation
ML dev. workflows [diagram]. Training: Datasets (Train data, Test data), Feature Extraction, features and labels, Training the model, Model, Eval the model. Predicting: Input Data, Feature Extraction, features, Model, Predict, Labels
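The training and predicting flows on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the speaker's code: the toy dataset, the two-number feature vector, the nearest-centroid model, and every function name below are assumptions made for the example.

```python
import random


def extract_features(text):
    """Feature extraction: map raw input to a numeric feature vector."""
    return [len(text), len(text.split())]


def train_model(features, labels):
    """Training: compute one centroid per label (nearest-centroid model)."""
    grouped = {}
    for vec, label in zip(features, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def predict(model, vec):
    """Predicting: return the label whose centroid is closest to vec."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(model, key=lambda label: sq_dist(model[label]))


def evaluate(model, features, labels):
    """Evaluation: accuracy on held-out test data."""
    hits = sum(predict(model, v) == y for v, y in zip(features, labels))
    return hits / len(labels)


# Hypothetical labeled dataset: (input text, label).
dataset = [
    ("hi", "short"), ("ok", "short"), ("yes", "short"), ("no", "short"),
    ("a much longer sentence here", "long"),
    ("another fairly long example text", "long"),
    ("this input also has many words", "long"),
    ("long inputs contain several words", "long"),
]

# Datasets -> train data / test data (fixed seed for repeatability).
random.Random(0).shuffle(dataset)
train_data, test_data = dataset[:6], dataset[6:]

# Training path: feature extraction -> training -> evaluation.
train_X = [extract_features(text) for text, _ in train_data]
train_y = [label for _, label in train_data]
model = train_model(train_X, train_y)

test_X = [extract_features(text) for text, _ in test_data]
test_y = [label for _, label in test_data]
accuracy = evaluate(model, test_X, test_y)

# Predicting path: input data -> feature extraction -> model -> label.
new_label = predict(model, extract_features("hey"))
print(accuracy, new_label)
```

The same `extract_features` function is used on both paths, which mirrors the slide: the feature extraction step is shared between training and predicting.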
ML dev. issues: Bigger [chart: Data size (Small to Big) and Model complexity (Simple to Complex) over 1996, 2006, 2016]
ML dev. issues: Performance. Machine Learning on PC; Machine Learning on AWS ("My other computer is Amazon EC2"); Machine Learning on dedicated cluster
ML dev. issues: Uncertainty
ML dev. issues: Variety
ML dev. issues: Reliability
ML dev. issues: Resource management
ML dev. solutions: Performance
ML dev. solutions: Performance + Reliability + Mesos
ML dev. solutions: Variety
ML dev. solutions: Resource management [stack diagram]. Theano, dcos CLI, MPI; Init.rd: Singularity, Aurora, Marathon; Kernel: Mesos; CUDA, GPU, Memory, CPU, Storage
ML dev. solutions: Mesosphere Enterprise DC/OS is an enterprise-grade, datacenter-scale operating system providing a single platform for running containers, big data, and distributed apps in production. Services & Applications: easily deploy and run datacenter-wide app services such as Docker, Cassandra, and Spark, pooled on a single platform. DC/OS, powered by Apache Mesos: runtime, tools, and best practices built in to simplify operations and deliver a production self-healing infrastructure. Run Anywhere: bare-metal, virtual, cloud, or hybrid; DC/OS runs on it all. The only requirement is a modern Linux distro; Windows support coming soon :) Source: https://dcos.io/docs/1.8/overview/architecture/
ML dev. solutions. Source: https://dcos.io/docs/1.8/overview/architecture/
ML dev. solutions: Apache Mesos is the open-source distributed systems kernel at the heart of the Mesosphere DC/OS. It abstracts the entire datacenter into a single pool of computing resources, simplifying running distributed systems at scale. Sources: http://nvidia.com, http://blog.arungupta.me/docker-apache-mesos-marathon/
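To make the "single pool of computing resources" idea concrete, here is a toy sketch in plain Python. It does not use or reproduce the Mesos API or its offer-based allocation; the `Node` and `Task` classes, the node names, and the first-fit placement rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One machine in the hypothetical cluster and its free resources."""
    name: str
    cpus: float
    mem_mb: int
    gpus: int = 0


@dataclass
class Task:
    """A unit of work and the resources it asks for."""
    name: str
    cpus: float
    mem_mb: int
    gpus: int = 0


def pool_totals(nodes):
    """View all nodes as one pool: sum the free resources across them."""
    return {
        "cpus": sum(n.cpus for n in nodes),
        "mem_mb": sum(n.mem_mb for n in nodes),
        "gpus": sum(n.gpus for n in nodes),
    }


def place(task, nodes):
    """First-fit placement: run the task on the first node that can hold it.

    Returns the chosen node's name, or None when no single node fits.
    The chosen node's free resources are reduced in place.
    """
    for node in nodes:
        fits = (node.cpus >= task.cpus
                and node.mem_mb >= task.mem_mb
                and node.gpus >= task.gpus)
        if fits:
            node.cpus -= task.cpus
            node.mem_mb -= task.mem_mb
            node.gpus -= task.gpus
            return node.name
    return None


# Hypothetical cluster: one virtual node and one bare-metal GPU node.
cluster = [Node("vm-1", cpus=4, mem_mb=8192),
           Node("gpu-1", cpus=8, mem_mb=32768, gpus=2)]

before = pool_totals(cluster)
cuda_job = place(Task("cuda-train", cpus=2, mem_mb=4096, gpus=1), cluster)
spark_job = place(Task("spark-etl", cpus=4, mem_mb=8192), cluster)
too_big = place(Task("huge", cpus=16, mem_mb=1024), cluster)
after = pool_totals(cluster)
print(cuda_job, spark_job, too_big, after)
```

The caller only states what a task needs; which machine ends up running it is decided by the pool. A real cluster manager adds much more (multiple frameworks, fairness, failure handling), none of which is modeled here.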
CML Resources: Hybrid. Virtual nodes; Test devices; Bare-metal nodes with GPU
ML dev. solutions: Uncertainty. Continuous Machine Learning: we can't remove uncertainty, but we can automate routines, especially delivery and integration
Aspects of CML: Continuous Development, Continuous Integration, Continuous Deployment, Continuous Delivery. Continuous Everything
CML Infrastructure [diagram]. Data scientists: Train, Validation, Test. Developers: Build, Test, Verification. Tooling: GIT, Jenkins, Docker registry. Cluster: Mesos Master with two Standby Masters; Marathon and Singularity; Mesos Agents running Spark jobs, Docker jobs, CUDA jobs, and batch jobs; Pool of Devices
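As one way to picture how a Docker job from the registry could reach the cluster through Marathon, the sketch below builds a Marathon app definition as a Python dictionary and serializes it to JSON. The field names (`id`, `cmd`, `cpus`, `mem`, `instances`, `container.docker.image`) follow Marathon's app definition format; the helper name `marathon_app`, the app id, the registry host, the image name, and the command are hypothetical, and the talk does not say how jobs were actually submitted.

```python
import json


def marathon_app(app_id, image, cmd, cpus=1.0, mem_mb=1024, instances=1):
    """Build a Marathon app definition for a Docker-based job.

    The returned dict is what would be sent as the JSON body of
    POST /v2/apps on a Marathon endpoint. No request is made here.
    """
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem_mb,          # Marathon expects memory in MiB
        "instances": instances,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


# Hypothetical training job; image and command are placeholders.
app = marathon_app(
    app_id="/cml/train-model",
    image="registry.example.com/cml/train:latest",
    cmd="python train.py",
    cpus=2.0,
    mem_mb=4096,
)
payload = json.dumps(app, indent=2, sort_keys=True)
print(payload)
```

In a setup like the one on the slide, a CI server such as Jenkins could build the image, push it to the registry, and then submit a definition of this shape; that wiring is an assumption here, not something the slide spells out.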
CML deploy
Questions? Samsung R&D Institute Ukraine