IE598 Big Data Optimization Introduction

Similar documents
CS 6375 Machine Learning

6.036 midterm review. Wednesday, March 18, 15

CS295: Convex Optimization. Xiaohui Xie Department of Computer Science University of California, Irvine

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I

Machine Learning (CS 567) Lecture 2

ECE-271B. Nuno Vasconcelos ECE Department, UCSD

Machine Learning CSE546

Support Vector Machine I

Large-scale Stochastic Optimization

Machine Learning. Instructor: Pranjal Awasthi

ECE521 week 3: 23/26 January 2017

Stat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January,

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.

Probabilistic Machine Learning. Industrial AI Lab.

Lecture 2 Machine Learning Review

ECE521 lecture 4: 19 January Optimization, MLE, regularization

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Loss Functions, Decision Theory, and Linear Models

Bayesian Networks Inference with Probabilistic Graphical Models

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Statistical Data Mining and Machine Learning Hilary Term 2016

Machine Learning! in just a few minutes. Jan Peters Gerhard Neumann

Lecture 1: Introduction

CPSC 340: Machine Learning and Data Mining

Machine Learning. 7. Logistic and Linear Regression

Mathematical Formulation of Our Example

COMS 4771 Lecture Course overview 2. Maximum likelihood estimation (review of some statistics)

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.

ECS171: Machine Learning

Foundations of Machine Learning

Course in Data Science

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Case Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!

STATISTICS-STAT (STAT)

Overfitting, Bias / Variance Analysis

Introduction to Course

Machine Learning (CS 567) Lecture 3

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Learning From Data: Modelling as an Optimisation Problem

ISyE 691 Data mining and analytics

Warm up. Regrade requests submitted directly in Gradescope, do not instructors.

Announcements Kevin Jamieson

Lecture 1: Supervised Learning

From statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated

Logistic Regression. COMP 527 Danushka Bollegala

Linear Algebra. Introduction. Marek Petrik 3/23/2017. Many slides adapted from Linear Algebra Lectures by Martin Scharlemann

CS Machine Learning Qualifying Exam

Pattern Recognition and Machine Learning

Introduction to Convex Optimization

Discriminative Models

Qualifying Exam in Machine Learning

Support Vector Machine II

CS 570: Machine Learning Seminar. Fall 2016

Bits of Machine Learning Part 1: Supervised Learning

Empirical Risk Minimization

ECE521 Lecture7. Logistic Regression

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Introduction to Machine Learning

Online Learning and Sequential Decision Making

Introduction to Machine Learning

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

Regression with Numerical Optimization. Logistic

Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning

Machine Learning (COMP-652 and ECSE-608)

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li.

Introduction to Logistic Regression and Support Vector Machine

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Probabilistic Graphical Models for Image Analysis - Lecture 1

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

OPTIMIZATION. joint course with. Ottimizzazione Discreta and Complementi di R.O. Edoardo Amaldi. DEIB Politecnico di Milano

Introduction to Machine Learning Midterm, Tues April 8

Logic and machine learning review. CS 540 Yingyu Liang

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2016

CSCI-567: Machine Learning (Spring 2019)

MATH36061 Convex Optimization

Recent Advances in Bayesian Inference Techniques

Machine Learning Basics Lecture 3: Perceptron. Princeton University COS 495 Instructor: Yingyu Liang

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

Support Vector Machines for Classification: A Statistical Portrait

ECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications

Statistical Methods for SVM

Machine Learning Theory (CS 6783)

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Artificial Neural Networks

Master of Science in Statistics A Proposal

Statistical Learning Reading Assignments

Introduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,

Support Vector Machines. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Lecture 2: Linear Algebra Review

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

Essence of Machine Learning (and Deep Learning) Hoa M. Le Data Science Lab, HUST hoamle.github.io

CONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE

STA141C: Big Data & High Performance Statistical Computing

Transcription:

IE598 Big Data Optimization Introduction Instructor: Niao He Jan 17, 2018 1

A little about me Assistant Professor, ISE & CSL UIUC, 2016 Ph.D. in Operations Research, M.S. in Computational Sci. & Eng. Georgia Tech, 2010 2015 B.S. in Mathematics, University of Sci. & Tech. of China, 2006 2010 2

A little about the course Big Data Optimization Explore modern optimization theories, algorithms, and big data applications Emphasize a deep understanding of structure of optimization problems and computation complexity of numerical algorithms Expose to the frontier of research in the intersection of large-scale optimization and machine learning 3

A little about you PhD or Master? ISE, ECE, Stat, CS? Took any optimization courses? 4

Course Details Prerequisites: no formal ones, but assume knowledge in linear algebra, real analysis, and probability theory basic machine learning and optimization at graduate level Textbooks: no required ones, but recommend to read the listed references on syllabus Ben-Tal & Nemirovski (2011) Nesterov (2003) Beck (2017) Bubeck (2015) 5

Course Details Evaluation: groups of size 1~3 Paper Presentation (25%) Final Project (75%) Proposal (10%) : 1~3 pages Report (40%): 10~15 pages under given format Presentation (25%): 15~20 mins Bonus (20 pts): a conference or journal submission Deadlines: see syllabus 6

Course Admin Syllabus & Website http://niaohe.ise.illinois.edu/ie598/ (pwd: spring2018) Where to get help No TAs Email: niaohe@illinois.edu with [IE 598] in your subject Office Location: 211 Transport Building Office Hours: Mon. 3:00-4:00 or by appointment via email 7

Introduction 8

Starts with the buzzword 9

Era of Big Data Big data heat in academia 10

Era of Big Data Big data heat in industry LinkedIn: 48,000+ Data Scientist jobs in United States 11

More than a buzzword Big data revolution in various areas Robotics and autonomous car Natural language processing Computer vision Healthcare Healthcare Finance Environment Lifestyle Aerospace 12

How to do data analysis? Key Steps Pose a problem Collect data Pre-process and clean data Formulate a mathematical model Find a solution Evaluate and interpret the results 13

What is Optimization? Find the optimal solution that minimize/maximize an objective function subject to constraints 14

Why do we care? Optimization lies at the heart of many fields, especially machine learning. Finance Portfolio selection, asset pricing, etc. Electrical Engineering Signal and image processing, control and robotics, etc. Industrial Engineering Supply chain, revenue management, transportation etc. Computer Science Machine learning, computer vision, etc. 15

Example Portfolio Selection Markowitz Mean-Variance Model where w is a vector of portfolio weights R is the expected returns Σ is the variance of portfolio returns λ > 0 is the risk tolerance factor 16

Example Image Denoising Total Variation Denoising Model where x is image matrix O is the noisy image, P is the observed entries TV(x) is the total variation 17

Example Inventory Newsvendor Model where q is number of newspaper to be stocked D is the random demand c is the unit purchase price p is the sell price 18

Example Regression Linear Regression Model where x i : predictor vector (feature) y i : response vector (label) w: parameters to be learned n: number of data points 19

Example Regularization Ridge Regression Model 2 w 2 = σj=1 d w j 2 is the L 2 -regularization LASSO (Least Absolute Shrinkage and Selection Operator) w 1 = σd j=1 w j is the L 1 -regularization 20

Example Classification Maximum Margin Classifier Model where x i : predictor vector (feature) y i {1, 1}: label/class w: parameters to be learned n: number of data points 21

Example More Classification Soft Margin SVM (support vector machine) Logistic Regression 22

Example Maximum Likelihood Estimation Assume data points x 1,..., x n are drawn i.i.d. from some distribution and we want to fit the data with a model p(x w) with parameter w, the maximum likelihood estimation is to solve Least square regression as a special case Logistic regression as a special case 23

Example Clustering K-Means Model where x 1,, x n : data μ 1,, μ k : cluster centers to be learned C 1,, C k : clusters to be assigned to 24

Many More Examples in ML Supervised learning (predictive models) Regression Classification Neural networks Boosting Unsupervised learning (data exploration) Clustering (K-means) Dimension reduction (PCA) Density estimation Reinforcement learning Collaborative filtering Graphical models Probabilistic inference 25

Theme of This Course How to solve optimization problems efficiently in the Big Data environment? 26

Structure of Optimization Linear vs. Nonlinear Deterministic vs. Stochastic Continuous vs. Combinatorial Smooth vs. Nonsmooth Convex vs. Nonconvex Low-dimensional vs. High-dimensional Static vs. Online Single vs. Sequential Decision Making 27

Easy or Hard? What makes an optimization problem easy or hard? Find minimum volume ellipsoid Find maximum volume ellipsoid NP-hard Polynomial solvable Example from L. Xiao, CS286 seminar 28

Easy or Hard? What makes an optimization problem easy or hard? Linear Optimization Polynomial Optimization Polynomial solvable P ~ NP-hard 29

Complexity and Convexity The great watershed in optimization isn t between linearity and nonlinearity, but convexity and nonconvexity. R. Rockafellar, SIAM Review 1993 Non-Convex Optimization Convex Optimization 30

Types of Algorithms Polynomial-time algorithms (dates back to 1970s or so) E.g., ellipsoid method, interior point method (IPM) First-order algorithms (dates back to 1900s, resurrection since 1980) E.g., accelerated gradient descent method (AGD) Second-order algorithms E.g., Newton method, L-BFGS Stochastic and randomized algorithms (dates back to 1950s, resurrection since 2004) E.g., stochastic approximation (SA) 31

Central Topics Algorithms Large-Scale Optimization Complexity Applications 32

Courtesy Warning If you are looking for software/tools for big data analytics : IE 529 Stats of Big Data & Clustering Check CS/CSE related data mining courses If you are looking for introduction-level optimization: ECE 490 Introduction to Optimization IE 510 Advanced Nonlinear Programming If you are looking for combinatorial optimization: IE 598 Advanced Integer Programming Otherwise, welcome to the class and see you next week! 33