Machine Learning Instructor: Pranjal Awasthi

Course Info If you requested an SPN and emailed me: wait for Carol Difrancesco to give them out. If you are not registered and need an SPN: email me after class. No promises. It's a large class, so I won't allow anyone to sit in without registering. Sorry!

Course Staff Instructor: Pranjal Awasthi (pranjal.awasthi@rutgers.edu). Research interests: semi-supervised learning, clustering, online learning, learning theory. Office hours: Monday 2-3pm, Hill 448. Course website: www.rci.rutgers.edu/~pa336/ml_s16.html

Course Staff TA: Yan Zhu (yanzhu.cbim@cs.rutgers.edu). Research interests: large-scale machine learning, deep learning, computer vision. Office hours: Friday 11am-12pm, CBIM.

Course Info No required textbook. Recommended texts:

Course Info ~5 homeworks (40%). In-class midterm (30%), March 10, no makeup exam. Final project (30%). Zero tolerance for cheating; see the academic integrity policy. Grading scale:
90-100: A
85-89: B+
80-84: B
75-79: C+
70-74: C
below 70: F

Homework Policy ~2 weeks per homework. Submit via Sakai. Homeworks should be typeset in LaTeX; see the website. Late homeworks are not accepted. No regrading: the TA is the boss. You are encouraged to discuss, but write the solutions in your own words and list the names of the people you discussed with. Start early!

Homework Policy Typically two parts: conceptual/analytical and programming. Conceptual: justify your solutions, with rigorous proofs when asked for; aims to test fundamentals. Programming: Matlab for homeworks. Justify your findings, submit well-documented code, and make sure the code runs. HW0 is up on the webpage; no need to submit it.

A word about the course The course is designed to be tough: more theoretical than previous courses. You should be comfortable with basic probability, linear algebra, and algorithms. If you cannot do HW0, consider dropping the course. How can I do well? Come to lectures, ask questions, take notes, and play around with data and methods.

What is Machine Learning? Statistics: the science of making sound inferences and predictions from data. Computer Science: the study of algorithms that improve performance on a given task with time and experience. Machine learning: the part of AI that is actually useful!

History of Machine Learning Pre-1950s: statistics and probability theory. 1950s-80s: the AI phase. Post-90s: modern machine learning.

Pre-1950s Collection and analysis of data has always been around, traditionally for governance and politics. 1500s: collection of data on deaths, marriages, and baptisms in England and France; analyzed by humans, and not very scientific. 1700s: probability theory became a big tool, with a lot of work devoted to studying gambling.

Pre-1950s Pearson analyzed the crab population near Naples: he wanted to understand the nature of the population, and claimed that there are two underlying species. Statistical modeling.

Pre-1950s "I can taste and tell whether the tea or the milk was added first." "Hmm, how do I verify that?" Experimental design.

Pre-1950s Lots of fundamental questions that are still relevant: How to design an experiment? How to collect data and conduct surveys/polls? How to choose between different hypotheses? How to understand hidden structure in the data?

Post-1950s CS enters; "AI" is coined. Can intelligent machines be built? The Turing test.

Post-1950s 1952: Arthur Samuel's program for playing checkers.

Post-1950s 1960s: ELIZA.

Post-1950s 1970s: MYCIN for medical diagnosis, built on a knowledge base of ~600 rules. Most machine learning systems were rule-based or knowledge-based, and the limitations quickly became clear.

Post-90s Statistical machine learning. Data-driven algorithm design.

Modern ML [diagram: ML algorithm]

What you'll learn in this course Support vector machines, naïve Bayes, logistic regression, linear regression, decision trees, boosting, graphical models, reinforcement learning, deep learning, model selection, optimization, kernel methods, learning theory, Bayesian methods, semi-supervised learning.

Probability Overview A random variable X is a map from a set $\Omega$ to $\mathbb{R}$, where $\Omega$ is equipped with a probability measure $P$: $P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\})$. X has distribution $P$, denoted $X \sim P$.

Probability Overview Cumulative distribution function (CDF): $F_X(x) = P(X \le x)$. If X is discrete: probability mass function (pmf) $p(x)$, with $P(X = x) = p(x)$. If X is continuous: probability density function (pdf) $p(x)$, with $P(X \in A) = \int_A p(x)\,dx$.
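For a concrete feel, a minimal Python sketch (not from the slides; the distribution and sample size are arbitrary choices) approximating a CDF value by the fraction of samples below a threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)  # X ~ N(0, 1)

# Empirical CDF at x0: the fraction of samples with X <= x0 approximates F_X(x0)
x0 = 1.0
print(np.mean(x <= x0))  # ~0.8413, the standard normal CDF at 1
```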

Probability Overview Expected value of X: $E[X] = \int x\,p(x)\,dx$ (continuous), $E[X] = \sum_x x\,p(x)$ (discrete). Variance of X: $\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2$.
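A quick numerical check of the identity $\mathrm{Var}(X) = E[X^2] - E[X]^2$ (a sketch with an arbitrary choice of distribution; for an exponential with scale 2 the true variance is 4):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)  # E[X] = 2, Var(X) = 4

mean = x.mean()
var_direct = np.mean((x - mean) ** 2)       # E[(X - E[X])^2]
var_identity = np.mean(x ** 2) - mean ** 2  # E[X^2] - E[X]^2
print(var_direct, var_identity)             # both close to 4
```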

Probability Overview Independence: X and Y are independent iff $P(X \in A, Y \in B) = P(X \in A)\,P(Y \in B)$ for all $A, B$. Covariance between X and Y: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$. If X and Y are independent, then $\mathrm{Cov}(X, Y) = 0$ and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
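To illustrate (a simulation sketch; the two distributions are arbitrary picks), independent samples have near-zero empirical covariance and variances that add:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=1_000_000)    # X ~ N(0, 1)
y = rng.uniform(-1.0, 1.0, size=1_000_000)  # Y ~ Unif[-1, 1], independent of X

cov = np.mean((x - x.mean()) * (y - y.mean()))
print(cov)                                   # ~0
print(np.var(x + y), np.var(x) + np.var(y))  # both ~1.333
```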

Probability Overview Joint distribution of X and Y: pdf $p(x, y)$; the marginal density of X is $p(x) = \int_y p(x, y)\,dy$. Conditional distribution of X given $Y = y$: pdf $p(x \mid y) = \frac{p(x, y)}{p(y)}$.
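In the discrete case these operations are just sums over a joint pmf table; a small sketch with a made-up 2x3 joint distribution:

```python
import numpy as np

# Hypothetical joint pmf p(x, y); rows index x, columns index y
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.30, 0.10, 0.20]])

p_x = p_xy.sum(axis=1)     # marginal p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)     # marginal p(y) = sum_x p(x, y)
p_x_given_y = p_xy / p_y   # conditional p(x | y) = p(x, y) / p(y)

print(p_x, p_y)
print(p_x_given_y.sum(axis=0))  # each column sums to 1
```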

Probability Inequalities Markov's inequality: if $X \ge 0$, then $P(X > t\,E[X]) \le \frac{1}{t}$. Chebyshev's inequality: $P(|X - E[X]| \ge t\sqrt{\mathrm{Var}(X)}) \le \frac{1}{t^2}$.
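Both bounds can be checked by simulation (a sketch; the exponential distribution and t = 3 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)  # X >= 0, E[X] = 1, Var(X) = 1
t = 3.0

# Markov: P(X > t E[X]) <= 1/t
print(np.mean(x > t * x.mean()), 1 / t)
# Chebyshev: P(|X - E[X]| >= t sqrt(Var(X))) <= 1/t^2
print(np.mean(np.abs(x - x.mean()) >= t * x.std()), 1 / t**2)
```

In both cases the empirical tail probability sits well below the bound; the inequalities are loose, but they hold for any distribution.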

Probability Inequalities Let $X_1, \dots, X_n$ be independent and identically distributed (i.i.d.), taking values in $\{0, 1\}$, with $E[X_i] = \mu$, and let $\bar{X}_n = \frac{1}{n}\sum_i X_i$. Chernoff bound: for $\delta \in [0, 1]$, $P(\bar{X}_n > \mu(1 + \delta)) \le e^{-n\mu\delta^2/3}$ and $P(\bar{X}_n < \mu(1 - \delta)) \le e^{-n\mu\delta^2/2}$.
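A simulation sketch of the upper-tail Chernoff bound (n, μ, and δ are example values):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials, mu, delta = 200, 200_000, 0.2, 0.5

# Each trial averages n Bernoulli(mu) variables
means = rng.binomial(n, mu, size=trials) / n

# Chernoff: P(X_bar > mu(1 + delta)) <= exp(-n mu delta^2 / 3)
print(np.mean(means > mu * (1 + delta)), np.exp(-n * mu * delta**2 / 3))
```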

Probability Inequalities Same setup: $X_1, \dots, X_n$ i.i.d. in $\{0, 1\}$ with $E[X_i] = \mu$ and $\bar{X}_n = \frac{1}{n}\sum_i X_i$. Hoeffding bound: for $\delta \in [0, 1]$, $P(\bar{X}_n > \mu + \delta) \le e^{-2n\delta^2}$ and $P(\bar{X}_n < \mu - \delta) \le e^{-2n\delta^2}$.
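The same kind of check for Hoeffding's additive bound (again with example parameters):

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials, mu, delta = 100, 200_000, 0.5, 0.1

means = rng.binomial(n, mu, size=trials) / n

# Hoeffding: P(X_bar > mu + delta) <= exp(-2 n delta^2)
print(np.mean(means > mu + delta), np.exp(-2 * n * delta**2))
```

Note the difference: Chernoff's deviation is multiplicative in μ, while Hoeffding's is additive and does not depend on μ.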

On to new content

Point Estimation Goal: estimate the bias of a coin. "Why?? I came here to master deep learning!"

Point Estimation Given a coin that comes up heads (1) with probability p and tails (0) with probability 1 - p, estimate p. Your idea: toss it a few times and see. What is the estimate? How many flips are needed?

Point Estimation A random variable X distributed according to $D(\theta)$. Given i.i.d. samples $X_1, \dots, X_n$ from $D(\theta)$. Goal: estimate $\theta$.

Point Estimation A random variable X distributed according to $D(\theta)$. Given i.i.d. samples $X_1, \dots, X_n$ from $D(\theta)$. Goal: estimate $\theta$. Three methods: Method of Moments (MoM), Maximum Likelihood Estimation (MLE), Bayesian Estimation.

Method of Moments Given a coin that comes up heads (1) with probability p and tails (0) with probability 1 - p, estimate p. Idea: match the observed distribution to the true distribution. Moments are an elegant way to achieve this.

Method of Moments A random variable X distributed according to $D(\theta)$. Given i.i.d. samples from $D(\theta)$. Goal: estimate $\theta$. Moments of X: $E[X^k]$ for $k = 1, 2, \dots$

Method of Moments For the coin, $X \in \{0, 1\}$, so $X^k = X$ and $E[X^k] = p$ for every k: all moments of our distribution are p. What about the moments of the observed data?

Method of Moments By the same reasoning, all empirical moments of the observed data equal $\frac{1}{n}\sum_i X_i = \bar{X}_n$, so matching moments gives the estimate $\hat{p} = \bar{X}_n$, the fraction of heads.
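A minimal sketch of this estimator (the true p and number of flips are made-up values):

```python
import numpy as np

rng = np.random.default_rng(6)
p_true, n = 0.3, 1000

flips = rng.binomial(1, p_true, size=n)  # n coin flips, heads = 1
p_hat = flips.mean()                     # first empirical moment = MoM estimate
print(p_hat)                             # close to 0.3
```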

Method of Moments How good is the estimate $\hat{p}$? How many samples n do we need?

Method of Moments How good is the estimate? We need a notion of error: the mean squared error (MSE), $\mathrm{MSE}(\hat{p}) = E[(\hat{p} - p)^2]$.

Method of Moments For the coin, $E[\hat{p}] = \frac{1}{n}\sum_i E[X_i] = p$, so $\hat{p}$ is unbiased and $\mathrm{MSE}(\hat{p}) = \mathrm{Var}(\hat{p}) = \frac{1}{n^2}\sum_i \mathrm{Var}(X_i) = \frac{p(1-p)}{n}$.
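An empirical check of this formula (a sketch; p, n, and the number of repetitions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
p_true, n, trials = 0.3, 1000, 100_000

# Repeat the n-flip experiment many times and average the squared error
p_hats = rng.binomial(n, p_true, size=trials) / n
mse_empirical = np.mean((p_hats - p_true) ** 2)
print(mse_empirical, p_true * (1 - p_true) / n)  # both ~2.1e-4
```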

Method of Moments How many samples? The MSE shrinks like 1/n, and the concentration bounds from earlier turn this into an explicit sample size; see the sketch below.
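One way to answer, via the Hoeffding bound: to get $|\hat{p} - p| \le \epsilon$ with probability at least $1 - \delta$, it suffices that $2e^{-2n\epsilon^2} \le \delta$, i.e. $n \ge \frac{\ln(2/\delta)}{2\epsilon^2}$. A sketch (the ε and δ below are example values):

```python
import math

def samples_needed(eps: float, delta: float) -> int:
    """Flips sufficient for |p_hat - p| <= eps w.p. >= 1 - delta (Hoeffding)."""
    return math.ceil(math.log(2 / delta) / (2 * eps**2))

print(samples_needed(0.05, 0.01))  # 1060
```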

Point Estimation A random variable X distributed according to $D(\theta)$. Given i.i.d. samples $X_1, \dots, X_n$ from $D(\theta)$. Goal: estimate $\theta$. Your estimate: $\hat{\theta}(X_1, \dots, X_n)$. Is the MSE always equal to the variance of $\hat{\theta}$?

Point Estimation Not in general: the MSE decomposes into variance plus squared bias, $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{Var}(\hat{\theta}) + (E[\hat{\theta}] - \theta)^2$, so the MSE equals the variance only when the estimator is unbiased.
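A numeric check of this decomposition, using a deliberately biased (hypothetical) "add-one" estimator $\tilde{p} = \frac{\sum_i X_i + 1}{n + 2}$ for the coin:

```python
import numpy as np

rng = np.random.default_rng(8)
p_true, n, trials = 0.3, 50, 200_000

heads = rng.binomial(n, p_true, size=trials)
p_tilde = (heads + 1) / (n + 2)  # biased "add-one" estimator

mse = np.mean((p_tilde - p_true) ** 2)
var = np.var(p_tilde)
bias = np.mean(p_tilde) - p_true
print(mse, var + bias**2)  # identical up to floating point
```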

Method of Moments Recap for the coin: match the observed distribution to the true distribution via moments. A natural approach, and matching moments = solving a system of equations, but the equations get messy pretty quickly! Limited algorithmic tools, limited theory: what does the optimal classifier look like?

Maximum Likelihood Estimation Given a coin that comes up heads (1) with probability p and tails (0) with probability 1 - p, estimate p. Idea: find the p that is most likely to have generated the given data.

Maximum Likelihood Estimation A random variable X distributed according to $D(\theta)$. Given i.i.d. samples $X_1, \dots, X_n$ from $D(\theta)$. Goal: estimate $\theta$. Idea: output the $\hat{\theta}$ that is most likely to generate the data.
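For the coin this means maximizing the log-likelihood $\ell(p) = \sum_i [X_i \log p + (1 - X_i)\log(1 - p)]$, whose maximizer is the sample mean. A sketch (a grid search stands in for the calculus; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(9)
flips = rng.binomial(1, 0.3, size=1000)

def log_likelihood(p: float, x: np.ndarray) -> float:
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

grid = np.linspace(0.001, 0.999, 999)
p_mle = grid[np.argmax([log_likelihood(p, flips) for p in grid])]
print(p_mle, flips.mean())  # grid maximizer ~ sample mean
```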