Supervised Machine Learning (Spring 2014) Homework 2, sample solutions

Credit for the solutions goes mainly to Panu Luosto and Joonas Paalasmaa, with some additional contributions by Jyrki Kivinen.

Problem 1

The logarithmic loss is by definition

    L_log(y, ŷ) = −ln(1 − ŷ) if y = 0,   and   L_log(y, ŷ) = −ln ŷ if y = 1.

Throughout, ŷ_t = Σ_{i=1}^n v_{t,i} h_i(x_t) = (1/W_t) Σ_{i=1}^n w_{t,i} h_i(x_t) denotes the prediction of the weighted average algorithm (WA), where v_{t,i} = w_{t,i}/W_t and W_t = Σ_{i=1}^n w_{t,i}.

Let y_t = 0 first. Now

    L_log(y_t, ŷ_t) = −ln(1 − ŷ_t) = −ln(1 − (1/W_t) Σ_{i=1}^n w_{t,i} h_i(x_t)).

From the update rule we get

    w_{t+1,i} = w_{t,i} exp(−η L_log(0, h_i(x_t))) = w_{t,i} exp(−η(−ln(1 − h_i(x_t)))) = w_{t,i} (1 − h_i(x_t)),

where we used the assumption η = 1. Because c = 1,

    P_t − P_{t+1} = c ln W_t − c ln W_{t+1}
                  = −ln(W_{t+1}/W_t)
                  = −ln( Σ_{i=1}^n w_{t,i}(1 − h_i(x_t)) / W_t )
                  = −ln(1 − (1/W_t) Σ_{i=1}^n w_{t,i} h_i(x_t))
                  = L_log(y_t, ŷ_t).

Let next y_t = 1. In this case,

    L_log(y_t, ŷ_t) = −ln ŷ_t = −ln( (1/W_t) Σ_{i=1}^n w_{t,i} h_i(x_t) ),

and

    w_{t+1,i} = w_{t,i} exp(−η L_log(1, h_i(x_t))) = w_{t,i} exp(−η(−ln h_i(x_t))) = w_{t,i} h_i(x_t).

Therefore,

    P_t − P_{t+1} = −ln(W_{t+1}/W_t) = −ln( Σ_{i=1}^n w_{t,i} h_i(x_t) / W_t ) = L_log(y_t, ŷ_t),

which completes the proof.

Loss bound. Let η = c = 1 as before and let H = {h_1, h_2, ..., h_n}. We get the bound directly by using the previous result L_log(y_t, ŷ_t) = P_t − P_{t+1}. For all h ∈ H we have

    L_log(S, WA) = Σ_{t=1}^T L_log(y_t, ŷ_t)
                 = Σ_{t=1}^T (P_t − P_{t+1})
                 = P_1 − P_{T+1}
                 = c ln W_1 − c ln W_{T+1}
                 ≤ ln n − ln exp(−η Σ_{t=1}^T L_log(y_t, h(x_t)))
                 = ln n + Σ_{t=1}^T L_log(y_t, h(x_t))
                 = ln n + L_log(S, h),

where the inequality holds because W_1 = n (all initial weights are 1) and W_{T+1} = Σ_{i=1}^n exp(−η Σ_{t=1}^T L_log(y_t, h_i(x_t))) ≥ exp(−η Σ_{t=1}^T L_log(y_t, h(x_t))). Taking the minimum over H gives the bound

    L_log(S, WA) ≤ min_{h ∈ H} L_log(S, h) + ln n.
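As a sanity check (this is an addition to the sample solution, not part of it), the bound of Problem 1 is easy to test numerically. The short Python sketch below runs the weighted average algorithm with η = c = 1 and logarithmic loss on made-up data: five hypothetical experts that each predict a fixed probability, and labels drawn with P(y = 1) = 0.7. The assertion at the end checks L_log(S, WA) ≤ min_h L_log(S, h) + ln n.

    import math
    import random

    def log_loss(y, p):
        # L_log(y, p) = -ln(1 - p) for y = 0 and -ln(p) for y = 1
        return -math.log(p) if y == 1 else -math.log(1.0 - p)

    random.seed(0)
    T, n = 200, 5
    # hypothetical experts: expert i always predicts the fixed probability expert_prob[i]
    expert_prob = [0.1, 0.3, 0.5, 0.7, 0.9]
    # synthetic labels with P(y = 1) = 0.7, so the expert predicting 0.7 should be best
    labels = [1 if random.random() < 0.7 else 0 for _ in range(T)]

    w = [1.0] * n                    # initial weights w_{1,i} = 1, so W_1 = n
    wa_loss = 0.0
    expert_loss = [0.0] * n
    for y in labels:
        W = sum(w)
        y_hat = sum(w[i] * expert_prob[i] for i in range(n)) / W   # weighted average prediction
        wa_loss += log_loss(y, y_hat)
        for i in range(n):
            expert_loss[i] += log_loss(y, expert_prob[i])
            w[i] *= math.exp(-log_loss(y, expert_prob[i]))         # multiplicative update, eta = 1

    bound = min(expert_loss) + math.log(n)
    print(f"WA loss {wa_loss:.2f} <= best expert loss {min(expert_loss):.2f} + ln n = {bound:.2f}")
    assert wa_loss <= bound + 1e-9

Since the per-trial relation proved above is an equality, the bound holds deterministically, whatever data one feeds in.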

Problem 2

(a)

Let A_i be the event "heads comes up in toss number i", i ∈ {1, 2, ..., 10}. Then

    P(A_1 ∩ A_2 ∩ ⋯ ∩ A_10) = Π_{i=1}^{10} P(A_i) = (1/2)^10,

because the tosses are independent and identically distributed. This is an example of a binomial random variable: the number of heads in a series of 10 fair coin tosses can be modelled as a random variable X ~ Bin(10, 1/2), and P(X = 10) = (10 choose 10) (1/2)^10 (1/2)^0 = (1/2)^10.

(b)

From (a) we know that the probability that a given coin does not come up heads 10 times is q = 1 − (1/2)^10. Because the coin tosses are independent, the probability that no coin comes up heads 10 times is q^1000. Finally, the probability that at least one coin comes up heads 10 times is

    1 − q^1000 = 1 − (1 − (1/2)^10)^1000 ≈ 0.62.

We can formulate the situation also as follows. The series of 10 tosses with each of the 1000 coins is again a sequence of repeated experiments. Let the length of the sequence be n = 1000, and let the probability of success (10 heads with a single coin) in an experiment be p = 2^(−10). The number of successes X in the sequence is a binomially distributed random variable X ~ Bin(n, p). What is asked is the probability

    P(X ≥ 1) = 1 − P(X = 0) = 1 − (n choose 0) p^0 (1 − p)^n = 1 − (1 − p)^n ≈ 0.62.
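For reference (not part of the original solution), the exact values in (a) and (b) can be computed directly:

    # Exact values for Problem 2 (a) and (b).
    p10 = 0.5 ** 10                        # (a): one coin comes up heads 10 times in a row
    at_least_one = 1 - (1 - p10) ** 1000   # (b): at least one of 1000 coins does so
    print(f"(a) P(10 heads with one coin) = {p10:.6f}")          # about 0.000977
    print(f"(b) P(at least one of 1000)   = {at_least_one:.4f}") # about 0.6236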

(c)

Let B_i be the event "coin number i comes up heads 10 times" and let C be the event "at least one coin comes up heads 10 times". The union bound gives us the estimate

    P(C) = P( ∪_{i=1}^{1000} B_i ) ≤ Σ_{i=1}^{1000} P(B_i) = 1000 · (1/2)^10 ≈ 0.977.

Our estimate turns out to be very inaccurate. Actually, if we had considered 1024 or more coins, we would have got only the trivial bound P(C) ≤ 1.

Exercise 3

(a)

We calculate the risk using the expression from page 4 of the lecture notes:

    R(h) = ν + (1 − 2ν) Σ_{x : h(x) ≠ f(x)} p_X(x).     (1)

Using this for h = f we directly get R(f) = ν, since the sum in (1) is then empty. On the other hand, for any h such that P_X(h(x) ≠ f(x)) > ε, using (1) we get

    R(h) = ν + (1 − 2ν) P_X(h(x) ≠ f(x)) > ν + ε(1 − 2ν),

since we assume ν < 1/2.
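Formula (1) is easy to sanity-check by simulation. The sketch below is an illustration added here, not part of the original solution; the target f, the classifier h and the noise rate ν = 0.2 are made up, and the empirical estimate should land close to ν + (1 − 2ν) P_X(h(x) ≠ f(x)) = 0.32.

    import random

    random.seed(1)
    nu = 0.2                                  # noise rate (made-up value), must be < 1/2
    f = lambda x: 1 if x >= 0.5 else 0        # target classifier on X = [0, 1)
    h = lambda x: 1 if x >= 0.7 else 0        # another classifier; disagrees with f on [0.5, 0.7)
    disagreement = 0.2                        # P_X(h(x) != f(x)), the length of [0.5, 0.7)

    N = 200_000
    errors = 0
    for _ in range(N):
        x = random.random()                               # x uniform on [0, 1)
        y = f(x) if random.random() > nu else 1 - f(x)    # label flipped with probability nu
        errors += int(h(x) != y)

    print(f"empirical risk of h:        {errors / N:.4f}")
    print(f"nu + (1 - 2*nu)*P(h != f):  {nu + (1 - 2 * nu) * disagreement:.4f}")   # = 0.32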

(b)

Following the proof of Theorem 1.9, we are going to show that with probability at least 1 − δ we have

    R̂(f) ≤ R(f) + ε(1 − 2ν)/2     (2)

for the target classifier, and

    R̂(h) ≥ R(h) − ε(1 − 2ν)/2     (3)

for all h ≠ f. Combining (2) and (3) with the previous estimates shows that, for any h such that P_X(h(x) ≠ f(x)) > ε holds, we have

    R̂(h) ≥ R(h) − ε(1 − 2ν)/2 > ν + ε(1 − 2ν) − ε(1 − 2ν)/2 = ν + ε(1 − 2ν)/2 = R(f) + ε(1 − 2ν)/2 ≥ R̂(f).

Hence, ERM cannot pick any such h as its hypothesis. Therefore, if we prove that (2) and (3) hold with probability at least 1 − δ, we have the desired result.

We use Hoeffding's inequality: for a sum S of m independent random variables with values in [a_i, b_i],

    Pr[ S ≥ E[S] + t ] ≤ exp( −2t² / Σ_{i=1}^m (b_i − a_i)² ).

The probability that (2) fails is estimated as

    Pr[ R̂(f) > R(f) + ε(1 − 2ν)/2 ] = Pr[ m R̂(f) > m R(f) + m ε(1 − 2ν)/2 ]
                                     ≤ exp( −2 (m ε(1 − 2ν)/2)² / m )
                                     = exp( −m ε²(1 − 2ν)² / 2 ).

Since we assume

    m ≥ 2 / (ε²(1 − 2ν)²) · ln(2|H|/δ),

this implies

    Pr[ R̂(f) > R(f) + ε(1 − 2ν)/2 ] ≤ exp( −ln(2|H|/δ) ) = δ/(2|H|).

Similarly, we get

    Pr[ R̂(h) < R(h) − ε(1 − 2ν)/2 ] ≤ δ/(2|H|)

for each h. The union bound then gives the final result.

Problem 4

(a)

Positive examples. Each positive example (x_j, 1) in the sample will be classified by ĥ as positive iff ĥ does not contain a literal that is false on x_j. All literals that are false on x_j have been removed from ĥ during training, so all positive examples are classified correctly by ĥ.

Negative examples. All negative examples (x_j, 0) are classified as negative if all the literals of the target h∗ are also in ĥ (on such an x_j some literal of h∗ is false, and that literal then also appears in ĥ). The algorithm removes from the list L only literals that contradict some positive example x_j. Those literals cannot be in h∗, because h∗ classifies all positive examples correctly. Thus, no literal of h∗ has been removed from ĥ.
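For concreteness (again an addition, not part of the original solution), here is a minimal sketch of the elimination algorithm analysed in this problem: start from the list L of all 2n literals and delete every literal that is falsified by some positive example. The target conjunction and the data below are made up.

    import random

    def eliminate(sample, n):
        # Learn a conjunction of literals from labelled examples.
        # A literal is a pair (i, s): s = 1 stands for v_i, s = 0 for the negation of v_i.
        literals = {(i, s) for i in range(n) for s in (0, 1)}   # the initial list L of 2n literals
        for x, y in sample:
            if y == 1:
                # delete every literal that is false on this positive example
                literals -= {(i, 1 - x[i]) for i in range(n)}
        return literals

    def predict(literals, x):
        # the conjunction outputs 1 iff every remaining literal is true on x
        return int(all(x[i] == s for i, s in literals))

    # Made-up target over n = 5 variables: h*(x) = v_0 AND (NOT v_2).
    n = 5
    target = {(0, 1), (2, 0)}
    random.seed(3)
    sample = []
    for _ in range(50):
        x = tuple(random.randint(0, 1) for _ in range(n))
        sample.append((x, predict(target, x)))

    h_hat = eliminate(sample, n)
    assert all(predict(h_hat, x) == y for x, y in sample)   # part (a): consistent with the sample
    assert target <= h_hat                                  # part (a): no literal of h* was removed
    print("literals of the learned conjunction:", sorted(h_hat))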

(b)

As we argued in part (a), the hypothesis ĥ includes all the literals of the target h∗. Therefore,

    Pr_x[ ĥ(x) = 1 and h∗(x) = 0 ] = 0.

In addition to this, we need to show that

    Pr_x[ ĥ(x) = 0 and h∗(x) = 1 ] ≤ ε

holds with probability at least 1 − δ. Fix ε > 0, and call a literal dangerous if it has probability greater than ε/n of being false on a positive example. More precisely, denote by ṽ(x) the value of literal ṽ on instance x: if ṽ = v_i, then ṽ(x) = x_i, and if ṽ = ¬v_i, then ṽ(x) = 1 − x_i. Then a literal ṽ is dangerous if

    Pr_x[ ṽ(x) = 0 and h∗(x) = 1 ] > ε/n.

There are at most n literals ṽ in ĥ. If ĥ(x) = 0 and h∗(x) = 1, then at least one of the literals ṽ in ĥ satisfies ṽ(x) = 0 and h∗(x) = 1. If none of the literals in ĥ is dangerous, then by the union bound the probability of drawing x such that ṽ(x) = 0 and h∗(x) = 1 holds for at least one ṽ in ĥ is at most n · (ε/n) = ε. Thus, we are done if we can show that with probability at least 1 − δ, no dangerous literals remain.

There are 2n literals to consider, so by the union bound it is sufficient to show that for a fixed dangerous literal, the probability that it remains is at most δ/(2n). If a literal ṽ is dangerous, then each example x has probability at least ε/n of satisfying both h∗(x) = 1 and ṽ(x) = 0, causing ṽ to be removed from ĥ. The probability that ṽ remains after m examples is at most

    (1 − ε/n)^m ≤ exp(−mε/n).

For any

    m ≥ (n/ε) ln(2n/δ)

this is at most δ/(2n), as desired.
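The last step can be checked numerically. The snippet below is an added illustration (the concrete values of n, ε and δ are only examples): it computes m from the formula above and verifies that (1 − ε/n)^m is indeed at most δ/(2n).

    import math

    # Final step of part (b): with m >= (n/eps) * ln(2n/delta) examples, a fixed dangerous
    # literal survives with probability at most (1 - eps/n)^m <= exp(-m*eps/n) <= delta/(2n).
    n, eps, delta = 100, 0.1, 0.001          # illustrative values
    m = math.ceil((n / eps) * math.log(2 * n / delta))
    survive = (1.0 - eps / n) ** m
    print(f"m = {m}")
    print(f"(1 - eps/n)^m = {survive:.3e}  <=  delta/(2n) = {delta / (2 * n):.3e}")
    assert survive <= delta / (2 * n)

With these example values the printed m is 12207, the same number that appears in part (c) below.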

(c)

First, there is one conjunction that has no positive examples (no instance satisfies it), which we can represent, for example, as v_1 ∧ ¬v_1. Consider now conjunctions that have at least one positive example. For each variable i, there are three alternatives:

- literal v_i is included, but literal ¬v_i is not
- literal ¬v_i is included, but literal v_i is not
- neither literal v_i nor ¬v_i is included.

Since we can choose among these three alternatives independently for each of the n variables, we get 3^n such conjunctions. This includes the conjunction with no literals at all, which by convention represents the hypothesis that always outputs 1. We therefore get |C_n| = 3^n + 1.

The required number of examples can be calculated directly with the theorem, using the values n = 100, ε = 0.1 and δ = 0.001:

    m ≥ (1/ε) (ln|C_n| − ln δ) = (1/0.1) (ln(3^100 + 1) − ln 0.001) ≈ 1167.7,

and m = 1168 is enough. Calculating m with the formula in the previous part gives

    m ≥ (n/ε) ln(2n/δ) = (100/0.1) ln(200/0.001) ≈ 12206.1.

Hence, m = 12207 is enough.

Exercise 5

The Set Cover problem can be formulated as follows. Let A = {a_1, a_2, ..., a_n} be a set, and let U = {B_1, B_2, ..., B_k} ⊆ P(A) be a collection of subsets of A such that ∪_{C ∈ U} C = A. We are asked to find the smallest V ⊆ U such that ∪_{C ∈ V} C = A. In other words, we should cover the set A with the smallest possible number of sets from U.

We show how the Set Cover problem can be solved with an algorithm that takes as input a sample ((x_1, y_1), ..., (x_m, y_m)) and outputs a monotone conjunction f such that R̂(f) is minimized. We generate input vectors with label 0 for every a ∈ A and vectors with label 1 for every B ∈ U. The coordinates of an input vector correspond to the sets in U. In the monotone conjunction f, the variables indicate which sets belong to the set cover.

For every i ∈ {1, 2, ..., n}, make two identical sample pairs

    (x_i, 0) = (x_{i+n}, 0) = ((x_{i1}, x_{i2}, ..., x_{ik}), 0)

such that x_{ij} = 0 if a_i ∈ B_j and x_{ij} = 1 otherwise. We call the vectors of these pairs 0-vectors. If the output conjunction corresponds to a set cover, the labels of all these vectors are predicted correctly, and if some element a ∈ A belongs to none of the sets indicated by f, that causes two prediction errors.

Also, make for every i ∈ {1, 2, ..., k} a pair

    (x_{2n+i}, 1) = ((x_{2n+i,1}, x_{2n+i,2}, ..., x_{2n+i,k}), 1),

where x_{2n+i,j} = 0 if i = j and x_{2n+i,j} = 1 otherwise. We call these 1-vectors. The purpose of the 1-vectors is to incur one prediction error for each set included in the set cover.

Using this input, assume that the output f of the algorithm corresponds to a collection V ⊆ U, and that there is a ∈ A such that a ∉ ∪_{C ∈ V} C. But then we could improve f and R̂(f) by adding to it some B ∈ U such that a ∈ B: including B increases the number of prediction errors on the 1-vectors by 1, and covering a reduces the number of errors among the 0-vectors by 2. So f always corresponds to a set cover. For every set cover, all prediction errors happen with the 1-vectors, and the number of prediction errors is equal to the size of the set cover. Thus, the output of the algorithm is a smallest set cover. The sample can be generated in linear time, so the given problem is NP-hard.
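The reduction is mechanical enough to write down explicitly. The sketch below is an illustration added to the solution, with a tiny made-up Set Cover instance: it builds the sample described above and evaluates the empirical error of two candidate conjunctions, one that corresponds to a cover (errors = cover size) and one that leaves elements uncovered (two extra errors per uncovered element).

    def build_sample(A, U):
        # Build the reduction sample of Exercise 5.
        # A: list of elements a_1..a_n; U: list of subsets B_1..B_k (as Python sets).
        k = len(U)
        sample = []
        for a in A:                                   # two identical 0-vectors per element
            x = tuple(0 if a in B else 1 for B in U)
            sample += [(x, 0), (x, 0)]
        for i in range(k):                            # one 1-vector per set B_i
            x = tuple(0 if j == i else 1 for j in range(k))
            sample.append((x, 1))
        return sample

    def empirical_errors(cover_indices, sample):
        # Number of prediction errors of the monotone conjunction AND_{j in cover} v_j.
        errs = 0
        for x, y in sample:
            pred = int(all(x[j] == 1 for j in cover_indices))
            errs += int(pred != y)
        return errs

    # Tiny made-up instance: A = {1, 2, 3, 4}, U = {B_0, B_1, B_2}.
    A = [1, 2, 3, 4]
    U = [{1, 2}, {2, 3}, {3, 4}]
    sample = build_sample(A, U)
    print(empirical_errors({0, 2}, sample))   # {B_0, B_2} covers A: 2 errors = size of the cover
    print(empirical_errors({1}, sample))      # {B_1} misses elements 1 and 4: 2*2 + 1 = 5 errors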
