COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture # 15 Scribe: Jieming Mao April 1, 2013

Size: px
Start display at page:

Download "COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture # 15 Scribe: Jieming Mao April 1, 2013"

Transcription

1 COS 511: heoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 15 Scrbe: Jemng Mao Aprl 1, Bref revew 1.1 Learnng wth expert advce Last tme, we started to talk about learnng wth expert advce. hs model s very dfferent from models we saw earler n the semester. In ths model, the learner can see only one example at a tme, and has to make predctons based on the advce of experts and the hstory. Formally, the model can be wrtten as N = # of experts for t = 1,, each expert predcts ξ {0, 1} learner predcts ŷ {0, 1} learner observes y {0, 1} (mstake f y ŷ) In the prevous lecture, we gave a Halvng algorthm whch works when there exsts a perfect expert. In ths case, the number of mstakes that the Halvng algorthm makes s at most lg(n). 1. Connecton wth PAC learnng Now we consder an analogue of the PAC learnng model : hypothess space H = {h 1,, h N } target concept c H on each round get x predct ŷ observe c(x) hs problem s a very natural onlne learnng problem. However, we can relate ths problem to learnng wth expert advce by consderng each hypothess h as an expert. Snce the target concept s chosen nsde the hypothess space, we always have a perfect expert. herefore, we can apply the Halvng algorthm to solvng ths problem and the Halvng algorthm wll make at most lg(n) = lg( H ) mstakes.

2 Boundng the number of mstakes In the prevous secton, t s natural to ask whether the Halvng algorthm s the best possble. In order to answer ths problem, we have to defne what s best. Let A be any determnstc algorthm, defne M A (H) = max(# mstakes made by A) c,x M A (H) s the number of mstakes A wll make on hypothess space H n the worst case. An algorthm s consdered good f ts M A (H) s small. Now defne opt(h) = mn M A(H). A e are gong to lower bound opt(h) by V Cdm(H) as the followng theorem: heorem 1. V Cdm(H) opt(h) Proof: Let A be any determnstc algorthm. Let d = V Cdm(H). Let x 1,..., x d be shattered by H. Now we can construct an adversaral strategy that forces A to make d mstakes as followng: for t = 1,, d - present x t to A - ŷ t = A s predcton - choose y t ŷ t. Snce x 1,..., x d s shattered by H, we know there exsts a concept c H, such that c(x t ) = y t for all t. In addton, snce A s determnstc, we can smulate t ahead of tme. herefore such adversaral constructon s possble. hus there exsts an adversaral strategy to force A to make d mstakes. o sum up ths secton, we have the followng result, V Cdm(H) opt(h) M Halvng (H) lg( H ). 3 eghted majorty algorthm Now we are gong back to learnng wth expert advce and we want to devse algorthms when there s no perfect expert. In ths settng, we wll compare the number of mstakes of our algorthm and the number of mstakes of the best expert. Here we modfy the Halvng algorthm to get an algorthm when there s no perfect expert. Instead of dscardng experts, we wll keep a weght for each expert and lower t when ths expert makes a mstake. e call ths algorthm weghted majorty algorthm: w = weght on expert Parameter 0 β < 1 N = # of experts

3 Intally, w = 1 for t = 1,, - each expert predcts ξ {0, 1} - q 0 = :ξ =0 w, q 1 = :ξ =1 w. - learner predcts ŷ {0, 1} { 1 f q1 > q 0 - ŷ = (weghted majorty vote) 0 else - learner observes y {0, 1} - (mstake f y ŷ) - for each, f ξ y, then w w β Now we are gong to analyze weghted majorty algorthm(ma) by provng the followng theorem: heorem. (# of mstakes of MA) a β (# of mstakes of the best expert) +c β lg(n), where a β = lg(1/β) and c lg( 1+β ) β = 1 lg( ). 1+β Before provng ths theorem, let s frst try to understand what ths theorem mples. he followng table gves a good understandng of the parameter: β a β c β 1/ If we dvde both sdes of the nequalty by, we get (# of mstakes of MA) a β(# of mstakes of the best expert) + c β lg(n) hen +, c β lg(n) 0. hen ths theorem means that the rate that MA makes mstakes s bounded by a constant tmes the rate that the best expert makes mstakes. Proof: Defne as the sum of the weghts of all the experts: = N =1 w. Intally, we have = N. Now we consder how changes n some round. On some round, wthout loss of generalty, we assume that y = 0. hen new = N =1 = :ξ =1 w new w β + :ξ =0 = q 1 β + q 0 = q 1 β + ( q 1 ) = (1 β)q 1 w 3

4 Now suppose MA makes a mstake. hen ŷ y ŷ = 1 q 1 q 0 q 1 new (1 β) = (1 + β ) So f MA makes a mstake, the sum of weghts wll decrease by multplyng 1+β. herefore, after m mstakes, we get an upper bound of the sum of weghts as N ( 1 + β ) m. Defne L to be the number of mstakes that expert makes. hen we have, w = β L N ( 1 + β ) m. Solvng ths nequalty, and notng that t holds for all experts, we get m (mn L ) lg(1/β) + lg(n) lg( 1+β ). 4 Randomzed weghted majorty algorthm he prevous proof has shown that, at best, the number of mstakes of MA s at most twce the number of mstakes of the best expert. hs s a weak result f the best expert actually makes a substantal number of mstakes. In order to get a better algorthm, we have to ntroduce randomness. he next algorthm we are gong to ntroduce s called randomzed weghted majorty algorthm. hs algorthm s very smlar to weghted majorty algorthm. he only dfference s that rather than makng predcton by weghted majorty vote, n each round, the algorthm pcks an expert wth some probablty accordng to ts weght, and uses that randomly chosen expert to make ts predcton. he algorthm s as followng: w = weght on expert Parameter 0 β < 1 N = # of experts Intally, w = 1 for t = 1,, - each expert predcts ξ {0, 1} - q 0 = :ξ =0 w, q 1 = :ξ =1 w. - learner predcts ŷ {0, 1} 4

5 1 wth probablty - ŷ = 0 wth probablty - learner observes y {0, 1} - (mstake f y ŷ) :ξ =1 w :ξ =0 w - for each, f ξ y, then w w β = q 1 = q 0 Now let s analyze randomzed weght majorty algorthm(rma) by provng the followng theorem: heorem 3. E(# of mstakes of RMA) a β (# of mstakes of the best expert) +c β ln(n), where a β = ln(1/β) 1 β and c β = 1 1 β. Notce here we are consderng the expectaton of number of mstakes RMA makes because RMA s a randomzed algorthm. hen β 1, a β 1. hs means RMA can do really close to the optmal. Proof: Smlar to the proof of the prevous theorem, we prove ths theorem by consderng the sum of weghts. On some round, defne l as the probablty that RMA makes a mstake: :ξ l = P r[ŷ y] = y w. hen the weght n ths round s changed as new = w β + :ξ y :ξ =y = l β + (1 l) = (1 l(1 β)) w hen we can wrte the sum of weghts at the end as hen we have fnal = N (1 l 1 (1 β)) (1 l t (1 β)) N exp( l t (1 β)) = N exp( (1 β) l t ) β L w fnal N exp( (1 β) l t ). Defne L A = l t as the expected number of mstakes RMA makes. e have L A = l t ln(1/β) 1 (#of mstakes of the best expert) + ln N. 1 β 1 β 5

6 5 Varant of RMA Let s frst dscuss how to choose the parameter β for RMA. Suppose we have an upper on the number of mstakes that the best expert makes as mn L K, then we can choose 1 β as ln(n). hen we have 1+ K L A mn L + K ln(n) + ln(n). It s natural to ask whether we can do better than RMA. he answer s yes. he hgh level dea s to work n the mddle of MA and RMA. he followng fgure shows how ths algorthm works. he y-axs s the probablty of predctng ŷ as 1. And the x-axs s the weghted fracton of experts predctng 1. he red lne s MA, the blue lne s RMA, and the green lne s the new algorthm. ln(1/β) ln( he new algorthm can reach a β = and c 1 1+β ) β = ln( ). hese are exactly half of 1+β those of MA. And f we make the assumpton that mn L K agan, the new algorthm has L + K ln(n) + lg(n) L A mn. If we set K = 0 whch means there exsts a perfect expert, we have L A lg(n) hus we know ths algorthm s better than the Halvng algorthm. e usually can assume that K = by addng two experts: one always predcts 1 and the other always predcts 0. hen we have L A mn L + ln(n) 6 + lg(n).

7 If we dvde both sdes by, we get L A mn L + ln(n) + lg(n). hen gets large, the rght hand sde of the nequalty s domnated by the term 6 Lower bound ln(n). Fnally we are gong to gve a lower bound for learnng wth expert advce. hs lower bound matches the upper bound we get from the prevous secton even for constant factors. So ths lower bound s tght and we cannot mprove the algorthm n the prevous secton. o prove the lower bound, we consder the followng scenaro: On each round, the adversary chooses the predcton of each expert randomly, ξ = { 1 w.p. 1/ 0 w.p. 1/ On each round, the adversary chooses the feedback randomly, y = For any learnng algorthm A, E y,ξ [L A ] = E ξ [E y [L A ξ]] = /. { 1 w.p. 1/ 0 w.p. 1/ For any expert, we have However, we can prove that E[L ] = /. E[mn L ] ln(n). hus, ln(n) E[L A ] E[mn L ]. ln(n) he term meets the domnatng term n the prevous secton. Also, n the proof of the lower bound, the adversary only uses entrely random expert predctons and outcomes. So we would not gan anythng by assumng that the data s random rather than adversaral. he bounds we proved n an adversaral settng are just as good as could possbly be obtaned even f we assumed that the data are entrely random. 7

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014 COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #16 Scrbe: Yannan Wang Aprl 3, 014 1 Introducton The goal of our onlne learnng scenaro from last class s C comparng wth best expert and

More information

1 The Mistake Bound Model

1 The Mistake Bound Model 5-850: Advanced Algorthms CMU, Sprng 07 Lecture #: Onlne Learnng and Multplcatve Weghts February 7, 07 Lecturer: Anupam Gupta Scrbe: Bryan Lee,Albert Gu, Eugene Cho he Mstake Bound Model Suppose there

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

COS 511: Theoretical Machine Learning

COS 511: Theoretical Machine Learning COS 5: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #0 Scrbe: José Sões Ferrera March 06, 203 In the last lecture the concept of Radeacher coplexty was ntroduced, wth the goal of showng that

More information

Lecture 14: Bandits with Budget Constraints

Lecture 14: Bandits with Budget Constraints IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

Lecture 10: May 6, 2013

Lecture 10: May 6, 2013 TTIC/CMSC 31150 Mathematcal Toolkt Sprng 013 Madhur Tulsan Lecture 10: May 6, 013 Scrbe: Wenje Luo In today s lecture, we manly talked about random walk on graphs and ntroduce the concept of graph expander,

More information

1 Review From Last Time

1 Review From Last Time COS 5: Foundatons of Machne Learnng Rob Schapre Lecture #8 Scrbe: Monrul I Sharf Aprl 0, 2003 Revew Fro Last Te Last te, we were talkng about how to odel dstrbutons, and we had ths setup: Gven - exaples

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be

More information

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016 CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng

More information

8.6 The Complex Number System

8.6 The Complex Number System 8.6 The Complex Number System Earler n the chapter, we mentoned that we cannot have a negatve under a square root, snce the square of any postve or negatve number s always postve. In ths secton we want

More information

Lecture 4: Universal Hash Functions/Streaming Cont d

Lecture 4: Universal Hash Functions/Streaming Cont d CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Lecture 4: November 17, Part 1 Single Buffer Management

Lecture 4: November 17, Part 1 Single Buffer Management Lecturer: Ad Rosén Algorthms for the anagement of Networs Fall 2003-2004 Lecture 4: November 7, 2003 Scrbe: Guy Grebla Part Sngle Buffer anagement In the prevous lecture we taled about the Combned Input

More information

Boostrapaggregating (Bagging)

Boostrapaggregating (Bagging) Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod

More information

CS286r Assign One. Answer Key

CS286r Assign One. Answer Key CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,

More information

Lecture 5 Decoding Binary BCH Codes

Lecture 5 Decoding Binary BCH Codes Lecture 5 Decodng Bnary BCH Codes In ths class, we wll ntroduce dfferent methods for decodng BCH codes 51 Decodng the [15, 7, 5] 2 -BCH Code Consder the [15, 7, 5] 2 -code C we ntroduced n the last lecture

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

2.3 Nilpotent endomorphisms

2.3 Nilpotent endomorphisms s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms

More information

Math 217 Fall 2013 Homework 2 Solutions

Math 217 Fall 2013 Homework 2 Solutions Math 17 Fall 013 Homework Solutons Due Thursday Sept. 6, 013 5pm Ths homework conssts of 6 problems of 5 ponts each. The total s 30. You need to fully justfy your answer prove that your functon ndeed has

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Finding Primitive Roots Pseudo-Deterministically

Finding Primitive Roots Pseudo-Deterministically Electronc Colloquum on Computatonal Complexty, Report No 207 (205) Fndng Prmtve Roots Pseudo-Determnstcally Ofer Grossman December 22, 205 Abstract Pseudo-determnstc algorthms are randomzed search algorthms

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Edge Isoperimetric Inequalities

Edge Isoperimetric Inequalities November 7, 2005 Ross M. Rchardson Edge Isopermetrc Inequaltes 1 Four Questons Recall that n the last lecture we looked at the problem of sopermetrc nequaltes n the hypercube, Q n. Our noton of boundary

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

18.1 Introduction and Recap

18.1 Introduction and Recap CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

1 Definition of Rademacher Complexity

1 Definition of Rademacher Complexity COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #9 Scrbe: Josh Chen March 5, 2013 We ve spent the past few classes provng bounds on the generalzaton error of PAClearnng algorths for the

More information

Lecture 10 Support Vector Machines. Oct

Lecture 10 Support Vector Machines. Oct Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

Lecture 4: September 12

Lecture 4: September 12 36-755: Advanced Statstcal Theory Fall 016 Lecture 4: September 1 Lecturer: Alessandro Rnaldo Scrbe: Xao Hu Ta Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer: These notes have not been

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Excess Error, Approximation Error, and Estimation Error

Excess Error, Approximation Error, and Estimation Error E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

We present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}.

We present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}. CS 189 Introducton to Machne Learnng Sprng 2018 Note 26 1 Boostng We have seen that n the case of random forests, combnng many mperfect models can produce a snglodel that works very well. Ths s the dea

More information

HMMT February 2016 February 20, 2016

HMMT February 2016 February 20, 2016 HMMT February 016 February 0, 016 Combnatorcs 1. For postve ntegers n, let S n be the set of ntegers x such that n dstnct lnes, no three concurrent, can dvde a plane nto x regons (for example, S = {3,

More information

A 2D Bounded Linear Program (H,c) 2D Linear Programming

A 2D Bounded Linear Program (H,c) 2D Linear Programming A 2D Bounded Lnear Program (H,c) h 3 v h 8 h 5 c h 4 h h 6 h 7 h 2 2D Lnear Programmng C s a polygonal regon, the ntersecton of n halfplanes. (H, c) s nfeasble, as C s empty. Feasble regon C s unbounded

More information

The Experts/Multiplicative Weights Algorithm and Applications

The Experts/Multiplicative Weights Algorithm and Applications Chapter 2 he Experts/Multplcatve Weghts Algorthm and Applcatons We turn to the problem of onlne learnng, and analyze a very powerful and versatle algorthm called the multplcatve weghts update algorthm.

More information

Vapnik-Chervonenkis theory

Vapnik-Chervonenkis theory Vapnk-Chervonenks theory Rs Kondor June 13, 2008 For the purposes of ths lecture, we restrct ourselves to the bnary supervsed batch learnng settng. We assume that we have an nput space X, and an unknown

More information

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k. THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty

More information

Lecture 17. Solving LPs/SDPs using Multiplicative Weights Multiplicative Weights

Lecture 17. Solving LPs/SDPs using Multiplicative Weights Multiplicative Weights Lecture 7 Solvng LPs/SDPs usng Multplcatve Weghts In the last lecture we saw the Multplcatve Weghts (MW) algorthm and how t could be used to effectvely solve the experts problem n whch we have many experts

More information

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable

More information

11 Tail Inequalities Markov s Inequality. Lecture 11: Tail Inequalities [Fa 13]

11 Tail Inequalities Markov s Inequality. Lecture 11: Tail Inequalities [Fa 13] Algorthms Lecture 11: Tal Inequaltes [Fa 13] If you hold a cat by the tal you learn thngs you cannot learn any other way. Mark Twan 11 Tal Inequaltes The smple recursve structure of skp lsts made t relatvely

More information

( ) 1/ 2. ( P SO2 )( P O2 ) 1/ 2.

( ) 1/ 2. ( P SO2 )( P O2 ) 1/ 2. Chemstry 360 Dr. Jean M. Standard Problem Set 9 Solutons. The followng chemcal reacton converts sulfur doxde to sulfur troxde. SO ( g) + O ( g) SO 3 ( l). (a.) Wrte the expresson for K eq for ths reacton.

More information

Volume 18 Figure 1. Notation 1. Notation 2. Observation 1. Remark 1. Remark 2. Remark 3. Remark 4. Remark 5. Remark 6. Theorem A [2]. Theorem B [2].

Volume 18 Figure 1. Notation 1. Notation 2. Observation 1. Remark 1. Remark 2. Remark 3. Remark 4. Remark 5. Remark 6. Theorem A [2]. Theorem B [2]. Bulletn of Mathematcal Scences and Applcatons Submtted: 016-04-07 ISSN: 78-9634, Vol. 18, pp 1-10 Revsed: 016-09-08 do:10.1805/www.scpress.com/bmsa.18.1 Accepted: 016-10-13 017 ScPress Ltd., Swtzerland

More information

E Tail Inequalities. E.1 Markov s Inequality. Non-Lecture E: Tail Inequalities

E Tail Inequalities. E.1 Markov s Inequality. Non-Lecture E: Tail Inequalities Algorthms Non-Lecture E: Tal Inequaltes If you hold a cat by the tal you learn thngs you cannot learn any other way. Mar Twan E Tal Inequaltes The smple recursve structure of sp lsts made t relatvely easy

More information

Graph Reconstruction by Permutations

Graph Reconstruction by Permutations Graph Reconstructon by Permutatons Perre Ille and Wllam Kocay* Insttut de Mathémathques de Lumny CNRS UMR 6206 163 avenue de Lumny, Case 907 13288 Marselle Cedex 9, France e-mal: lle@ml.unv-mrs.fr Computer

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

For example, if the drawing pin was tossed 200 times and it landed point up on 140 of these trials,

For example, if the drawing pin was tossed 200 times and it landed point up on 140 of these trials, Probablty In ths actvty you wll use some real data to estmate the probablty of an event happenng. You wll also use a varety of methods to work out theoretcal probabltes. heoretcal and expermental probabltes

More information

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands Content. Inference on Regresson Parameters a. Fndng Mean, s.d and covarance amongst estmates.. Confdence Intervals and Workng Hotellng Bands 3. Cochran s Theorem 4. General Lnear Testng 5. Measures of

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

The optimal delay of the second test is therefore approximately 210 hours earlier than =2.

The optimal delay of the second test is therefore approximately 210 hours earlier than =2. THE IEC 61508 FORMULAS 223 The optmal delay of the second test s therefore approxmately 210 hours earler than =2. 8.4 The IEC 61508 Formulas IEC 61508-6 provdes approxmaton formulas for the PF for smple

More information

Finding Dense Subgraphs in G(n, 1/2)

Finding Dense Subgraphs in G(n, 1/2) Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of

More information

Min Cut, Fast Cut, Polynomial Identities

Min Cut, Fast Cut, Polynomial Identities Randomzed Algorthms, Summer 016 Mn Cut, Fast Cut, Polynomal Identtes Instructor: Thomas Kesselhem and Kurt Mehlhorn 1 Mn Cuts n Graphs Lecture (5 pages) Throughout ths secton, G = (V, E) s a mult-graph.

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Lecture Space-Bounded Derandomization

Lecture Space-Bounded Derandomization Notes on Complexty Theory Last updated: October, 2008 Jonathan Katz Lecture Space-Bounded Derandomzaton 1 Space-Bounded Derandomzaton We now dscuss derandomzaton of space-bounded algorthms. Here non-trval

More information

10-801: Advanced Optimization and Randomized Methods Lecture 2: Convex functions (Jan 15, 2014)

10-801: Advanced Optimization and Randomized Methods Lecture 2: Convex functions (Jan 15, 2014) 0-80: Advanced Optmzaton and Randomzed Methods Lecture : Convex functons (Jan 5, 04) Lecturer: Suvrt Sra Addr: Carnege Mellon Unversty, Sprng 04 Scrbes: Avnava Dubey, Ahmed Hefny Dsclamer: These notes

More information

P exp(tx) = 1 + t 2k M 2k. k N

P exp(tx) = 1 + t 2k M 2k. k N 1. Subgaussan tals Defnton. Say that a random varable X has a subgaussan dstrbuton wth scale factor σ< f P exp(tx) exp(σ 2 t 2 /2) for all real t. For example, f X s dstrbuted N(,σ 2 ) then t s subgaussan.

More information

Foundations of Arithmetic

Foundations of Arithmetic Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an

More information

Lecture 4: Constant Time SVD Approximation

Lecture 4: Constant Time SVD Approximation Spectral Algorthms and Representatons eb. 17, Mar. 3 and 8, 005 Lecture 4: Constant Tme SVD Approxmaton Lecturer: Santosh Vempala Scrbe: Jangzhuo Chen Ths topc conssts of three lectures 0/17, 03/03, 03/08),

More information

Supplementary material: Margin based PU Learning. Matrix Concentration Inequalities

Supplementary material: Margin based PU Learning. Matrix Concentration Inequalities Supplementary materal: Margn based PU Learnng We gve the complete proofs of Theorem and n Secton We frst ntroduce the well-known concentraton nequalty, so the covarance estmator can be bounded Then we

More information

Machine Learning, 43, , 2001.

Machine Learning, 43, , 2001. Machne Learnng, 43, 65-9, 00. Drftng Games Robert E. Schapre (schapre@research.att.com) AT&T Labs Research, Shannon Laboratory, 80 Park Avenue, Room A79, Florham Park, NJ 0793-097 Abstract. We ntroduce

More information

Digital Signal Processing

Digital Signal Processing Dgtal Sgnal Processng Dscrete-tme System Analyss Manar Mohasen Offce: F8 Emal: manar.subh@ut.ac.r School of IT Engneerng Revew of Precedent Class Contnuous Sgnal The value of the sgnal s avalable over

More information

U.C. Berkeley CS278: Computational Complexity Professor Luca Trevisan 2/21/2008. Notes for Lecture 8

U.C. Berkeley CS278: Computational Complexity Professor Luca Trevisan 2/21/2008. Notes for Lecture 8 U.C. Berkeley CS278: Computatonal Complexty Handout N8 Professor Luca Trevsan 2/21/2008 Notes for Lecture 8 1 Undrected Connectvty In the undrected s t connectvty problem (abbrevated ST-UCONN) we are gven

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 6 Luca Trevisan September 12, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 6 Luca Trevisan September 12, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 6 Luca Trevsan September, 07 Scrbed by Theo McKenze Lecture 6 In whch we study the spectrum of random graphs. Overvew When attemptng to fnd n polynomal

More information

First day August 1, Problems and Solutions

First day August 1, Problems and Solutions FOURTH INTERNATIONAL COMPETITION FOR UNIVERSITY STUDENTS IN MATHEMATICS July 30 August 4, 997, Plovdv, BULGARIA Frst day August, 997 Problems and Solutons Problem. Let {ε n } n= be a sequence of postve

More information

Lecture 17: Lee-Sidford Barrier

Lecture 17: Lee-Sidford Barrier CSE 599: Interplay between Convex Optmzaton and Geometry Wnter 2018 Lecturer: Yn Tat Lee Lecture 17: Lee-Sdford Barrer Dsclamer: Please tell me any mstake you notced. In ths lecture, we talk about the

More information

5 The Rational Canonical Form

5 The Rational Canonical Form 5 The Ratonal Canoncal Form Here p s a monc rreducble factor of the mnmum polynomal m T and s not necessarly of degree one Let F p denote the feld constructed earler n the course, consstng of all matrces

More information

Eigenvalues of Random Graphs

Eigenvalues of Random Graphs Spectral Graph Theory Lecture 2 Egenvalues of Random Graphs Danel A. Spelman November 4, 202 2. Introducton In ths lecture, we consder a random graph on n vertces n whch each edge s chosen to be n the

More information

between standard Gibbs free energies of formation for products and reactants, ΔG! R = ν i ΔG f,i, we

between standard Gibbs free energies of formation for products and reactants, ΔG! R = ν i ΔG f,i, we hermodynamcs, Statstcal hermodynamcs, and Knetcs 4 th Edton,. Engel & P. ed Ch. 6 Part Answers to Selected Problems Q6.. Q6.4. If ξ =0. mole at equlbrum, the reacton s not ery far along. hus, there would

More information

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7 Stanford Unversty CS54: Computatonal Complexty Notes 7 Luca Trevsan January 9, 014 Notes for Lecture 7 1 Approxmate Countng wt an N oracle We complete te proof of te followng result: Teorem 1 For every

More information

Computational and Statistical Learning theory Assignment 4

Computational and Statistical Learning theory Assignment 4 Coputatonal and Statstcal Learnng theory Assgnent 4 Due: March 2nd Eal solutons to : karthk at ttc dot edu Notatons/Defntons Recall the defnton of saple based Radeacher coplexty : [ ] R S F) := E ɛ {±}

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

FE REVIEW OPERATIONAL AMPLIFIERS (OP-AMPS)( ) 8/25/2010

FE REVIEW OPERATIONAL AMPLIFIERS (OP-AMPS)( ) 8/25/2010 FE REVEW OPERATONAL AMPLFERS (OP-AMPS)( ) 1 The Op-amp 2 An op-amp has two nputs and one output. Note the op-amp below. The termnal labeled l wth the (-) sgn s the nvertng nput and the nput labeled wth

More information

STATS 306B: Unsupervised Learning Spring Lecture 10 April 30

STATS 306B: Unsupervised Learning Spring Lecture 10 April 30 STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear

More information

Text S1: Detailed proofs for The time scale of evolutionary innovation

Text S1: Detailed proofs for The time scale of evolutionary innovation Text S: Detaled proofs for The tme scale of evolutonary nnovaton Krshnendu Chatterjee Andreas Pavloganns Ben Adlam Martn A. Nowak. Overvew and Organzaton We wll present detaled proofs of all our results.

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Calculation of time complexity (3%)

Calculation of time complexity (3%) Problem 1. (30%) Calculaton of tme complexty (3%) Gven n ctes, usng exhaust search to see every result takes O(n!). Calculaton of tme needed to solve the problem (2%) 40 ctes:40! dfferent tours 40 add

More information