
REGULARIZATION
David Kauchak, CS 158, Fall 2016

Admin
- Assignment 5! Starter code is available.

Schedule
- Midterm next week, due Friday (more on this in 1 min)
- Assignment 6 due Friday before fall break

Midterm
- Download from the course web page when you're ready to take it (available by end of day Monday)
- 2 hours to complete
- Must hand in (or email in) by 11:59pm Friday, Oct. 7
- Can use: class notes, your notes, the book, your assignments, and Wikipedia
- You may not use: your neighbor, anything else on the web, etc.

What can be covered
- Anything we've talked about in class
- Anything in the reading (these are not necessarily the same things)
- Anything we've covered in the assignments

Midterm topics
- Machine learning basics: different types of learning problems; feature-based machine learning; data assumptions/data generating distribution
- Classification problem setup
- Proper experimentation: train/dev/test; evaluation/accuracy/training error; optimizing hyperparameters

Midterm topics
- Learning algorithms: decision trees, k-NN, perceptron, gradient descent
- Algorithm properties: training/learning; rationale/why it works; classifying; hyperparameters; avoiding overfitting; algorithm variants/improvements

Midterm topics
- Geometric view of data: distances between examples; decision boundaries
- Features: example features; removing erroneous features/picking good features; challenges with high-dimensional data; feature normalization
- Other pre-processing: outlier detection

Midterm topics
- Comparing algorithms: n-fold cross validation; leave-one-out validation; bootstrap resampling; t-test
- Imbalanced data: evaluation; precision/recall, F1, AUC; subsampling; oversampling; weighted binary classifiers

Midterm topics
- Multiclass classification: modifying existing approaches; using binary classifiers (OVA, AVA, tree-based); micro- vs. macro-averaging
- Ranking: using a binary classifier; using a weighted binary classifier; evaluation

Midterm topics
- Gradient descent: 0/1 loss; surrogate loss functions; convexity; minimization algorithm; regularization; different regularizers; p-norms
- Misc: good coding habits; JavaDoc

Midterm general advice
- 2 hours goes by fast!
- Don't plan on looking everything up; look up equations, algorithms, random details
- Make sure you understand the key concepts
- Don't spend too much time on any one question; skip questions you're stuck on and come back to them
- Watch the time as you go
- Be careful on the T/F questions
- For written questions: think before you write; make your argument/analysis clear and concise

How many have you heard of?
- (Ordinary) least squares
- Ridge regression
- Lasso regression
- Elastic regression
- Logistic regression

Model-based machine learning
1. pick a model: $0 = b + \sum_{j=1}^m w_j f_j$
2. pick a criterion to optimize (aka objective function): $\sum_{i=1}^n 1[y_i(w \cdot x_i + b) \le 0]$
3. develop a learning algorithm: $\operatorname{argmin}_{w,b} \sum_{i=1}^n 1[y_i(w \cdot x_i + b) \le 0]$
Find w and b that minimize the 0/1 loss.

Model-based machine learning
Since the 0/1 loss is hard to minimize directly, use a convex surrogate loss function:
$\operatorname{argmin}_{w,b} \sum_{i=1}^n \text{loss}(y_i, y_i')$
Find w and b that minimize the surrogate loss.

Surrogate loss functions
- 0/1 loss: $\ell(y, y') = 1[yy' \le 0]$
- Hinge: $\ell(y, y') = \max(0, 1 - yy')$
- Exponential: $\ell(y, y') = \exp(-yy')$
- Squared loss: $\ell(y, y') = (y - y')^2$
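
To make the surrogate losses concrete, here is a minimal sketch (not from the slides; the function names are ours) that evaluates all four at a few raw predictions $y' = w \cdot x + b$ for a positive example:

```python
import numpy as np

def zero_one(y, yp):
    return float(y * yp <= 0)          # 1[yy' <= 0]

def hinge(y, yp):
    return max(0.0, 1.0 - y * yp)      # max(0, 1 - yy')

def exponential(y, yp):
    return float(np.exp(-y * yp))      # exp(-yy')

def squared(y, yp):
    return (y - yp) ** 2               # (y - y')^2

# y is the true label in {-1, +1}; yp is the raw prediction w.x + b
for yp in [-2.0, -0.5, 0.5, 2.0]:
    print(yp, zero_one(1, yp), hinge(1, yp), exponential(1, yp), squared(1, yp))
```

Note how the 0/1 loss jumps from 1 to 0 at the decision boundary while the other three decrease smoothly, which is exactly what makes them optimizable.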

Finding the minimum
Gradient descent:
- pick a starting point (w)
- repeat until the loss doesn't decrease in any dimension:
  - pick a dimension
  - move a small amount in that dimension towards decreasing loss (using the derivative): $w_j = w_j - \eta \frac{d}{dw_j}\text{loss}(w)$

Intuition: you're blindfolded, but you can see out of the bottom of the blindfold to the ground right by your feet. I drop you off somewhere and tell you that you're in a convex-shaped valley and escape is at the bottom/minimum. How do you get out?

Perceptron learning algorithm
- repeat until convergence (or for some # of iterations):
  - for each training example (f_1, f_2, ..., f_m, label):
    - prediction = $b + \sum_{j=1}^m w_j f_j$
    - if prediction * label <= 0:  // they don't agree
      - for each w_j: w_j = w_j + f_j * label
      - b = b + label

Note: for gradient descent (with the exponential loss), we always update:
$w_j = w_j + \eta y_i x_{ij} \exp(-y_i(w \cdot x_i + b))$, i.e. $w_j = w_j + x_{ij} y_i c$ where $c = \eta \exp(-y_i(w \cdot x_i + b))$
The constant c combines the learning rate $\eta$ with $\exp(-\text{label} \cdot \text{prediction})$, a measure of how far from wrong we are. When is this large/small?
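
A runnable sketch of the perceptron update rule exactly as written above (the toy data set is invented for illustration):

```python
import numpy as np

def perceptron(X, y, iterations=100):
    """Perceptron learning algorithm from the slide: update w and b
    only on examples where prediction * label <= 0."""
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    for _ in range(iterations):
        for i in range(n):
            prediction = b + X[i] @ w
            if prediction * y[i] <= 0:      # they don't agree
                w += y[i] * X[i]            # w_j = w_j + f_j * label
                b += y[i]                   # b = b + label
    return w, b

# toy linearly separable data with labels in {-1, +1}
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(perceptron(X, y))
```

The gradient-descent variant differs only in the if-statement: it applies the (scaled) update on every example, not just the misclassified ones.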

The constant
$c = \eta \exp(-\text{label} \cdot \text{prediction})$
- If the label and the prediction have the same sign, then as the prediction gets larger the update gets smaller
- If they're different, the more different they are, the bigger the update

One concern
$\operatorname{argmin}_{w,b} \sum_{i=1}^n \exp(-y_i(w \cdot x_i + b))$
We're calculating this on the training set. We still need to be careful about overfitting! The minimum over w, b on the training set is generally NOT the minimum for the test set. How did we deal with this for the perceptron algorithm?

Overfitting revisited: regularization
- A regularizer is an additional criterion added to the loss function to make sure that we don't overfit
- It's called a regularizer since it tries to keep the parameters more normal/regular
- It is a bias on the model that forces the learning to prefer certain types of weights over others
$\operatorname{argmin}_{w,b} \sum \text{loss}(yy') + \lambda\, \text{regularizer}(w, b)$

Regularizers
$0 = b + \sum_{j=1}^m w_j f_j$
Should we allow all possible weights? Any preferences? What makes for a simpler model, for a linear model?
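
One way to read the regularized argmin above: the quantity being minimized is just a function of (w, b) that we can write down and evaluate. A sketch (our own scaffolding, with the hinge loss and a sum-of-squared-weights regularizer plugged in purely as examples):

```python
import numpy as np

def objective(w, b, X, y, lam):
    """Sum of per-example surrogate losses + lambda * regularizer(w).
    Hinge loss and squared weights chosen here for illustration."""
    preds = X @ w + b
    loss = np.sum(np.maximum(0.0, 1.0 - y * preds))
    regularizer = np.sum(w ** 2)
    return loss + lam * regularizer

X = np.array([[2.0, 1.0], [-1.0, -2.0]])
y = np.array([1, -1])
print(objective(np.array([0.5, 0.5]), 0.0, X, y, lam=0.1))
```

Lambda trades off the two terms: large lambda prefers small weights even at the cost of some training loss.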

Regularizers
Generally, we don't want huge weights:
- If weights are large, a small change in a feature can result in a large change in the prediction
- Large weights also give too much influence to any one feature
- We might also prefer weights of 0 for features that aren't useful
How do we encourage small weights, or penalize large weights?
$\operatorname{argmin}_{w,b} \sum \text{loss}(yy') + \lambda\, \text{regularizer}(w, b)$

Common regularizers
- sum of the weights: $r(w, b) = \sum_j |w_j|$
- sum of the squared weights: $r(w, b) = \sqrt{\sum_j |w_j|^2}$
What's the difference between these? Squared weights penalize large values more; the sum of the weights penalizes small values more.
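
A quick numeric illustration of that difference (our own example): the same total weight, spread over many small weights versus concentrated in one large weight.

```python
import numpy as np

def r_sum(w):      # sum of the weights (absolute values)
    return np.sum(np.abs(w))

def r_sq(w):       # sum of the squared weights (then square root)
    return np.sqrt(np.sum(w ** 2))

spread = np.array([0.5] * 8)   # eight small weights
single = np.array([4.0])       # one large weight, same total mass
print(r_sum(spread), r_sq(spread))   # 4.0  ~1.414
print(r_sum(single), r_sq(single))   # 4.0  4.0
```

The sum-of-weights regularizer can't tell the two apart, while the squared version charges the single large weight almost three times as much as the spread-out weights.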

p-norm
- sum of the weights (1-norm): $r(w, b) = \sum_j |w_j|$
- sum of the squared weights (2-norm): $r(w, b) = \sqrt{\sum_j |w_j|^2}$
- p-norm: $r(w, b) = \left(\sum_j |w_j|^p\right)^{1/p} = \|w\|_p$
Smaller values of p (p < 2) encourage sparser vectors; larger values of p discourage large weights more.

p-norms visualized
[Figure: contours in the (w_1, w_2) plane where the penalty $\|w\|_p = 1$, for several values of p, with a table of the largest allowed $w_2$ when $w_1 = 0.5$.]
- all p-norms penalize larger weights
- p < 2 tends to create sparse solutions (i.e. lots of 0 weights)
- p > 2 tends to prefer similar weights

Model-based machine learning
1. pick a model: $0 = b + \sum_{j=1}^m w_j f_j$
2. pick a criterion to optimize (aka objective function): $\sum \text{loss}(yy') + \lambda\, \text{regularizer}(w)$
3. develop a learning algorithm: $\operatorname{argmin}_{w,b} \sum \text{loss}(yy') + \lambda\, \text{regularizer}(w)$
Find w and b that minimize the regularized loss.
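
The penalty-equals-1 contours can be reproduced numerically. A sketch (values computed here, not copied from the slide's table): fixing $\|w\|_p = 1$ and $w_1 = 0.5$, solve for the largest allowed $w_2$.

```python
def max_w2(p, w1=0.5, penalty=1.0):
    # solve |w1|^p + |w2|^p = penalty^p for w2
    return (penalty ** p - abs(w1) ** p) ** (1.0 / p)

for p in [1, 1.5, 2, 3, 10]:
    print(p, round(max_w2(p), 3))
# p = 1: 0.5, p = 2: 0.866, p = 10: ~0.9999
# under the 1-norm, w1 and w2 compete for the same budget, pushing one toward 0;
# under large p, w2 can stay near 1 even with w1 nonzero (similar weights are cheap)
```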

Minimizing with a regularizer
We know how to solve convex minimization problems using gradient descent:
$\operatorname{argmin}_{w,b} \sum \text{loss}(yy')$
If we can ensure that the loss plus the regularizer is convex, then we can still use gradient descent:
$\operatorname{argmin}_{w,b} \sum \text{loss}(yy') + \lambda\, \text{regularizer}(w)$

Convexity revisited
One definition: the line segment between any two points on the function lies above the function. Mathematically, f is convex if for all $x_1, x_2$ and $0 < t < 1$:
$f(tx_1 + (1-t)x_2) \le t f(x_1) + (1-t) f(x_2)$
The left side is the value of the function at some point between $x_1$ and $x_2$; the right side is the value at the corresponding point on the line segment between $x_1$ and $x_2$.

Adding convex functions
Claim: if f and g are convex functions, then so is z = f + g.
To prove: $z(tx_1 + (1-t)x_2) \le t z(x_1) + (1-t) z(x_2)$ for $0 < t < 1$.

By definition of the sum of two functions:
$z(tx_1 + (1-t)x_2) = f(tx_1 + (1-t)x_2) + g(tx_1 + (1-t)x_2)$
$t z(x_1) + (1-t) z(x_2) = t f(x_1) + (1-t) f(x_2) + t g(x_1) + (1-t) g(x_2)$
Then, given that
$f(tx_1 + (1-t)x_2) \le t f(x_1) + (1-t) f(x_2)$
we know
$g(tx_1 + (1-t)x_2) \le t g(x_1) + (1-t) g(x_2)$ as well, and adding the two inequalities gives
$f(tx_1 + (1-t)x_2) + g(tx_1 + (1-t)x_2) \le t f(x_1) + (1-t) f(x_2) + t g(x_1) + (1-t) g(x_2)$
So: $z(tx_1 + (1-t)x_2) \le t z(x_1) + (1-t) z(x_2)$.
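
The proof above is the real argument, but the claim is also easy to sanity-check numerically. A small sketch with f(x) = x² and g(x) = |x| (both convex; the check and its constants are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2
g = lambda x: abs(x)
z = lambda x: f(x) + g(x)   # claim: also convex

# spot-check the definition on random points and mixing weights t
for _ in range(10000):
    x1, x2 = rng.uniform(-10, 10, size=2)
    t = rng.uniform(0.0, 1.0)
    assert z(t * x1 + (1 - t) * x2) <= t * z(x1) + (1 - t) * z(x2) + 1e-9
print("convexity inequality held on all sampled points")
```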

Minimizing with a regularizer
We know how to solve convex minimization problems using gradient descent, and
$\operatorname{argmin}_{w,b} \sum \text{loss}(yy') + \lambda\, \text{regularizer}(w)$
stays convex as long as both the loss and the regularizer are convex.

p-norms are convex
$r(w, b) = \left(\sum_j |w_j|^p\right)^{1/p} = \|w\|_p$
p-norms are convex for p >= 1.

Model-based machine learning
1. pick a model: $0 = b + \sum_{j=1}^m w_j f_j$
2. pick a criterion to optimize (aka objective function): $\sum_{i=1}^n \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2}\|w\|^2$
3. develop a learning algorithm: $\operatorname{argmin}_{w,b} \sum_{i=1}^n \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2}\|w\|^2$
Find w and b that minimize the regularized loss.

Our optimization criterion
$\operatorname{argmin}_{w,b} \sum_{i=1}^n \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2}\|w\|^2$
- Loss function: penalizes examples where the prediction is different than the label
- Regularizer: penalizes large weights
- Key: this function is convex, allowing us to use gradient descent

Gradient descent
- pick a starting point (w)
- repeat until the loss doesn't decrease in any dimension:
  - pick a dimension
  - move a small amount in that dimension towards decreasing loss (using the derivative): $w_j = w_j - \eta \frac{d}{dw_j}(\text{loss}(w) + \text{regularizer}(w, b))$

Some more maths
$\operatorname{argmin}_{w,b} \sum_{i=1}^n \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2}\|w\|^2$
$\frac{d}{dw_j}\text{objective} = \frac{d}{dw_j}\sum_{i=1}^n \exp(-y_i(w \cdot x_i + b)) + \frac{d}{dw_j}\frac{\lambda}{2}\|w\|^2$
(some math happens)
$= -\sum_{i=1}^n y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) + \lambda w_j$

The update
$w_j = w_j + \eta y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) - \eta \lambda w_j$
- $\eta$: learning rate
- $y_i x_{ij}$: direction to update
- $\exp(-y_i(w \cdot x_i + b))$: constant, how far from wrong
- $\eta \lambda w_j$: regularization
What effect does the regularizer have?
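
Putting the derived update into a runnable stochastic-gradient sketch (the loop scaffolding and toy data are ours; the bias follows the same derivative but without a regularization term, since the objective above regularizes only w):

```python
import numpy as np

def train(X, y, eta=0.1, lam=0.01, epochs=100):
    """SGD on exponential loss with the L2 regularizer, per-example update:
       w_j += eta * y_i * x_ij * exp(-y_i (w.x_i + b)) - eta * lam * w_j"""
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    for _ in range(epochs):
        for i in range(n):
            c = np.exp(-y[i] * (X[i] @ w + b))     # how far from wrong
            w += eta * y[i] * X[i] * c - eta * lam * w
            b += eta * y[i] * c                    # bias: no regularization term
    return w, b

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(train(X, y))
```

(The exponential can overflow on badly scaled data; for this toy set it is well behaved.)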

The update
$w_j = w_j + \eta y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) - \eta \lambda w_j$
The regularization term:
- if $w_j$ is positive, reduces $w_j$
- if $w_j$ is negative, increases $w_j$
- moves $w_j$ towards 0

L1 regularization
$\operatorname{argmin}_{w,b} \sum_{i=1}^n \exp(-y_i(w \cdot x_i + b)) + \lambda \sum_j |w_j|$
$\frac{d}{dw_j}\text{objective} = -\sum_{i=1}^n y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) + \lambda\, \text{sign}(w_j)$

L1 regularization
$w_j = w_j + \eta y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) - \eta \lambda\, \text{sign}(w_j)$
(learning rate; direction to update; constant: how far from wrong; regularization)
What effect does the regularizer have?
- if $w_j$ is positive, reduces it by a constant
- if $w_j$ is negative, increases it by a constant
- moves $w_j$ towards 0 regardless of magnitude
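
The difference between the two regularization terms is easiest to see in isolation. A sketch applying only the regularizer part of each update to a fixed weight vector (the numbers are invented):

```python
import numpy as np

eta, lam = 0.1, 0.5
w = np.array([3.0, 0.05, -2.0, -0.05])

l2_part = w - eta * lam * w            # shrink proportional to magnitude
l1_part = w - eta * lam * np.sign(w)   # constant-size step toward 0
print(l2_part)   # [ 2.85    0.0475 -1.9    -0.0475]
print(l1_part)   # [ 2.95    0.     -1.95    0.    ]
```

The L1 step drives the small weights all the way to zero (in practice solvers clip at zero rather than overshooting), which is where L1's sparsity comes from; the L2 step shrinks small weights by almost nothing.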

Regularization with p-norms
- L1: $w_j = w_j + \eta(\text{loss\_correction} - \lambda\, \text{sign}(w_j))$
- L2: $w_j = w_j + \eta(\text{loss\_correction} - \lambda w_j)$
- Lp: $w_j = w_j + \eta(\text{loss\_correction} - \lambda c w_j^{p-1})$
How do higher-order norms affect the weights?

Model-based machine learning
3. develop a learning algorithm: $\operatorname{argmin}_{w,b} \sum_{i=1}^n \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2}\|w\|^2$
Is gradient descent the only way to find w and b? No! There are many other ways to find the minimum; some don't even require iteration. There is a whole field called convex optimization.

Regularizers summarized
- L1 is popular because it tends to result in sparse solutions (i.e. lots of zero weights). However, it is not differentiable, so it only works with gradient-descent-style solvers.
- L2 is also popular because for some loss functions it can be solved directly (no gradient descent required, though iterative solvers are still often used).
- Lp norms are less popular since they don't tend to shrink the weights enough.

The other loss functions
Without regularization, the generic update is:
$w_j = w_j + \eta y_i x_{ij} c$
where $c = \exp(-y_i(w \cdot x_i + b))$ for the exponential loss and $c = 1[yy' < 1]$ for the hinge loss. For squared error:
$w_j = w_j + \eta (y_i - (w \cdot x_i + b)) x_{ij}$
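
Those per-loss constants slot into the one generic update. A sketch (the function and its names are ours):

```python
import numpy as np

def c_for(loss, y_i, pred):
    """The constant c in the generic update w_j = w_j + eta * y_i * x_ij * c."""
    if loss == "exponential":
        return np.exp(-y_i * pred)     # c = exp(-y(w.x + b))
    if loss == "hinge":
        return float(y_i * pred < 1)   # c = 1[yy' < 1]
    raise ValueError(loss)

# squared error doesn't fit the y_i * c form; its update is
#   w_j = w_j + eta * (y_i - pred) * x_ij
print(c_for("exponential", 1, 0.5), c_for("hinge", 1, 0.5))
```

Note the hinge constant is 0 or 1: examples inside the margin get a full perceptron-style update, everything else gets none.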

Many tools support these different combinations
Look at the scikit-learn package.

Common names
- (Ordinary) least squares: squared loss
- Ridge regression: squared loss with L2 regularization
- Lasso regression: squared loss with L1 regularization
- Elastic regression: squared loss with L1 AND L2 regularization
- Logistic regression: logistic loss
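
A sketch of how those names map onto scikit-learn estimators (the synthetic data is ours; scikit-learn's alpha plays the role of lambda):

```python
import numpy as np
from sklearn.linear_model import (LinearRegression, Ridge, Lasso,
                                  ElasticNet, LogisticRegression)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=100)

models = {
    "least squares": LinearRegression(),                  # squared loss
    "ridge":         Ridge(alpha=1.0),                    # + L2
    "lasso":         Lasso(alpha=0.1),                    # + L1
    "elastic net":   ElasticNet(alpha=0.1, l1_ratio=0.5), # + L1 and L2
}
for name, model in models.items():
    print(name, np.round(model.fit(X, y).coef_, 2))

# logistic regression: logistic loss (L2-regularized by default in scikit-learn)
clf = LogisticRegression().fit(X, (y > 0).astype(int))
print("logistic", np.round(clf.coef_, 2))
```

On this data the lasso coefficients for the two zero-weight features typically come out exactly 0, matching the sparsity discussion above.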
