Admin REGULARIZATION. Schedule. Midterm 9/29/16. Assignment 5. Midterm next week, due Friday (more on this in 1 min)
- Adam Richards
- 5 years ago
Transcription
Admin
- Assignment 5! Starter

REGULARIZATION
David Kauchak, CS 158, Fall 2016

Schedule
- Midterm next week, due Friday (more on this in 1 min)
- Assignment 6 due Friday before fall break

Midterm
- Download from course web page when you're ready to take it (available by end of day Monday)
- 2 hours to complete
- Must hand-in (or in) by 11:59pm Friday Oct. 7
- Can use: class notes, your notes, the book, your assignments and Wikipedia
- You may not use: your neighbor, anything else on the web, etc.

What can be covered
- Anything we've talked about in class
- Anything in the reading (these are not necessarily the same things)
- Anything we've covered in the assignments

Midterm topics
Machine learning basics
- different types of learning problems
- feature-based machine learning
- data assumptions/data generating distribution
Classification problem setup
Proper experimentation
- train/dev/test
- evaluation/accuracy/training error
- optimizing hyperparameters

Midterm topics
Learning algorithms
- Decision trees
- K-NN
- Perceptron
- Gradient descent
Algorithm properties
- training/learning
- rationale/why it works
- classifying
- hyperparameters
- avoiding overfitting
- algorithm variants/improvements

Midterm topics
Geometric view of data
- distances between examples
- decision boundaries
Features
- example features
- removing erroneous features/picking good features
- challenges with high-dimensional data
- feature normalization
Other pre-processing
- outlier detection

Midterm topics
Comparing algorithms
- n-fold cross validation
- leave one out validation
- bootstrap resampling
- t-test
Imbalanced data
- evaluation
  - precision/recall, F1, AUC
- subsampling
- oversampling
- weighted binary classifiers

Midterm topics
Multiclass classification
- Modifying existing approaches
- Using binary classifier
  - OVA
  - AVA
  - Tree-based
- micro- vs. macro-averaging
Ranking
- using binary classifier
- using weighted binary classifier
- evaluation

Midterm topics
Gradient descent
- 0/1 loss
- Surrogate loss functions
- Convexity
- minimization algorithm
- regularization
  - different regularizers
  - p-norms
Misc
- good coding habits
- JavaDoc

Midterm general advice
2 hours goes by fast!
- Don't plan on looking everything up
- Lookup equations, algorithms, random details
- Make sure you understand the key concepts
- Don't spend too much time on any one question
- Skip questions you're stuck on and come back to them
- Watch the time as you go
Be careful on the T/F questions
For written questions
- think before you write
- make your argument/analysis clear and concise

How many have you heard of?
- (Ordinary) Least squares
- Ridge regression
- Lasso regression
- Elastic regression
- Logistic regression

Model-based machine learning
1. pick a model: $0 = b + \sum_{j=1}^{m} w_j f_j$
2. pick a criteria to optimize (aka objective function): $\sum_{i=1}^{n} 1[y_i(w \cdot x_i + b) \le 0]$
3. develop a learning algorithm: $\operatorname{argmin}_{w,b} \sum_{i=1}^{n} 1[y_i(w \cdot x_i + b) \le 0]$
Find w and b that minimize the 0/1 loss

Model-based machine learning
1. pick a model: $0 = b + \sum_{j=1}^{m} w_j f_j$
2. pick a criteria to optimize (aka objective function): $\sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b))$ (use a convex surrogate loss function)
3. develop a learning algorithm: $\operatorname{argmin}_{w,b} \sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b))$
Find w and b that minimize the surrogate loss

Surrogate loss functions
- 0/1 loss: $l(y, y') = 1[yy' \le 0]$
- Hinge: $l(y, y') = \max(0, 1 - yy')$
- Exponential: $l(y, y') = \exp(-yy')$
- Squared loss: $l(y, y') = (y - y')^2$
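
The four loss functions listed above can be written down directly. The following is a minimal sketch, assuming labels y in {-1, +1} and a real-valued prediction y'; the function names are illustrative and do not come from the slides.

```python
import math


def zero_one_loss(y, y_pred):
    """0/1 loss: 1 when label and prediction disagree in sign (y * y' <= 0)."""
    return 1.0 if y * y_pred <= 0 else 0.0


def hinge_loss(y, y_pred):
    """Hinge loss: max(0, 1 - y * y')."""
    return max(0.0, 1.0 - y * y_pred)


def exponential_loss(y, y_pred):
    """Exponential loss: exp(-y * y')."""
    return math.exp(-y * y_pred)


def squared_loss(y, y_pred):
    """Squared loss: (y - y')^2."""
    return (y - y_pred) ** 2


if __name__ == "__main__":
    # Compare the losses for a positive example at several prediction values.
    for y_pred in (-2.0, -0.5, 0.5, 2.0):
        print(y_pred,
              zero_one_loss(1, y_pred),
              hinge_loss(1, y_pred),
              exponential_loss(1, y_pred),
              squared_loss(1, y_pred))
```

Printing the values side by side shows that the hinge and exponential losses are at least as large as the 0/1 loss at every prediction, which is what makes them usable as convex surrogates. The squared loss also penalizes confident correct predictions (for example y' = 2 when y = 1).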

Finding the minimum
You're blindfolded, but you can see out of the bottom of the blindfold to the ground right by your feet. I drop you off somewhere and tell you that you're in a convex shaped valley and escape is at the bottom/minimum. How do you get out?

Gradient descent
- pick a starting point (w)
- repeat until loss doesn't decrease in any dimension:
  - pick a dimension
  - move a small amount in that dimension towards decreasing loss (using the derivative):
    $w_j = w_j - \eta \frac{d}{dw_j} loss(w)$

Perceptron learning algorithm
repeat until convergence (or for some # of iterations):
  for each training example (f_1, f_2, ..., f_m, label):
    prediction = b + \sum_{j=1}^{m} w_j f_j
    if prediction * label <= 0:  // they don't agree
      for each w_j:
        w_j = w_j + f_j * label
      b = b + label

Note: for gradient descent, we always update
$w_j = w_j + \eta y_i x_{ij} \exp(-y_i(w \cdot x_i + b))$
or
$w_j = w_j + x_{ij} y_i c$ where $c = \eta \exp(-y_i(w \cdot x_i + b))$

The constant
$c = \eta \exp(-y_i(w \cdot x_i + b))$
- $\eta$: learning rate
- $y_i$: label
- $w \cdot x_i + b$: prediction
When is this large/small?
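
The contrast between the two update rules above can be sketched in code: the perceptron updates only on mistakes, while gradient descent on the exponential loss updates on every example, scaled by c. This is a minimal sketch, not the course's reference implementation. The toy data set, the function names, and the bias update `b += y * c` (obtained by differentiating the exponential loss with respect to b) are assumptions that do not appear in the slides.

```python
import math


def predict(w, b, x):
    """Linear model: b + sum_j w_j * x_j."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def perceptron_train(data, iterations=10):
    """Perceptron: update only when prediction and label disagree."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(iterations):
        for x, label in data:
            if predict(w, b, x) * label <= 0:  # they don't agree
                w = [wj + xj * label for wj, xj in zip(w, x)]
                b += label
    return w, b


def exp_loss_gd_train(data, eta=0.1, iterations=100):
    """Per-example gradient descent on exponential loss: always update."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(iterations):
        for x, y in data:
            # c = eta * exp(-y * prediction): large when the example is
            # badly wrong, small when it is confidently right.
            c = eta * math.exp(-y * predict(w, b, x))
            w = [wj + xj * y * c for wj, xj in zip(w, x)]
            b += y * c  # assumed bias update (derivative w.r.t. b)
    return w, b


# Hypothetical, linearly separable toy data: (features, label in {-1, +1}).
TOY_DATA = [([2.0, 1.0], 1), ([1.0, 3.0], 1),
            ([-1.0, -2.0], -1), ([-2.0, -0.5], -1)]

if __name__ == "__main__":
    print("perceptron:", perceptron_train(TOY_DATA))
    print("exp-loss GD:", exp_loss_gd_train(TOY_DATA))
```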

The constant
$c = \eta \exp(-y_i(w \cdot x_i + b))$, with label $y_i$ and prediction $w \cdot x_i + b$
- If they're the same sign, as the prediction gets larger the update gets smaller
- If they're different, the more different they are, the bigger the update

One concern
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b))$
- We're calculating this on the training set
- We still need to be careful about overfitting!
- The min w,b on the training set is generally NOT the min for the test set
[Figure: loss plotted against w]
How did we deal with this for the perceptron algorithm?

Overfitting revisited: regularization
- A regularizer is an additional criterion added to the loss function to make sure that we don't overfit
- It's called a regularizer since it tries to keep the parameters more normal/regular
- It is a bias on the model that forces the learning to prefer certain types of weights over others
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} loss(yy') + \lambda \, regularizer(w, b)$

Regularizers
$0 = b + \sum_{j=1}^{m} w_j f_j$
Should we allow all possible weights? Any preferences? What makes for a simpler model for a linear model?

Regularizers
$0 = b + \sum_{j=1}^{m} w_j f_j$
Generally, we don't want huge weights
- If weights are large, a small change in a feature can result in a large change in the prediction
- Also gives too much weight to any one feature
Might also prefer weights of 0 for features that aren't useful
How do we encourage small weights? or penalize large weights?

Regularizers
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} loss(yy') + \lambda \, regularizer(w, b)$
How do we encourage small weights? or penalize large weights?

Common regularizers
- sum of the weights: $r(w, b) = \sum_{j} |w_j|$
- sum of the squared weights: $r(w, b) = \sqrt{\sum_{j} |w_j|^2}$
What's the difference between these?
- Squared weights penalizes large values more
- Sum of weights will penalize small values more

p-norm
- sum of the weights (1-norm): $r(w, b) = \sum_{j} |w_j|$
- sum of the squared weights (2-norm): $r(w, b) = \sqrt{\sum_{j} |w_j|^2}$
- p-norm: $r(w, b) = \sqrt[p]{\sum_{j} |w_j|^p} = \|w\|_p$
Smaller values of p (p < 2) encourage sparser vectors
Larger values of p discourage large weights more

p-norms visualized
[Figure: contours in the (w_1, w_2) plane; lines indicate penalty = 1]
For example, if w_1 = 0.5:
[Table of p versus w_2 values not preserved in this transcript]

p-norms visualized
- all p-norms penalize larger weights
- p < 2 tends to create sparse (i.e. lots of 0 weights)
- p > 2 tends to like similar weights

Model-based machine learning
1. pick a model: $0 = b + \sum_{j=1}^{m} w_j f_j$
2. pick a criteria to optimize (aka objective function): $\sum_{i=1}^{n} loss(yy') + \lambda \, regularizer(w)$
3. develop a learning algorithm: $\operatorname{argmin}_{w,b} \sum_{i=1}^{n} loss(yy') + \lambda \, regularizer(w)$
Find w and b that minimize
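
The "penalty = 1" contours can be explored numerically. The sketch below assumes the p-norm definition given above and solves |w_1|^p + |w_2|^p = 1 for a non-negative w_2 in two dimensions; the function names are illustrative, and the infinity-norm case assumes |w_1| < 1.

```python
import math


def p_norm(w, p):
    """p-norm of a weight vector; p = math.inf gives the max-norm."""
    if p == math.inf:
        return max(abs(wj) for wj in w)
    return sum(abs(wj) ** p for wj in w) ** (1.0 / p)


def w2_on_unit_penalty(w1, p):
    """Non-negative w2 such that the p-norm of (w1, w2) equals 1.

    Assumes |w1| < 1; for p = math.inf the answer is then 1.
    """
    if p == math.inf:
        return 1.0
    return (1.0 - abs(w1) ** p) ** (1.0 / p)


if __name__ == "__main__":
    # How large can w2 be when w1 = 0.5 and the total penalty is 1?
    for p in (1, 1.5, 2, 3, math.inf):
        w2 = w2_on_unit_penalty(0.5, p)
        print(p, w2, p_norm([0.5, w2], p))
```

As p grows, w_2 can be closer to 1 at the same penalty, which is one way to see why larger p favors similar-sized weights while small p favors sparse ones.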

Minimizing with a regularizer
We know how to solve convex minimization problems using gradient descent:
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} loss(yy')$
If we can ensure that the loss + regularizer is convex then we could still use gradient descent:
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} loss(yy') + \lambda \, regularizer(w)$ (make convex)

Convexity revisited
One definition: The line segment between any two points on the function is above the function
Mathematically, f is convex if for all x_1, x_2:
$f(t x_1 + (1-t) x_2) \le t f(x_1) + (1-t) f(x_2) \quad \forall \; 0 < t < 1$
- left side: the value of the function at some point between x_1 and x_2
- right side: the value at some point on the line segment between x_1 and x_2

Adding convex functions
Claim: If f and g are convex functions then so is the function z = f + g
Prove:
$z(t x_1 + (1-t) x_2) \le t z(x_1) + (1-t) z(x_2) \quad \forall \; 0 < t < 1$

Adding convex functions
By definition of the sum of two functions:
$z(t x_1 + (1-t) x_2) = f(t x_1 + (1-t) x_2) + g(t x_1 + (1-t) x_2)$
$t z(x_1) + (1-t) z(x_2) = t f(x_1) + t g(x_1) + (1-t) f(x_2) + (1-t) g(x_2)$
$\qquad = t f(x_1) + (1-t) f(x_2) + t g(x_1) + (1-t) g(x_2)$
Then, given that:
$f(t x_1 + (1-t) x_2) \le t f(x_1) + (1-t) f(x_2)$
$g(t x_1 + (1-t) x_2) \le t g(x_1) + (1-t) g(x_2)$
We know:
$f(t x_1 + (1-t) x_2) + g(t x_1 + (1-t) x_2) \le t f(x_1) + (1-t) f(x_2) + t g(x_1) + (1-t) g(x_2)$
So:
$z(t x_1 + (1-t) x_2) \le t z(x_1) + (1-t) z(x_2)$
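
The convexity inequality can also be spot-checked numerically. The sketch below tests it on a finite grid of points, so it can only refute convexity or fail to refute it; it is not a proof. The example functions f(x) = x^2 and g(x) = |x|, the sample points, and the tolerance are all assumptions chosen for illustration.

```python
def is_convex_on_samples(f, points, ts, tol=1e-9):
    """Check f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2) on a finite grid.

    Returns False as soon as one violation is found. Returning True only
    means that no violation was found among the sampled points.
    """
    for x1 in points:
        for x2 in points:
            for t in ts:
                lhs = f(t * x1 + (1 - t) * x2)
                rhs = t * f(x1) + (1 - t) * f(x2)
                if lhs > rhs + tol:
                    return False
    return True


def f(x):
    return x * x      # convex


def g(x):
    return abs(x)     # convex


def z(x):
    return f(x) + g(x)  # sum of two convex functions


POINTS = [-3.0, -1.0, -0.25, 0.0, 0.5, 2.0]
TS = [0.1, 0.25, 0.5, 0.75, 0.9]

if __name__ == "__main__":
    print("f:", is_convex_on_samples(f, POINTS, TS))
    print("g:", is_convex_on_samples(g, POINTS, TS))
    print("z = f + g:", is_convex_on_samples(z, POINTS, TS))
    print("-x^2:", is_convex_on_samples(lambda x: -x * x, POINTS, TS))
```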
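
Minimizing with a regularizer
We know how to solve convex minimization problems using gradient descent:
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} loss(yy')$
If we can ensure that the loss + regularizer is convex then we could still use gradient descent:
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} loss(yy') + \lambda \, regularizer(w)$
convex as long as both loss and regularizer are convex

p-norms are convex
$r(w, b) = \sqrt[p]{\sum_{j} |w_j|^p} = \|w\|_p$
p-norms are convex for p >= 1

Model-based machine learning
1. pick a model: $0 = b + \sum_{j=1}^{m} w_j f_j$
2. pick a criteria to optimize (aka objective function): $\sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2} \|w\|^2$
3. develop a learning algorithm: $\operatorname{argmin}_{w,b} \sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2} \|w\|^2$
Find w and b that minimize

Our optimization criterion
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2} \|w\|^2$
- Loss function: penalizes examples where the prediction is different than the label
- Regularizer: penalizes large weights
Key: this function is convex, allowing us to use gradient descent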

Gradient descent
- pick a starting point (w)
- repeat until loss doesn't decrease in any dimension:
  - pick a dimension
  - move a small amount in that dimension towards decreasing loss (using the derivative):
    $w_j = w_j - \eta \frac{d}{dw_j} (loss(w) + regularizer(w, b))$
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2} \|w\|^2$

Some more maths
$\frac{d}{dw_j} objective = \frac{d}{dw_j} \left[ \sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2} \|w\|^2 \right]$
(some math happens)
$\qquad = -\sum_{i=1}^{n} y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) + \lambda w_j$

The update
$w_j = w_j + \eta y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) - \eta \lambda w_j$
- $\eta$: learning rate
- $y_i x_{ij}$: direction to update
- $\exp(-y_i(w \cdot x_i + b))$: constant, how far from wrong
- $\eta \lambda w_j$: regularization
What effect does the regularizer have?
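
The L2-regularized update above translates into a single per-example step. This is a minimal sketch under stated assumptions: the function names and toy data are hypothetical, the bias is left unregularized, and the bias update `b + eta * y * c` comes from differentiating the exponential loss with respect to b rather than from the slides.

```python
import math


def l2_regularized_step(w, b, x, y, eta, lam):
    """One per-example update: w_j += eta*y*x_j*c - eta*lam*w_j."""
    margin = y * (b + sum(wj * xj for wj, xj in zip(w, x)))
    c = math.exp(-margin)  # how far from wrong
    new_w = [wj + eta * y * xj * c - eta * lam * wj for wj, xj in zip(w, x)]
    new_b = b + eta * y * c  # assumed: bias is not regularized
    return new_w, new_b


def train(data, eta=0.1, lam=0.0, epochs=50):
    """Repeatedly apply the step over all examples."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            w, b = l2_regularized_step(w, b, x, y, eta, lam)
    return w, b


if __name__ == "__main__":
    # Hypothetical toy data: (features, label in {-1, +1}).
    data = [([2.0, 1.0], 1), ([1.0, 3.0], 1),
            ([-1.0, -2.0], -1), ([-2.0, -0.5], -1)]
    for lam in (0.0, 0.1, 1.0):
        w, b = train(data, lam=lam)
        print("lambda =", lam, "w =", w, "b =", b)
```

Running the loop with several values of lambda gives a direct view of how the `- eta * lam * w_j` term pulls the learned weights toward zero.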
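
The update
$w_j = w_j + \eta y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) - \eta \lambda w_j$
- $\eta$: learning rate
- $y_i x_{ij}$: direction to update
- $\exp(-y_i(w \cdot x_i + b))$: constant, how far from wrong
- $\eta \lambda w_j$: regularization
If w_j is positive, reduces w_j
If w_j is negative, increases w_j
moves w_j towards 0

L1 regularization
$\operatorname{argmin}_{w,b} \sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b)) + \lambda \|w\|_1$
$\frac{d}{dw_j} objective = \frac{d}{dw_j} \left[ \sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b)) + \lambda \|w\|_1 \right]$
$\qquad = -\sum_{i=1}^{n} y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) + \lambda \, sign(w_j)$

L1 regularization
$w_j = w_j + \eta y_i x_{ij} \exp(-y_i(w \cdot x_i + b)) - \eta \lambda \, sign(w_j)$
- $\eta$: learning rate
- $y_i x_{ij}$: direction to update
- $\exp(-y_i(w \cdot x_i + b))$: constant, how far from wrong
- $\eta \lambda \, sign(w_j)$: regularization
What effect does the regularizer have?
If w_j is positive, reduces by a constant
If w_j is negative, increases by a constant
moves w_j towards 0 regardless of magnitude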
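
The difference between the two regularization terms is easiest to see in isolation. The sketch below applies only the regularization part of each update to a single weight, ignoring the loss term; the helper names are illustrative and not from the slides.

```python
def sign(v):
    """Return -1, 0 or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def l1_shrink(wj, eta, lam):
    """L1 term only: moves w_j toward 0 by the constant eta*lam.

    Note: if |w_j| < eta*lam, this plain step overshoots past zero.
    """
    return wj - eta * lam * sign(wj)


def l2_shrink(wj, eta, lam):
    """L2 term only: moves w_j toward 0 in proportion to its size."""
    return wj - eta * lam * wj


if __name__ == "__main__":
    eta, lam = 0.1, 1.0
    for wj in (5.0, 0.5, -0.5, -5.0):
        print(wj, l1_shrink(wj, eta, lam), l2_shrink(wj, eta, lam))
```

For large weights the L2 step is bigger, while for small weights the L1 step is bigger. That matches the slide's point that L1 moves weights towards 0 regardless of magnitude.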

Regularization with p-norms
- L1: $w_j = w_j + \eta (loss\_correction - \lambda \, sign(w_j))$
- L2: $w_j = w_j + \eta (loss\_correction - \lambda w_j)$
- Lp: $w_j = w_j + \eta (loss\_correction - \lambda c w_j^{p-1})$
How do higher order norms affect the weights?

Model-based machine learning
develop a learning algorithm: $\operatorname{argmin}_{w,b} \sum_{i=1}^{n} \exp(-y_i(w \cdot x_i + b)) + \frac{\lambda}{2} \|w\|^2$
Find w and b that minimize
Is gradient descent the only way to find w and b?
- No! Many other ways to find the minimum.
- Some don't even require iteration
- Whole field called convex optimization

Regularizers summarized
- L1 is popular because it tends to result in sparse solutions (i.e. lots of zero weights)
  - However, it is not differentiable, so it only works for gradient descent solvers
- L2 is also popular because for some loss functions, it can be solved directly (no gradient descent required, though often iterative solvers still
- Lp is less popular since they don't tend to shrink the weights enough

The other loss functions
Without regularization, the generic update is:
$w_j = w_j + \eta y_i x_{ij} c$
where
- exponential: $c = \exp(-y_i(w \cdot x_i + b))$
- hinge loss: $c = 1[yy' < 1]$
- squared error: $w_j = w_j + \eta (y_i - (w \cdot x_i + b)) x_{ij}$
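
The generic unregularized update can be written once and parameterized by the loss-specific constant c. This is a minimal sketch under the assumption that `y_pred` stands for the current prediction w·x + b; all function names are illustrative.

```python
import math


def exp_c(y, y_pred):
    """Exponential loss constant: c = exp(-y * y')."""
    return math.exp(-y * y_pred)


def hinge_c(y, y_pred):
    """Hinge loss constant: c = 1 if y * y' < 1, else 0."""
    return 1.0 if y * y_pred < 1 else 0.0


def generic_update(w, x, y, y_pred, eta, c_fn):
    """w_j = w_j + eta * y * x_j * c, with c supplied by the chosen loss."""
    c = c_fn(y, y_pred)
    return [wj + eta * y * xj * c for wj, xj in zip(w, x)]


def squared_error_update(w, x, y, y_pred, eta):
    """Squared error has its own form: w_j = w_j + eta * (y - y') * x_j."""
    return [wj + eta * (y - y_pred) * xj for wj, xj in zip(w, x)]


if __name__ == "__main__":
    w, x, y, eta = [0.0, 0.0], [1.0, 2.0], 1, 0.1
    for y_pred in (-1.0, 0.5, 2.0):
        print(y_pred,
              generic_update(w, x, y, y_pred, eta, exp_c),
              generic_update(w, x, y, y_pred, eta, hinge_c),
              squared_error_update(w, x, y, y_pred, eta))
```

With the hinge constant, examples that already have a margin of at least 1 cause no update at all, while the exponential constant always produces a (possibly tiny) update.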

Many tools support these different combinations
Look at the scikit-learn package:

Common names
- (Ordinary) Least squares: squared loss
- Ridge regression: squared loss with L2 regularization
- Lasso regression: squared loss with L1 regularization
- Elastic regression: squared loss with L1 AND L2 regularization
- Logistic regression: logistic loss
More informationSeptember 2012 C1 Note. C1 Notes (Edexcel) Copyright - For AS, A2 notes and IGCSE / GCSE worksheets 1
September 0 s (Edecel) Copyright www.pgmaths.co.uk - For AS, A otes ad IGCSE / GCSE worksheets September 0 Copyright www.pgmaths.co.uk - For AS, A otes ad IGCSE / GCSE worksheets September 0 Copyright
More informationTHE SOLUTION OF NONLINEAR EQUATIONS f( x ) = 0.
THE SOLUTION OF NONLINEAR EQUATIONS f( ) = 0. Noliear Equatio Solvers Bracketig. Graphical. Aalytical Ope Methods Bisectio False Positio (Regula-Falsi) Fied poit iteratio Newto Raphso Secat The root of
More information4.1 Sigma Notation and Riemann Sums
0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas
More informationA sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as
More informationMachine Learning for Data Science (CS 4786)
Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm
More informationLinear Regression Demystified
Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to
More informationCS321. Numerical Analysis and Computing
CS Numerical Aalysis ad Computig Lecture Locatig Roots o Equatios Proessor Ju Zhag Departmet o Computer Sciece Uiversity o Ketucky Leigto KY 456-6 September 8 5 What is the Root May physical system ca
More informationMultilayer perceptrons
Multilayer perceptros If traiig set is ot liearly separable, a etwork of McCulloch-Pitts uits ca give a solutio If o loop exists i etwork, called a feedforward etwork (else, recurret etwork) A two-layer
More informationMATH 147 Homework 4. ( = lim. n n)( n + 1 n) n n n. 1 = lim
MATH 147 Homework 4 1. Defie the sequece {a } by a =. a) Prove that a +1 a = 0. b) Prove that {a } is ot a Cauchy sequece. Solutio: a) We have: ad so we re doe. a +1 a = + 1 = + 1 + ) + 1 ) + 1 + 1 = +
More informationInduction: Solutions
Writig Proofs Misha Lavrov Iductio: Solutios Wester PA ARML Practice March 6, 206. Prove that a 2 2 chessboard with ay oe square removed ca always be covered by shaped tiles. Solutio : We iduct o. For
More information3.2 Properties of Division 3.3 Zeros of Polynomials 3.4 Complex and Rational Zeros of Polynomials
Math 60 www.timetodare.com 3. Properties of Divisio 3.3 Zeros of Polyomials 3.4 Complex ad Ratioal Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered
More informationLesson 10: Limits and Continuity
www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals
More informationAccuracy assessment methods and challenges
Accuracy assessmet methods ad challeges Giles M. Foody School of Geography Uiversity of Nottigham giles.foody@ottigham.ac.uk Backgroud Need for accuracy assessmet established. Cosiderable progress ow see
More informationLecture 24: Variable selection in linear models
Lecture 24: Variable selectio i liear models Cosider liear model X = Z β + ε, β R p ad Varε = σ 2 I. Like the LSE, the ridge regressio estimator does ot give 0 estimate to a compoet of β eve if that compoet
More informationMath 120 Answers for Homework 23
Math 0 Aswers for Homewor. (a) The Taylor series for cos(x) aroud a 0 is cos(x) x! + x4 4! x6 6! + x8 8! x0 0! + ( ) ()! x ( ) π ( ) ad so the series ()! ()! (π) is just the series for cos(x) evaluated
More information