Dimensionality Reduction and Learning

Size: px
Start display at page:

Download "Dimensionality Reduction and Learning"

Transcription

1 CMSC (Sprg 009) Large Scale Learg Lecture: 3 Dmesoalty Reducto ad Learg Istructors: Sham Kakade ad Greg Shakharovch L Supervsed Methods ad Dmesoalty Reducto The theme of these two lectures s that for L methods we eed ot work fte dmesoal spaces. I partcular, we ca uadaptvely fd ad work a low dmesoal space ad acheve about as good results. These results questo the eed for explctly workg fte (or hgh) dmesoal spaces for L methods. I cotrast, for sparsty based methods (cludg L regularzato), such o-adaptve proecto methods sgfcatly loose predctve power. Marg Based Classfcato For ow, assume we have a dstrbuto over X X R d ad Y {, }. Assume that there exsts a weght vector β such that sg(β X) = Y, wth probablty oe. Hece, the dstrbuto s separable. Furthermore, let us scale the dstrbuto so that t s separable at marg,.e.: Y (β X) What learg algorthm should we use? The The VC dmeso of halfspaces s Ω(D), so avely mmzg the 0/ loss D dmesos may ot be lead to good geeralzato propertes (ad t s ot clear how to do ths ayways). Istead, maxmzg the marg ca be show to provde good geeralzato propertes however computatoally, ths may be a lttle cumbersome (eve though t s polytme). Let us say say we have a trag set T = {(X, Y )}. Ofte, what s doe, s that the perceptro algorthm s ru o the trag set. The perceptro algorthm ru o ay sequece of pots {(X, Y )} T sampled from ths dstrbuto makes at most: M X β mstakes (regardless of the legth of the sequece) where X = max X X X. Hece, f we repeatedly cycle through the dataset, the evetually we wll o loger make mstakes. But what about geeralzato? Navely usg ths perceptro predctor does ot ecessarly lead to good geeralzato behavor sce the VC dmeso of halfspaces s Ω(D) (ad o boud s kow for ths coverget pot of the perceptro).. Radom Proectos ad Marg Preservato Now let us proect β ad X by P = k A, where A Rk d ad each etry A s sample depedetly from N(0, ). Is separablty preserved uder our trag set? Lemma.. Assume X. If k = O( β X log δ ), the wth probablty greater tha δ for all P β P X ad X P X X, β P β β

2 Proof. Choose ɛ = β X ad apply the er product preservg lemma, whch mples that for ay partcular X ad β, that P β P X β X, so that: Y P β P X Y β X For O( ) evets, we use O(δ/ ) so the total error probablty s δ. The fal clam follows from the orm preservg lemma.. Geeralzato If we ru the perceptro algorthm, o the trag set, the the total umber of mstakes made s: M t O( β X ) Note that ths mples that after O( β X ) terato the perceptro wll stablze to a costat soluto, whch has zero error. For geeralzato, we are ow workg wth a space of dmeso O( β log δ ). There are other methods to obta geeralzato but the mportat pot here s that uder the marg assumpto, we are essetally workg a fte dmesoal space (ad ths subspace ca be determe o-adaptvely from the labels {Y }). 3 Rdge Regresso ad Dmesoalty Reducto Let us ow cosder the ormal meas problem, sometmes referred to as the fxed desg settg. Here, we have a set of pots X = {X } R d, ad let X deote the R d matrx where the row of X s X. We also observe a output vector Y R. We desre to lear E[Y ]. I partcular, we seek to predct E[Y ] as X ˆβ. The square loss of a estmator w s: L(w) = E Y Y Xw = where the expectato s wth respect to Y. Let β be the optmal predctor: The rsk of a estmator ˆβ s defed as: E(Y X w) = β = argm w L(w) R( ˆβ) = L( ˆβ) L(β) = X ˆβ Xβ (whch s the fxed desg rsk). Deotg, Σ := X X we ca wrte the rsk as: R( ˆβ) = ( ˆβ β) Σ( ˆβ β) := ˆβ β Σ Aother terpretato of the rsk s how well we accurately lear the parameters of the model. Assume that ˆβ(Y ) s a estmator costructed wth the outcome Y we drop the explct Y depedece as ths s clear from cotext. Let β = E Y ˆβ be expected weght. We ca decompose the expected rsk as: E Y [R( ˆβ)] = E Y X ˆβ Xβ + Xβ Xβ = E Y ˆβ β Σ + β β Σ

3 where we have that: ad (average) varace = E Y X ˆβ Xβ predcto bas vector = Xβ Xβ whch shows a certa bas/varace decomposto of the error. 3. Rsk Bouds for Rdge Regresso The rdge regresso estmator usg a outcome Y s ust: The estmator s the: For smplcty, let us rotate X such that: ˆβ λ = argm w Y Xw + λ w ˆβ λ = (Σ + λi) ( X Y ) = (Σ + λi) ( Y X ) Σ := X X = dag(λ, λ,... λ d ) (ote ths rotato does ot alter the predctos of rotatoally varat algorthms). Wth ths choce, we have that: It s straghtforward to see that: ad t follows that: by ust takg expectatos. [ ˆβ λ ] = Lemma 3.. (Rsk Boud) If Var(Y ), we have that: Ths holds wth equalty f Var(Y ) =. Proof. For the varace term, we have: = Y [X ] λ + λ β = E[ ˆβ 0 ] [β λ ] := E[ ˆβ λ ] = λ λ + λ β E Y [R( ˆβ λ )] λ ( λ + λ ) + β λ ( + λ /λ) E Y ˆβ λ β λ Σ = λ E Y ([ ˆβ λ ] [β λ ] ) = = = λ (λ + λ) E[ (Y E[Y ])[X ] (Y E[Y ])[X ] ] λ (λ + λ) λ (λ + λ) λ (λ + λ) = Var(Y )[X ] = [X ] = = 3

4 Ths holds wth equalty f Var(Y ) =. For the bas term, β λ β Σ = λ ([β λ ] [β] ) = = λ β λ ( λ + λ ) β λ λ ( λ + λ ) ad the result follows from algebrac mapulatos. There followg boud characterzes the rsk for two atural settgs for λ. Corollary 3.. Assume Var(Y ) (Fte Dms) For λ = 0, E Y [R( ˆβ λ )] d Ad f V ar(y ) =, the E Y [R( ˆβ λ )] = d. (Ifte Dms) For λ = Σ trace β, the: E Y [R( ˆβ λ )] β Σ trace = β X β X where the trace orm s the sum of the sgular values ad X = max X. Furthermore, for all there exsts a dstrbuto Pr[Y ] ad a X such that the f λ E Y [R( ˆβ λ )] s Ω ( β Σ trace ) (so the above boud s tght up to log factors). Coceptually, the secod boud s dmeso free,.e. t does ot deped explctly o d, whch could be fte. Ad we are effectvely dog regresso a large (potetally) fte dmesoal space. Proof. The λ = 0 case follows drectly from the prevous lemma. Usg that (a + b) ab, we ca boud the varace term for geeral λ as follows: λ ( λ + λ ) λ λ λ = λ λ Aga, usg that (a + b) ab, the bas term s bouded as: So we have that: β λ ( + λ /λ) β λ λ /λ = λ β E Y [R( ˆβ λ )] Σ trace λ + λ β ad usg the choce of λ completes the proof. 4

5 To see the above boud s tght, cosder the followg problem. Let X = ad β = ad let Y = Xβ +η where η s ut varace. Here, we have that λ = so λ log ad β log, so the upper s log. Now oe ca wrte the rsk as: E Y [R( ˆβ λ )] = ( + + λ) ( + () λ ) ad ths s Ω( ), for all λ. = + λ ( + λ) () + λ dx (3) ( + xλ) = ( + λ )( λ( + λ) λ( + λ) ) (4) = ( λ + λ)( + λ + λ ) (5) However, ow we show that wth L complexty, we ca effectvely workg fte dmesos (where the dmeso s chose as a fucto of ). 3. Radom Proectos ad Maxmum Lkelhood Estmato Frst ote that f we proect to k = O( log ɛ ) dmesos the (usg P = k A), we have that for all : Let us defe the loss usg oly XP as: Let β P be the best ft of Y wth XP,.e. P β P X β X β X ɛ L P (w) = E Y Y XP w β P = argm w L P (w) ad let ˆβ P be the MLE ft of Y wth XP (so λ = 0). Now by the prevous corollary, the: (6) E Y [L P ( ˆβ P )] L P (β P ) = E Y [ XP ˆβP XP β P ] k Also ote that: L P (β P ) L P (P β) = E[ Y XP P β ] = E[ Y Xβ ] + Xβ XP P β = L(β) + (P β P X β X ) L(β) β ( X )ɛ = L(β) β Σ trace ɛ 5

6 Theorem 3.3. (Rsk Boud after Radom Proecto) Assumg Var(Y ), ad that P s ɛ er product preservg for k = O( log ɛ ), the: E Y XP ˆβP Xβ = E Y [L P ( ˆβ P )] L(β) k + β Σ traceɛ 0 log = ɛ + β Σ trace ɛ Hece, choosg ɛ = O( log β Σ trace ), mples that k = O( β Σ trace log ) ad: E Y XP ˆβP Xβ O ( ) log β Σ trace Proof. From above we have that: so that: L(β) L P (β P ) β Σ trace ɛ E Y [L P ( ˆβ P )] L(β) E Y [L P ( ˆβ P )] L P (β P ) + β Σ trace ɛ = E Y [ XP ˆβP XP β P ] + β Σ trace ɛ ad we have bouded the rsk the last terms as k. Ths matches the rsk boud up to log factors. Also, our algorthm s smply a MLE estmate k = O( β Σ trace log ) dmesos. Note that the umber of dmesos we choose s growg as O( ). Refereces The dscusso o classfcato used results from Satosh Vempala s moograph o radom proectos. The rdge regresso results, to my kowledge, are ew. 6

Feature Selection: Part 2. 1 Greedy Algorithms (continued from the last lecture)

Feature Selection: Part 2. 1 Greedy Algorithms (continued from the last lecture) CSE 546: Mache Learg Lecture 6 Feature Selecto: Part 2 Istructor: Sham Kakade Greedy Algorthms (cotued from the last lecture) There are varety of greedy algorthms ad umerous amg covetos for these algorthms.

More information

Bayes (Naïve or not) Classifiers: Generative Approach

Bayes (Naïve or not) Classifiers: Generative Approach Logstc regresso Bayes (Naïve or ot) Classfers: Geeratve Approach What do we mea by Geeratve approach: Lear p(y), p(x y) ad the apply bayes rule to compute p(y x) for makg predctos Ths s essetally makg

More information

1 Solution to Problem 6.40

1 Solution to Problem 6.40 1 Soluto to Problem 6.40 (a We wll wrte T τ (X 1,...,X where the X s are..d. wth PDF f(x µ, σ 1 ( x µ σ g, σ where the locato parameter µ s ay real umber ad the scale parameter σ s > 0. Lettg Z X µ σ we

More information

TESTS BASED ON MAXIMUM LIKELIHOOD

TESTS BASED ON MAXIMUM LIKELIHOOD ESE 5 Toy E. Smth. The Basc Example. TESTS BASED ON MAXIMUM LIKELIHOOD To llustrate the propertes of maxmum lkelhood estmates ad tests, we cosder the smplest possble case of estmatg the mea of the ormal

More information

Chapter 14 Logistic Regression Models

Chapter 14 Logistic Regression Models Chapter 4 Logstc Regresso Models I the lear regresso model X β + ε, there are two types of varables explaatory varables X, X,, X k ad study varable y These varables ca be measured o a cotuous scale as

More information

Econometric Methods. Review of Estimation

Econometric Methods. Review of Estimation Ecoometrc Methods Revew of Estmato Estmatg the populato mea Radom samplg Pot ad terval estmators Lear estmators Ubased estmators Lear Ubased Estmators (LUEs) Effcecy (mmum varace) ad Best Lear Ubased Estmators

More information

MATH 247/Winter Notes on the adjoint and on normal operators.

MATH 247/Winter Notes on the adjoint and on normal operators. MATH 47/Wter 00 Notes o the adjot ad o ormal operators I these otes, V s a fte dmesoal er product space over, wth gve er * product uv, T, S, T, are lear operators o V U, W are subspaces of V Whe we say

More information

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b CS 70 Dscrete Mathematcs ad Probablty Theory Fall 206 Sesha ad Walrad DIS 0b. Wll I Get My Package? Seaky delvery guy of some compay s out delverg packages to customers. Not oly does he had a radom package

More information

Point Estimation: definition of estimators

Point Estimation: definition of estimators Pot Estmato: defto of estmators Pot estmator: ay fucto W (X,..., X ) of a data sample. The exercse of pot estmato s to use partcular fuctos of the data order to estmate certa ukow populato parameters.

More information

Introduction to local (nonparametric) density estimation. methods

Introduction to local (nonparametric) density estimation. methods Itroducto to local (oparametrc) desty estmato methods A slecture by Yu Lu for ECE 66 Sprg 014 1. Itroducto Ths slecture troduces two local desty estmato methods whch are Parze desty estmato ad k-earest

More information

Qualifying Exam Statistical Theory Problem Solutions August 2005

Qualifying Exam Statistical Theory Problem Solutions August 2005 Qualfyg Exam Statstcal Theory Problem Solutos August 5. Let X, X,..., X be d uform U(,),

More information

Lecture Note to Rice Chapter 8

Lecture Note to Rice Chapter 8 ECON 430 HG revsed Nov 06 Lecture Note to Rce Chapter 8 Radom matrces Let Y, =,,, m, =,,, be radom varables (r.v. s). The matrx Y Y Y Y Y Y Y Y Y Y = m m m s called a radom matrx ( wth a ot m-dmesoal dstrbuto,

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Mache Learg Problem set Due Frday, September 9, rectato Please address all questos ad commets about ths problem set to 6.867-staff@a.mt.edu. You do ot eed to use MATLAB for ths problem set though

More information

Rademacher Complexity. Examples

Rademacher Complexity. Examples Algorthmc Foudatos of Learg Lecture 3 Rademacher Complexty. Examples Lecturer: Patrck Rebesch Verso: October 16th 018 3.1 Itroducto I the last lecture we troduced the oto of Rademacher complexty ad showed

More information

CS 2750 Machine Learning. Lecture 8. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x

CS 2750 Machine Learning. Lecture 8. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x CS 75 Mache Learg Lecture 8 Lear regresso Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square CS 75 Mache Learg Lear regresso Fucto f : X Y s a lear combato of put compoets f + + + K d d K k - parameters

More information

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements Aoucemets No-Parametrc Desty Estmato Techques HW assged Most of ths lecture was o the blacboard. These sldes cover the same materal as preseted DHS Bometrcs CSE 90-a Lecture 7 CSE90a Fall 06 CSE90a Fall

More information

Multiple Linear Regression Analysis

Multiple Linear Regression Analysis LINEA EGESSION ANALYSIS MODULE III Lecture - 4 Multple Lear egresso Aalyss Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur Cofdece terval estmato The cofdece tervals multple

More information

Lecture 9: Tolerant Testing

Lecture 9: Tolerant Testing Lecture 9: Tolerat Testg Dael Kae Scrbe: Sakeerth Rao Aprl 4, 07 Abstract I ths lecture we prove a quas lear lower boud o the umber of samples eeded to do tolerat testg for L dstace. Tolerat Testg We have

More information

Lecture 16: Backpropogation Algorithm Neural Networks with smooth activation functions

Lecture 16: Backpropogation Algorithm Neural Networks with smooth activation functions CO-511: Learg Theory prg 2017 Lecturer: Ro Lv Lecture 16: Bacpropogato Algorthm Dsclamer: These otes have ot bee subected to the usual scruty reserved for formal publcatos. They may be dstrbuted outsde

More information

Machine Learning. Introduction to Regression. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Machine Learning. Introduction to Regression. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012 Mache Learg CSE6740/CS764/ISYE6740, Fall 0 Itroducto to Regresso Le Sog Lecture 4, August 30, 0 Based o sldes from Erc g, CMU Readg: Chap. 3, CB Mache learg for apartmet hutg Suppose ou are to move to

More information

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model Lecture 7. Cofdece Itervals ad Hypothess Tests the Smple CLR Model I lecture 6 we troduced the Classcal Lear Regresso (CLR) model that s the radom expermet of whch the data Y,,, K, are the outcomes. The

More information

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions. Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos

More information

18.413: Error Correcting Codes Lab March 2, Lecture 8

18.413: Error Correcting Codes Lab March 2, Lecture 8 18.413: Error Correctg Codes Lab March 2, 2004 Lecturer: Dael A. Spelma Lecture 8 8.1 Vector Spaces A set C {0, 1} s a vector space f for x all C ad y C, x + y C, where we take addto to be compoet wse

More information

Lecture 3 Probability review (cont d)

Lecture 3 Probability review (cont d) STATS 00: Itroducto to Statstcal Iferece Autum 06 Lecture 3 Probablty revew (cot d) 3. Jot dstrbutos If radom varables X,..., X k are depedet, the ther dstrbuto may be specfed by specfyg the dvdual dstrbuto

More information

Lecture 4 Sep 9, 2015

Lecture 4 Sep 9, 2015 CS 388R: Radomzed Algorthms Fall 205 Prof. Erc Prce Lecture 4 Sep 9, 205 Scrbe: Xagru Huag & Chad Voegele Overvew I prevous lectures, we troduced some basc probablty, the Cheroff boud, the coupo collector

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

ρ < 1 be five real numbers. The

ρ < 1 be five real numbers. The Lecture o BST 63: Statstcal Theory I Ku Zhag, /0/006 Revew for the prevous lecture Deftos: covarace, correlato Examples: How to calculate covarace ad correlato Theorems: propertes of correlato ad covarace

More information

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then Secto 5 Vectors of Radom Varables Whe workg wth several radom varables,,..., to arrage them vector form x, t s ofte coveet We ca the make use of matrx algebra to help us orgaze ad mapulate large umbers

More information

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy Bouds o the expected etropy ad KL-dvergece of sampled multomal dstrbutos Brado C. Roy bcroy@meda.mt.edu Orgal: May 18, 2011 Revsed: Jue 6, 2011 Abstract Iformato theoretc quattes calculated from a sampled

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Postpoed exam: ECON430 Statstcs Date of exam: Jauary 0, 0 Tme for exam: 09:00 a.m. :00 oo The problem set covers 5 pages Resources allowed: All wrtte ad prted

More information

Chapter 5 Properties of a Random Sample

Chapter 5 Properties of a Random Sample Lecture 6 o BST 63: Statstcal Theory I Ku Zhag, /0/008 Revew for the prevous lecture Cocepts: t-dstrbuto, F-dstrbuto Theorems: Dstrbutos of sample mea ad sample varace, relatoshp betwee sample mea ad sample

More information

Objectives of Multiple Regression

Objectives of Multiple Regression Obectves of Multple Regresso Establsh the lear equato that best predcts values of a depedet varable Y usg more tha oe eplaator varable from a large set of potetal predctors {,,... k }. Fd that subset of

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Revew for the prevous lecture: Theorems ad Examples: How to obta the pmf (pdf) of U = g (, Y) ad V = g (, Y) Chapter 4 Multple Radom Varables Chapter 44 Herarchcal Models ad Mxture Dstrbutos Examples:

More information

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution:

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution: Chapter 4 Exercses Samplg Theory Exercse (Smple radom samplg: Let there be two correlated radom varables X ad A sample of sze s draw from a populato by smple radom samplg wthout replacemet The observed

More information

9 U-STATISTICS. Eh =(m!) 1 Eh(X (1),..., X (m ) ) i.i.d

9 U-STATISTICS. Eh =(m!) 1 Eh(X (1),..., X (m ) ) i.i.d 9 U-STATISTICS Suppose,,..., are P P..d. wth CDF F. Our goal s to estmate the expectato t (P)=Eh(,,..., m ). Note that ths expectato requres more tha oe cotrast to E, E, or Eh( ). Oe example s E or P((,

More information

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity ECONOMETRIC THEORY MODULE VIII Lecture - 6 Heteroskedastcty Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur . Breusch Paga test Ths test ca be appled whe the replcated data

More information

CHAPTER VI Statistical Analysis of Experimental Data

CHAPTER VI Statistical Analysis of Experimental Data Chapter VI Statstcal Aalyss of Expermetal Data CHAPTER VI Statstcal Aalyss of Expermetal Data Measuremets do ot lead to a uque value. Ths s a result of the multtude of errors (maly radom errors) that ca

More information

Summary of the lecture in Biostatistics

Summary of the lecture in Biostatistics Summary of the lecture Bostatstcs Probablty Desty Fucto For a cotuos radom varable, a probablty desty fucto s a fucto such that: 0 dx a b) b a dx A probablty desty fucto provdes a smple descrpto of the

More information

Class 13,14 June 17, 19, 2015

Class 13,14 June 17, 19, 2015 Class 3,4 Jue 7, 9, 05 Pla for Class3,4:. Samplg dstrbuto of sample mea. The Cetral Lmt Theorem (CLT). Cofdece terval for ukow mea.. Samplg Dstrbuto for Sample mea. Methods used are based o CLT ( Cetral

More information

CS286.2 Lecture 4: Dinur s Proof of the PCP Theorem

CS286.2 Lecture 4: Dinur s Proof of the PCP Theorem CS86. Lecture 4: Dur s Proof of the PCP Theorem Scrbe: Thom Bohdaowcz Prevously, we have prove a weak verso of the PCP theorem: NP PCP 1,1/ (r = poly, q = O(1)). Wth ths result we have the desred costat

More information

Naïve Bayes MIT Course Notes Cynthia Rudin

Naïve Bayes MIT Course Notes Cynthia Rudin Thaks to Şeyda Ertek Credt: Ng, Mtchell Naïve Bayes MIT 5.097 Course Notes Cytha Rud The Naïve Bayes algorthm comes from a geeratve model. There s a mportat dstcto betwee geeratve ad dscrmatve models.

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematcs of Mache Learg Lecturer: Phlppe Rgollet Lecture 3 Scrbe: James Hrst Sep. 6, 205.5 Learg wth a fte dctoary Recall from the ed of last lecture our setup: We are workg wth a fte dctoary

More information

Kernel-based Methods and Support Vector Machines

Kernel-based Methods and Support Vector Machines Kerel-based Methods ad Support Vector Maches Larr Holder CptS 570 Mache Learg School of Electrcal Egeerg ad Computer Scece Washgto State Uverst Refereces Muller et al. A Itroducto to Kerel-Based Learg

More information

The Mathematical Appendix

The Mathematical Appendix The Mathematcal Appedx Defto A: If ( Λ, Ω, where ( λ λ λ whch the probablty dstrbutos,,..., Defto A. uppose that ( Λ,,..., s a expermet type, the σ-algebra o λ λ λ are defed s deoted by ( (,,...,, σ Ω.

More information

STK4011 and STK9011 Autumn 2016

STK4011 and STK9011 Autumn 2016 STK4 ad STK9 Autum 6 Pot estmato Covers (most of the followg materal from chapter 7: Secto 7.: pages 3-3 Secto 7..: pages 3-33 Secto 7..: pages 35-3 Secto 7..3: pages 34-35 Secto 7.3.: pages 33-33 Secto

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

Special Instructions / Useful Data

Special Instructions / Useful Data JAM 6 Set of all real umbers P A..d. B, p Posso Specal Istructos / Useful Data x,, :,,, x x Probablty of a evet A Idepedetly ad detcally dstrbuted Bomal dstrbuto wth parameters ad p Posso dstrbuto wth

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON430 Statstcs Date of exam: Frday, December 8, 07 Grades are gve: Jauary 4, 08 Tme for exam: 0900 am 00 oo The problem set covers 5 pages Resources allowed:

More information

CIS 800/002 The Algorithmic Foundations of Data Privacy October 13, Lecture 9. Database Update Algorithms: Multiplicative Weights

CIS 800/002 The Algorithmic Foundations of Data Privacy October 13, Lecture 9. Database Update Algorithms: Multiplicative Weights CIS 800/002 The Algorthmc Foudatos of Data Prvacy October 13, 2011 Lecturer: Aaro Roth Lecture 9 Scrbe: Aaro Roth Database Update Algorthms: Multplcatve Weghts We ll recall aga) some deftos from last tme:

More information

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015 Fall 05 Homework : Solutos Problem : (Practce wth Asymptotc Notato) A essetal requremet for uderstadg scalg behavor s comfort wth asymptotc (or bg-o ) otato. I ths problem, you wll prove some basc facts

More information

D KL (P Q) := p i ln p i q i

D KL (P Q) := p i ln p i q i Cheroff-Bouds 1 The Geeral Boud Let P 1,, m ) ad Q q 1,, q m ) be two dstrbutos o m elemets, e,, q 0, for 1,, m, ad m 1 m 1 q 1 The Kullback-Lebler dvergece or relatve etroy of P ad Q s defed as m D KL

More information

The Occupancy and Coupon Collector problems

The Occupancy and Coupon Collector problems Chapter 4 The Occupacy ad Coupo Collector problems By Sarel Har-Peled, Jauary 9, 08 4 Prelmares [ Defto 4 Varace ad Stadard Devato For a radom varable X, let V E [ X [ µ X deote the varace of X, where

More information

Linear Regression Linear Regression with Shrinkage. Some slides are due to Tommi Jaakkola, MIT AI Lab

Linear Regression Linear Regression with Shrinkage. Some slides are due to Tommi Jaakkola, MIT AI Lab Lear Regresso Lear Regresso th Shrkage Some sldes are due to Tomm Jaakkola, MIT AI Lab Itroducto The goal of regresso s to make quattatve real valued predctos o the bass of a vector of features or attrbutes.

More information

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model Chapter 3 Asmptotc Theor ad Stochastc Regressors The ature of eplaator varable s assumed to be o-stochastc or fed repeated samples a regresso aalss Such a assumpto s approprate for those epermets whch

More information

Lecture 02: Bounding tail distributions of a random variable

Lecture 02: Bounding tail distributions of a random variable CSCI-B609: A Theorst s Toolkt, Fall 206 Aug 25 Lecture 02: Boudg tal dstrbutos of a radom varable Lecturer: Yua Zhou Scrbe: Yua Xe & Yua Zhou Let us cosder the ubased co flps aga. I.e. let the outcome

More information

Convergence of Large Margin Separable Linear Classification

Convergence of Large Margin Separable Linear Classification Covergece of Large Marg Separable Lear Classfcato Tog Zhag Mathematcal Sceces Departmet IBM T.J. Watso Research Ceter Yorktow Heghts, NY 0598 tzhag@watso.bm.com Abstract Large marg lear classfcato methods

More information

Simulation Output Analysis

Simulation Output Analysis Smulato Output Aalyss Summary Examples Parameter Estmato Sample Mea ad Varace Pot ad Iterval Estmato ermatg ad o-ermatg Smulato Mea Square Errors Example: Sgle Server Queueg System x(t) S 4 S 4 S 3 S 5

More information

Lecture 3. Sampling, sampling distributions, and parameter estimation

Lecture 3. Sampling, sampling distributions, and parameter estimation Lecture 3 Samplg, samplg dstrbutos, ad parameter estmato Samplg Defto Populato s defed as the collecto of all the possble observatos of terest. The collecto of observatos we take from the populato s called

More information

Multivariate Transformation of Variables and Maximum Likelihood Estimation

Multivariate Transformation of Variables and Maximum Likelihood Estimation Marquette Uversty Multvarate Trasformato of Varables ad Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Assocate Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 03 by Marquette Uversty

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Aalyss of Varace ad Desg of Exermets-I MODULE II LECTURE - GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE Dr Shalabh Deartmet of Mathematcs ad Statstcs Ida Isttute of Techology Kaur Tukey s rocedure

More information

Solving Constrained Flow-Shop Scheduling. Problems with Three Machines

Solving Constrained Flow-Shop Scheduling. Problems with Three Machines It J Cotemp Math Sceces, Vol 5, 2010, o 19, 921-929 Solvg Costraed Flow-Shop Schedulg Problems wth Three Maches P Pada ad P Rajedra Departmet of Mathematcs, School of Advaced Sceces, VIT Uversty, Vellore-632

More information

9.1 Introduction to the probit and logit models

9.1 Introduction to the probit and logit models EC3000 Ecoometrcs Lecture 9 Probt & Logt Aalss 9. Itroducto to the probt ad logt models 9. The logt model 9.3 The probt model Appedx 9. Itroducto to the probt ad logt models These models are used regressos

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Marquette Uverst Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Coprght 08 b Marquette Uverst Maxmum Lkelhood Estmato We have bee sag that ~

More information

Mu Sequences/Series Solutions National Convention 2014

Mu Sequences/Series Solutions National Convention 2014 Mu Sequeces/Seres Solutos Natoal Coveto 04 C 6 E A 6C A 6 B B 7 A D 7 D C 7 A B 8 A B 8 A C 8 E 4 B 9 B 4 E 9 B 4 C 9 E C 0 A A 0 D B 0 C C Usg basc propertes of arthmetc sequeces, we fd a ad bm m We eed

More information

A tighter lower bound on the circuit size of the hardest Boolean functions

A tighter lower bound on the circuit size of the hardest Boolean functions Electroc Colloquum o Computatoal Complexty, Report No. 86 2011) A tghter lower boud o the crcut sze of the hardest Boolea fuctos Masak Yamamoto Abstract I [IPL2005], Fradse ad Mlterse mproved bouds o the

More information

Part 4b Asymptotic Results for MRR2 using PRESS. Recall that the PRESS statistic is a special type of cross validation procedure (see Allen (1971))

Part 4b Asymptotic Results for MRR2 using PRESS. Recall that the PRESS statistic is a special type of cross validation procedure (see Allen (1971)) art 4b Asymptotc Results for MRR usg RESS Recall that the RESS statstc s a specal type of cross valdato procedure (see Alle (97)) partcular to the regresso problem ad volves fdg Y $,, the estmate at the

More information

Chapter 9 Jordan Block Matrices

Chapter 9 Jordan Block Matrices Chapter 9 Jorda Block atrces I ths chapter we wll solve the followg problem. Gve a lear operator T fd a bass R of F such that the matrx R (T) s as smple as possble. f course smple s a matter of taste.

More information

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model ECON 48 / WH Hog The Smple Regresso Model. Defto of the Smple Regresso Model Smple Regresso Model Expla varable y terms of varable x y = β + β x+ u y : depedet varable, explaed varable, respose varable,

More information

Lecture 8: Linear Regression

Lecture 8: Linear Regression Lecture 8: Lear egresso May 4, GENOME 56, Sprg Goals Develop basc cocepts of lear regresso from a probablstc framework Estmatg parameters ad hypothess testg wth lear models Lear regresso Su I Lee, CSE

More information

Supervised learning: Linear regression Logistic regression

Supervised learning: Linear regression Logistic regression CS 57 Itroducto to AI Lecture 4 Supervsed learg: Lear regresso Logstc regresso Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square CS 57 Itro to AI Data: D { D D.. D D Supervsed learg d a set of eamples s

More information

PTAS for Bin-Packing

PTAS for Bin-Packing CS 663: Patter Matchg Algorthms Scrbe: Che Jag /9/00. Itroducto PTAS for B-Packg The B-Packg problem s NP-hard. If we use approxmato algorthms, the B-Packg problem could be solved polyomal tme. For example,

More information

III-16 G. Brief Review of Grand Orthogonality Theorem and impact on Representations (Γ i ) l i = h n = number of irreducible representations.

III-16 G. Brief Review of Grand Orthogonality Theorem and impact on Representations (Γ i ) l i = h n = number of irreducible representations. III- G. Bref evew of Grad Orthogoalty Theorem ad mpact o epresetatos ( ) GOT: h [ () m ] [ () m ] δδ δmm ll GOT puts great restrcto o form of rreducble represetato also o umber: l h umber of rreducble

More information

Unsupervised Learning and Other Neural Networks

Unsupervised Learning and Other Neural Networks CSE 53 Soft Computg NOT PART OF THE FINAL Usupervsed Learg ad Other Neural Networs Itroducto Mture Destes ad Idetfablty ML Estmates Applcato to Normal Mtures Other Neural Networs Itroducto Prevously, all

More information

Overview. Basic concepts of Bayesian learning. Most probable model given data Coin tosses Linear regression Logistic regression

Overview. Basic concepts of Bayesian learning. Most probable model given data Coin tosses Linear regression Logistic regression Overvew Basc cocepts of Bayesa learg Most probable model gve data Co tosses Lear regresso Logstc regresso Bayesa predctos Co tosses Lear regresso 30 Recap: regresso problems Iput to learg problem: trag

More information

An Introduction to. Support Vector Machine

An Introduction to. Support Vector Machine A Itroducto to Support Vector Mache Support Vector Mache (SVM) A classfer derved from statstcal learg theory by Vapk, et al. 99 SVM became famous whe, usg mages as put, t gave accuracy comparable to eural-etwork

More information

ENGI 3423 Simple Linear Regression Page 12-01

ENGI 3423 Simple Linear Regression Page 12-01 ENGI 343 mple Lear Regresso Page - mple Lear Regresso ometmes a expermet s set up where the expermeter has cotrol over the values of oe or more varables X ad measures the resultg values of aother varable

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Radom Varables ad Probablty Dstrbutos * If X : S R s a dscrete radom varable wth rage {x, x, x 3,. } the r = P (X = xr ) = * Let X : S R be a dscrete radom varable wth rage {x, x, x 3,.}.If x r P(X = x

More information

Generative classification models

Generative classification models CS 75 Mache Learg Lecture Geeratve classfcato models Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square Data: D { d, d,.., d} d, Classfcato represets a dscrete class value Goal: lear f : X Y Bar classfcato

More information

Introduction to Probability

Introduction to Probability Itroducto to Probablty Nader H Bshouty Departmet of Computer Scece Techo 32000 Israel e-mal: bshouty@cstechoacl 1 Combatorcs 11 Smple Rules I Combatorcs The rule of sum says that the umber of ways to choose

More information

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity BULLETIN of the MALAYSIAN MATHEMATICAL SCIENCES SOCIETY Bull. Malays. Math. Sc. Soc. () 7 (004), 5 35 Strog Covergece of Weghted Averaged Appromats of Asymptotcally Noepasve Mappgs Baach Spaces wthout

More information

8.1 Hashing Algorithms

8.1 Hashing Algorithms CS787: Advaced Algorthms Scrbe: Mayak Maheshwar, Chrs Hrchs Lecturer: Shuch Chawla Topc: Hashg ad NP-Completeess Date: September 21 2007 Prevously we looked at applcatos of radomzed algorthms, ad bega

More information

Generalized Linear Regression with Regularization

Generalized Linear Regression with Regularization Geeralze Lear Regresso wth Regularzato Zoya Bylsk March 3, 05 BASIC REGRESSION PROBLEM Note: I the followg otes I wll make explct what s a vector a what s a scalar usg vec t or otato, to avo cofuso betwee

More information

å 1 13 Practice Final Examination Solutions - = CS109 Dec 5, 2018

å 1 13 Practice Final Examination Solutions - = CS109 Dec 5, 2018 Chrs Pech Fal Practce CS09 Dec 5, 08 Practce Fal Examato Solutos. Aswer: 4/5 8/7. There are multle ways to obta ths aswer; here are two: The frst commo method s to sum over all ossbltes for the rak of

More information

Estimation of Stress- Strength Reliability model using finite mixture of exponential distributions

Estimation of Stress- Strength Reliability model using finite mixture of exponential distributions Iteratoal Joural of Computatoal Egeerg Research Vol, 0 Issue, Estmato of Stress- Stregth Relablty model usg fte mxture of expoetal dstrbutos K.Sadhya, T.S.Umamaheswar Departmet of Mathematcs, Lal Bhadur

More information

ESS Line Fitting

ESS Line Fitting ESS 5 014 17. Le Fttg A very commo problem data aalyss s lookg for relatoshpetwee dfferet parameters ad fttg les or surfaces to data. The smplest example s fttg a straght le ad we wll dscuss that here

More information

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter LOGISTIC REGRESSION Notato Model Logstc regresso regresses a dchotomous depedet varable o a set of depedet varables. Several methods are mplemeted for selectg the depedet varables. The followg otato s

More information

Dimensionality reduction Feature selection

Dimensionality reduction Feature selection CS 750 Mache Learg Lecture 3 Dmesoalty reducto Feature selecto Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square CS 750 Mache Learg Dmesoalty reducto. Motvato. Classfcato problem eample: We have a put data

More information

5 Short Proofs of Simplified Stirling s Approximation

5 Short Proofs of Simplified Stirling s Approximation 5 Short Proofs of Smplfed Strlg s Approxmato Ofr Gorodetsky, drtymaths.wordpress.com Jue, 20 0 Itroducto Strlg s approxmato s the followg (somewhat surprsg) approxmato of the factoral,, usg elemetary fuctos:

More information

Point Estimation: definition of estimators

Point Estimation: definition of estimators Pot Estmato: defto of estmators Pot estmator: ay fucto W (X,..., X ) of a data sample. The exercse of pot estmato s to use partcular fuctos of the data order to estmate certa ukow populato parameters.

More information

4 Inner Product Spaces

4 Inner Product Spaces 11.MH1 LINEAR ALGEBRA Summary Notes 4 Ier Product Spaces Ier product s the abstracto to geeral vector spaces of the famlar dea of the scalar product of two vectors or 3. I what follows, keep these key

More information

LINEAR REGRESSION ANALYSIS

LINEAR REGRESSION ANALYSIS LINEAR REGRESSION ANALYSIS MODULE V Lecture - Correctg Model Iadequaces Through Trasformato ad Weghtg Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur Aalytcal methods for

More information

Chapter 8. Inferences about More Than Two Population Central Values

Chapter 8. Inferences about More Than Two Population Central Values Chapter 8. Ifereces about More Tha Two Populato Cetral Values Case tudy: Effect of Tmg of the Treatmet of Port-We tas wth Lasers ) To vestgate whether treatmet at a youg age would yeld better results tha

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1 STA 08 Appled Lear Models: Regresso Aalyss Sprg 0 Soluto for Homework #. Let Y the dollar cost per year, X the umber of vsts per year. The the mathematcal relato betwee X ad Y s: Y 300 + X. Ths s a fuctoal

More information

Recall MLR 5 Homskedasticity error u has the same variance given any values of the explanatory variables Var(u x1,...,xk) = 2 or E(UU ) = 2 I

Recall MLR 5 Homskedasticity error u has the same variance given any values of the explanatory variables Var(u x1,...,xk) = 2 or E(UU ) = 2 I Chapter 8 Heterosedastcty Recall MLR 5 Homsedastcty error u has the same varace gve ay values of the eplaatory varables Varu,..., = or EUU = I Suppose other GM assumptos hold but have heterosedastcty.

More information

Training Sample Model: Given n observations, [[( Yi, x i the sample model can be expressed as (1) where, zero and variance σ

Training Sample Model: Given n observations, [[( Yi, x i the sample model can be expressed as (1) where, zero and variance σ Stat 74 Estmato for Geeral Lear Model Prof. Goel Broad Outle Geeral Lear Model (GLM): Trag Samle Model: Gve observatos, [[( Y, x ), x = ( x,, xr )], =,,, the samle model ca be exressed as Y = µ ( x, x,,

More information

Bayesian Classification. CS690L Data Mining: Classification(2) Bayesian Theorem: Basics. Bayesian Theorem. Training dataset. Naïve Bayes Classifier

Bayesian Classification. CS690L Data Mining: Classification(2) Bayesian Theorem: Basics. Bayesian Theorem. Training dataset. Naïve Bayes Classifier Baa Classfcato CS6L Data Mg: Classfcato() Referece: J. Ha ad M. Kamber, Data Mg: Cocepts ad Techques robablstc learg: Calculate explct probabltes for hypothess, amog the most practcal approaches to certa

More information

G S Power Flow Solution

G S Power Flow Solution G S Power Flow Soluto P Q I y y * 0 1, Y y Y 0 y Y Y 1, P Q ( k) ( k) * ( k 1) 1, Y Y PQ buses * 1 P Q Y ( k1) *( k) ( k) Q Im[ Y ] 1 P buses & Slack bus ( k 1) *( k) ( k) Y 1 P Re[ ] Slack bus 17 Calculato

More information

Lattices. Mathematical background

Lattices. Mathematical background Lattces Mathematcal backgroud Lattces : -dmesoal Eucldea space. That s, { T x } x x = (,, ) :,. T T If x= ( x,, x), y = ( y,, y), the xy, = xy (er product of xad y) x = /2 xx, (Eucldea legth or orm of

More information

Extreme Value Theory: An Introduction

Extreme Value Theory: An Introduction (correcto d Extreme Value Theory: A Itroducto by Laures de Haa ad Aa Ferrera Wth ths webpage the authors ted to form the readers of errors or mstakes foud the book after publcato. We also gve extesos for

More information

STK3100 and STK4100 Autumn 2017

STK3100 and STK4100 Autumn 2017 SK3 ad SK4 Autum 7 Geeralzed lear models Part III Covers the followg materal from chaters 4 ad 5: Sectos 4..5, 4.3.5, 4.3.6, 4.4., 4.4., ad 4.4.3 Sectos 5.., 5.., ad 5.5. Ørulf Borga Deartmet of Mathematcs

More information