
1 Logistic Regression

2 Step 1: Function Set. We want to find P_{w,b}(C_1 | x). If P_{w,b}(C_1 | x) ≥ 0.5, output C_1; otherwise, output C_2. Here P_{w,b}(C_1 | x) = σ(z), where σ(z) = 1 / (1 + exp(−z)) and z = w · x + b = Σ_i w_i x_i + b. Function set: f_{w,b}(x) = P_{w,b}(C_1 | x), including all different w and b.
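A minimal Python sketch of this function set (the names `sigmoid`, `f_wb`, and `classify` are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def f_wb(x, w, b):
    # One member of the function set: P_{w,b}(C1 | x) = sigma(w . x + b).
    return sigmoid(np.dot(w, x) + b)

def classify(x, w, b):
    # Output C1 if the posterior is at least 0.5, otherwise C2.
    return "C1" if f_wb(x, w, b) >= 0.5 else "C2"
```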

3 Step 1: Function Set. (Figure: each input component x_i is multiplied by a weight w_i and summed with the bias b to give z = Σ_i w_i x_i + b; z is then passed through the sigmoid function σ(z) = 1 / (1 + e^{−z}) to produce the output f_{w,b}(x) = P_{w,b}(C_1 | x).)

4 Logistic Regression vs. Linear Regression, Step 1. Logistic regression: f_{w,b}(x) = σ(Σ_i w_i x_i + b), output between 0 and 1. Linear regression: f_{w,b}(x) = Σ_i w_i x_i + b, output can be any value. (Steps 2 and 3 are compared on later slides.)

5 Step 2: Goodness of a Function. Training data: (x^1, C_1), (x^2, C_1), (x^3, C_2), …, (x^N, C_1). Assume the data is generated based on f_{w,b}(x) = P_{w,b}(C_1 | x). Given a set of w and b, what is its probability of generating the data? L(w, b) = f_{w,b}(x^1) f_{w,b}(x^2) (1 − f_{w,b}(x^3)) ⋯ f_{w,b}(x^N). The most likely w* and b* are the ones with the largest L(w, b): (w*, b*) = arg max_{w,b} L(w, b).

6 Encode the labels as ŷ^n: ŷ^n = 1 for class 1, ŷ^n = 0 for class 2, so the examples x^1 (C_1), x^2 (C_1), x^3 (C_2) become ŷ^1 = 1, ŷ^2 = 1, ŷ^3 = 0. With L(w, b) = f_{w,b}(x^1) f_{w,b}(x^2) (1 − f_{w,b}(x^3)) ⋯, maximizing L is the same as minimizing its negative log: (w*, b*) = arg max_{w,b} L(w, b) = arg min_{w,b} −ln L(w, b), where −ln L(w, b) = −ln f_{w,b}(x^1) − ln f_{w,b}(x^2) − ln(1 − f_{w,b}(x^3)) − ⋯ = −[ŷ^1 ln f(x^1) + (1 − ŷ^1) ln(1 − f(x^1))] − [ŷ^2 ln f(x^2) + (1 − ŷ^2) ln(1 − f(x^2))] − [ŷ^3 ln f(x^3) + (1 − ŷ^3) ln(1 − f(x^3))] − ⋯

7 Step 2: Goodness of a Function. L(w, b) = f_{w,b}(x^1) f_{w,b}(x^2) (1 − f_{w,b}(x^3)) ⋯ f_{w,b}(x^N), so −ln L(w, b) = Σ_n −[ŷ^n ln f_{w,b}(x^n) + (1 − ŷ^n) ln(1 − f_{w,b}(x^n))], where ŷ^n is 1 for class 1 and 0 for class 2. Each term is the cross entropy between two Bernoulli distributions: distribution p with p(x = 1) = ŷ^n, p(x = 0) = 1 − ŷ^n, and distribution q with q(x = 1) = f(x^n), q(x = 0) = 1 − f(x^n). In general, H(p, q) = −Σ_x p(x) ln q(x).
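As a sketch, the negative log-likelihood above can be computed as a sum of per-example Bernoulli cross entropies (variable names are illustrative):

```python
import numpy as np

def neg_log_likelihood(X, y_hat, w, b):
    # X: (N, d) inputs; y_hat: (N,) targets, 1 for class 1 and 0 for class 2.
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # f_{w,b}(x^n) for every n
    f = np.clip(f, 1e-12, 1 - 1e-12)         # avoid ln(0)
    # -ln L(w, b) = sum_n -[y^n ln f(x^n) + (1 - y^n) ln(1 - f(x^n))]
    return -np.sum(y_hat * np.log(f) + (1 - y_hat) * np.log(1 - f))
```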

8 Logistic Regression vs. Linear Regression, Steps 1 and 2. Step 1: logistic regression f_{w,b}(x) = σ(Σ_i w_i x_i + b), output between 0 and 1; linear regression f_{w,b}(x) = Σ_i w_i x_i + b, output any value. Step 2: logistic regression, training data (x^n, ŷ^n) with ŷ^n = 1 for class 1 and 0 for class 2, loss L(f) = Σ_n C(f(x^n), ŷ^n) with cross entropy C(f(x^n), ŷ^n) = −[ŷ^n ln f(x^n) + (1 − ŷ^n) ln(1 − f(x^n))]; linear regression, training data (x^n, ŷ^n) with ŷ^n a real number, loss L(f) = (1/2) Σ_n (f(x^n) − ŷ^n)^2. Question: why don't we simply use square error, as in linear regression?

9 Step 3: Find the best function. Differentiate −ln L(w, b) = Σ_n −[ŷ^n ln f_{w,b}(x^n) + (1 − ŷ^n) ln(1 − f_{w,b}(x^n))] with respect to w_i. For the first term: ∂ ln f_{w,b}(x)/∂w_i = (∂ ln σ(z)/∂z)(∂z/∂w_i), where ∂z/∂w_i = x_i and ∂ ln σ(z)/∂z = (1/σ(z)) ∂σ(z)/∂z = σ(z)(1 − σ(z))/σ(z) = 1 − σ(z). Here f_{w,b}(x) = σ(z) = 1 / (1 + exp(−z)) and z = w · x + b = Σ_i w_i x_i + b.

10 Step 3: Find the best function. For the second term: ∂ ln(1 − f_{w,b}(x))/∂w_i = (∂ ln(1 − σ(z))/∂z)(∂z/∂w_i), where ∂z/∂w_i = x_i and ∂ ln(1 − σ(z))/∂z = −(1/(1 − σ(z))) σ(z)(1 − σ(z)) = −σ(z). Again, f_{w,b}(x) = σ(z) = 1 / (1 + exp(−z)) and z = w · x + b = Σ_i w_i x_i + b.

11 Step 3: Find the best function. Combining the two terms: ∂(−ln L(w, b))/∂w_i = Σ_n −[ŷ^n (1 − f_{w,b}(x^n)) x_i^n − (1 − ŷ^n) f_{w,b}(x^n) x_i^n] = Σ_n −[ŷ^n − ŷ^n f_{w,b}(x^n) − f_{w,b}(x^n) + ŷ^n f_{w,b}(x^n)] x_i^n = Σ_n −(ŷ^n − f_{w,b}(x^n)) x_i^n. The larger the difference between the target ŷ^n and the output f_{w,b}(x^n), the larger the update: w_i ← w_i − η Σ_n −(ŷ^n − f_{w,b}(x^n)) x_i^n.
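A sketch of the resulting batch gradient-descent loop (`eta` is the learning rate; all names are illustrative):

```python
import numpy as np

def train_logistic(X, y_hat, eta=0.1, steps=1000):
    # X: (N, d) inputs; y_hat: (N,) targets (1 for class 1, 0 for class 2).
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # f_{w,b}(x^n)
        err = y_hat - f                          # y^n - f_{w,b}(x^n)
        # w_i <- w_i - eta * sum_n -(y^n - f(x^n)) x_i^n
        w -= eta * -(X.T @ err)
        b -= eta * -np.sum(err)
    return w, b
```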

12 Logistic Regression vs. Linear Regression, all three steps. Step 1: logistic regression f_{w,b}(x) = σ(Σ_i w_i x_i + b), output between 0 and 1; linear regression f_{w,b}(x) = Σ_i w_i x_i + b, output any value. Step 2: logistic regression uses cross entropy, L(f) = Σ_n C(f(x^n), ŷ^n); linear regression uses square error, L(f) = (1/2) Σ_n (f(x^n) − ŷ^n)^2. Step 3: logistic regression: w_i ← w_i − η Σ_n −(ŷ^n − f_{w,b}(x^n)) x_i^n; linear regression: w_i ← w_i − η Σ_n −(ŷ^n − f_{w,b}(x^n)) x_i^n. The update rules are exactly the same.

13 Logistic Regression + Square Error. Step 1: f_{w,b}(x) = σ(Σ_i w_i x_i + b). Step 2: training data (x^n, ŷ^n) with ŷ^n = 1 for class 1 and 0 for class 2; L(f) = (1/2) Σ_n (f_{w,b}(x^n) − ŷ^n)^2. Step 3: ∂(f_{w,b}(x) − ŷ)^2/∂w_i = 2(f_{w,b}(x) − ŷ) (∂f_{w,b}(x)/∂z)(∂z/∂w_i) = 2(f_{w,b}(x) − ŷ) f_{w,b}(x)(1 − f_{w,b}(x)) x_i. Suppose ŷ^n = 1: if f_{w,b}(x^n) = 1 (close to target), ∂L/∂w_i = 0; but if f_{w,b}(x^n) = 0 (far from target), ∂L/∂w_i = 0 as well.

14 Logistic Regression + Square Error. With the same setup and the same derivative ∂L/∂w_i = 2(f_{w,b}(x) − ŷ) f_{w,b}(x)(1 − f_{w,b}(x)) x_i, suppose ŷ^n = 0: if f_{w,b}(x^n) = 1 (far from target), ∂L/∂w_i = 0; and if f_{w,b}(x^n) = 0 (close to target), ∂L/∂w_i = 0. Either way the gradient vanishes.
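A quick numeric check of why square error stalls, plugging values into the gradient formula above (a hypothetical scalar example with x_i = 1):

```python
def square_error_grad(f, y_hat, x_i):
    # d(f - y)^2 / dw_i = 2 (f - y) * f * (1 - f) * x_i
    return 2 * (f - y_hat) * f * (1 - f) * x_i

for f in [0.9999, 0.0001]:                    # close to / far from the target y = 1
    print(f, square_error_grad(f, 1.0, 1.0))  # both gradients are ~0
```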

15 Cross Entropy vs. Square Error. (Figure: total loss plotted over parameters w_1 and w_2. The cross-entropy surface is steep when the parameters are far from the minimum, while the square-error surface is flat both near and far from the minimum, so gradient descent makes little progress.) Source: proceedings/papers/v9/glorot10a/glorot10a.pdf

16 Discriminative vs. Generative. Both use the same model: P(C_1 | x) = σ(w · x + b). Discriminative: directly find w and b (logistic regression). Generative: find μ^1, μ^2, Σ^{−1}, then w^T = (μ^1 − μ^2)^T Σ^{−1} and b = −(1/2)(μ^1)^T Σ^{−1} μ^1 + (1/2)(μ^2)^T Σ^{−1} μ^2 + ln(N_1/N_2). Will we obtain the same set of w and b? It is the same model (function set), but a different function is selected by the same training data.
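A minimal sketch of the generative route to the same model, assuming Gaussian class-conditionals with a shared covariance (as in the preceding generative-model lecture); the function name and estimation details are illustrative:

```python
import numpy as np

def generative_w_b(X1, X2):
    # X1: (N1, d) class-1 examples; X2: (N2, d) class-2 examples.
    N1, N2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Shared covariance: weighted average of the per-class covariances.
    S = (N1 * np.cov(X1.T, bias=True) + N2 * np.cov(X2.T, bias=True)) / (N1 + N2)
    Si = np.linalg.inv(S)
    w = Si @ (mu1 - mu2)                       # w^T = (mu1 - mu2)^T Sigma^{-1}
    b = (-0.5 * mu1 @ Si @ mu1 + 0.5 * mu2 @ Si @ mu2
         + np.log(N1 / N2))
    return w, b                                # then P(C1 | x) = sigma(w . x + b)
```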

17 Generative vs. Discriminative. On the Pokémon classification task with all features (total, hp, att, sp att, de, sp de, speed), the generative model reaches 73% accuracy and the discriminative model 79% accuracy.

18 Generative vs. Discriminative Example. Training data: one example with x_1 = 1, x_2 = 1 in Class 1; four examples with x_1 = 1, x_2 = 0, four with x_1 = 0, x_2 = 1, and four with x_1 = 0, x_2 = 0, all in Class 2. Testing data: x_1 = 1, x_2 = 1 — Class 1? Class 2? How about Naïve Bayes, where P(x | C_i) = P(x_1 | C_i) P(x_2 | C_i)?

19 Generative vs. Discriminative Example. From the training data: P(C_1) = 1/13, P(x_1 = 1 | C_1) = 1, P(x_2 = 1 | C_1) = 1; P(C_2) = 12/13, P(x_1 = 1 | C_2) = 1/3, P(x_2 = 1 | C_2) = 1/3.

20 For the testing point x = (1, 1): P(C_1 | x) = P(x | C_1) P(C_1) / (P(x | C_1) P(C_1) + P(x | C_2) P(C_2)) = (1 · 1 · 1/13) / (1 · 1 · 1/13 + (1/3)(1/3)(12/13)) = 3/7 ≈ 0.43 < 0.5, so Naïve Bayes predicts Class 2.
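The whole toy computation fits in a few lines (a direct transcription of the numbers above):

```python
# Naive Bayes posterior for the test point x = (1, 1).
p_c1, p_c2 = 1 / 13, 12 / 13
p_x_c1 = 1 * 1                # P(x1=1|C1) * P(x2=1|C1)
p_x_c2 = (1 / 3) * (1 / 3)    # P(x1=1|C2) * P(x2=1|C2)
posterior = p_x_c1 * p_c1 / (p_x_c1 * p_c1 + p_x_c2 * p_c2)
print(posterior)              # ~0.43 < 0.5, so Naive Bayes predicts Class 2
```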

21 Generative vs. Discriminative. Benefits of the generative model: with the assumption of a probability distribution, less training data is needed; with the assumption of a probability distribution, it is more robust to noise; and priors and class-dependent probabilities can be estimated from different sources.

22 Multi-class Classification (3 classes as example) [Bishop, P209-210]. Each class has its own parameters: C_1: w^1, b_1 with z_1 = w^1 · x + b_1; C_2: w^2, b_2 with z_2 = w^2 · x + b_2; C_3: w^3, b_3 with z_3 = w^3 · x + b_3. Softmax: y_i = e^{z_i} / Σ_{j=1}^3 e^{z_j}, so 1 > y_i > 0 and Σ_i y_i = 1; interpret y_i = P(C_i | x). Example: z = (3, 1, −3) gives e^{z} ≈ (20, 2.7, 0.05) and y ≈ (0.88, 0.12, ≈0).
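A sketch of softmax, reproducing the slide's example z = (3, 1, −3):

```python
import numpy as np

def softmax(z):
    # y_i = e^{z_i} / sum_j e^{z_j}; subtracting max(z) avoids overflow.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([3.0, 1.0, -3.0])))   # ~[0.88, 0.12, 0.00]
```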

23 Multi-class Classification (3 classes as example). The input x produces z_1 = w^1 · x + b_1, z_2 = w^2 · x + b_2, z_3 = w^3 · x + b_3, and softmax turns these into y = (y_1, y_2, y_3). The loss is the cross entropy −Σ_{i=1}^3 ŷ_i ln y_i between y and the one-hot target ŷ: if x ∈ class 1, ŷ = [1, 0, 0]^T; if x ∈ class 2, ŷ = [0, 1, 0]^T; if x ∈ class 3, ŷ = [0, 0, 1]^T.
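The corresponding multi-class cross entropy against a one-hot target, as a short sketch (continuing the softmax example; names are illustrative):

```python
import numpy as np

def multiclass_cross_entropy(y, y_hat):
    # -sum_i y_hat_i ln y_i, with y from softmax and y_hat one-hot.
    return -np.sum(y_hat * np.log(y))

y = np.array([0.88, 0.12, 0.002])          # softmax output for z = (3, 1, -3)
y_hat = np.array([1.0, 0.0, 0.0])          # x belongs to class 1
print(multiclass_cross_entropy(y, y_hat))  # = -ln 0.88 ~ 0.13
```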

24 Limitation of Logistic Regression. With inputs x_1, x_2, weights w_1, w_2, and bias b, we have z = w_1 x_1 + w_2 x_2 + b and y = σ(z); output y ≥ 0.5 means Class 1 and y < 0.5 means Class 2. Input features and labels: x = (0, 0) → Class 2; x = (0, 1) → Class 1; x = (1, 0) → Class 1; x = (1, 1) → Class 2. Can a single logistic regression separate these classes?

25 Limitation of Logistic Regression. No, we can't: the decision boundary y = 0.5 is the straight line w_1 x_1 + w_2 x_2 + b = 0, and no straight line puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.

26 Limitation of Logistic Regression. Feature transformation: let x_1' be the distance of x to [0, 0]^T and x_2' the distance of x to [1, 1]^T. In the transformed space (x_1', x_2') the two classes become linearly separable. However, it is not always easy to find a good transformation by hand.

27 Limitation of Logistic Regression. Cascading logistic regression models: a first layer of logistic regression units maps x_1, x_2 to new features x_1', x_2' (feature transformation), and a final logistic regression unit classifies based on x_1', x_2' (bias ignored in this figure).

28 With suitable weights, the first layer transforms the four inputs as follows: (0, 0) → (x_1' = 0.73, x_2' = 0.05), (0, 1) → (0.27, 0.27), (1, 0) → (0.27, 0.27), (1, 1) → (0.05, 0.73).

29 After the transformation, the Class-2 points (0.73, 0.05) and (0.05, 0.73) and the Class-1 point (0.27, 0.27) are linearly separable, so a single logistic regression unit with weights w_1, w_2 on (x_1', x_2'), computing y = σ(w_1 x_1' + w_2 x_2' + b), classifies them correctly.
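A sketch of the full cascade on this XOR-style data. The first-layer weights here are an assumption, chosen so that σ(1) = 0.73, σ(−1) = 0.27, σ(−3) = 0.05 reproduce the slide's transformed points; the final unit's weights are also illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cascade(x1, x2):
    # Layer 1: feature transformation.
    x1p = sigmoid(-2 * x1 - 2 * x2 + 1)
    x2p = sigmoid(2 * x1 + 2 * x2 - 3)
    # Layer 2: classify in the transformed space; the line x1' + x2' = 0.6
    # separates (0.27, 0.27) from (0.73, 0.05) and (0.05, 0.73).
    return sigmoid(-10 * (x1p + x2p) + 6)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "Class 1" if cascade(x1, x2) >= 0.5 else "Class 2")
# (0,1) and (1,0) come out Class 1; (0,0) and (1,1) come out Class 2.
```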

30 Deep Learning! Each of these cascaded logistic regression units is called a neuron, and a network of them is a neural network.

31 Reference: Bishop, Chapter 4.3.
