Gradient Ascent Chris Piech CS109, Stanford University
1 Gradient Ascent Chris Piech CS109, Stanford University
2 Our Path: Deep Learning, Linear Regression, Naïve Bayes, Logistic Regression, Parameter Estimation
3 Our Path: Deep Learning, Linear Regression, Naïve Bayes, Logistic Regression; Parameter Estimation: unbiased estimators, maximizing likelihood, Bayesian estimation
4 Review
5
6 Parameter Learning Consider n I.I.D. random variables X_1, X_2, ..., X_n. X_i is a sample from density function f(X_i | θ). What is the best choice of parameters θ?
7 Piech, CS106A, Stanford University Likelihood (of data given parameters): L(θ) = ∏_{i=1}^n f(X_i | θ)
8 Maximum Likelihood Estimation
L(θ) = ∏_{i=1}^n f(X_i | θ)
LL(θ) = Σ_{i=1}^n log f(X_i | θ)
θ̂ = argmax_θ LL(θ)
9 Argmax?
10 Option #1: Straight optimization
11 Computing the MLE
General approach for finding the MLE of θ:
Determine the formula for LL(θ)
Differentiate LL(θ) w.r.t. (each) θ: ∂LL(θ)/∂θ
To maximize, set ∂LL(θ)/∂θ = 0
Solve the resulting (simultaneous) equations to get θ_MLE
o Make sure the derived θ̂_MLE is actually a maximum (and not a minimum or saddle point), e.g., check LL(θ_MLE ± ε) < LL(θ_MLE). This step is often ignored in expository derivations, so we'll ignore it here too (and won't require it in this class).
12 End Review
13 Maximizing Likelihood with Bernoulli Consider I.I.D. random variables X_1, X_2, ..., X_n, X_i ~ Ber(p). Probability mass function, f(X_i | p):
14 Maximizing Likelihood with Bernoulli Consider I.I.D. random variables X_1, X_2, ..., X_n, X_i ~ Ber(p). Probability mass function:
f(X_i | p) = p^{x_i} (1 − p)^{1 − x_i}, where x_i = 0 or 1
[Plot: PMF of Bernoulli with p = 0.2, mass 1 − p at x = 0 and mass p at x = 1]
15 Bernoulli PMF X ~ Ber(p): f(X = x | p) = p^x (1 − p)^{1 − x}
16 Maximizing Likelihood with Bernoulli
Consider I.I.D. random variables X_1, X_2, ..., X_n, X_i ~ Ber(p). The probability mass function, f(X_i | p), can be written as:
f(X_i | p) = p^{x_i} (1 − p)^{1 − x_i}, where x_i = 0 or 1
Likelihood: L(θ) = ∏_{i=1}^n p^{X_i} (1 − p)^{1 − X_i}
Log-likelihood: LL(θ) = Σ_{i=1}^n [X_i log(p) + (1 − X_i) log(1 − p)] = Y log(p) + (n − Y) log(1 − p), where Y = Σ_{i=1}^n X_i
Differentiate w.r.t. p, and set to 0: ∂LL(θ)/∂p = Y/p − (n − Y)/(1 − p) = 0 ⟹ p_MLE = Y/n = (1/n) Σ_{i=1}^n X_i
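As a quick numerical check of the derivation above (my own sketch, not from the slides): the closed-form Bernoulli MLE p = Y/n should attain at least as high a log-likelihood as any other candidate p.

```python
import math

def bernoulli_ll(p, xs):
    """Log-likelihood of Bernoulli(p) for samples xs, each 0 or 1."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

xs = [1, 0, 0, 1, 1, 0, 1, 1]            # 5 successes out of 8 trials
p_mle = sum(xs) / len(xs)                # closed-form MLE: the sample mean
assert p_mle == 5 / 8

# A fine grid search never beats the closed-form answer.
grid = [k / 1000 for k in range(1, 1000)]
best = max(grid, key=lambda p: bernoulli_ll(p, xs))
assert abs(best - p_mle) < 1e-3
```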
17 Isn't that the same as the sample mean?
18 Yes. For Bernoulli.
19
20 Maximum Likelihood Algorithm 1. Decide on a model for the distribution of your samples. Define the PMF / PDF for your sample. 2. Write out the log likelihood function. 3. State that the optimal parameters are the argmax of the log likelihood function. 4. Use an optimization algorithm to calculate argmax
21
22 Maximizing Likelihood with Poisson
Consider I.I.D. random variables X_1, X_2, ..., X_n, X_i ~ Poi(λ)
PMF: f(X_i | λ) = e^{−λ} λ^{X_i} / X_i!
Likelihood: L(θ) = ∏_{i=1}^n e^{−λ} λ^{X_i} / X_i!
Log-likelihood: LL(θ) = Σ_{i=1}^n log(e^{−λ} λ^{X_i} / X_i!) = Σ_{i=1}^n [−λ + X_i log(λ) − log(X_i!)] = −nλ + log(λ) Σ_{i=1}^n X_i − Σ_{i=1}^n log(X_i!)
Differentiate w.r.t. λ, and set to 0: ∂LL(λ)/∂λ = −n + (1/λ) Σ_{i=1}^n X_i = 0 ⟹ λ_MLE = (1/n) Σ_{i=1}^n X_i
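The same pattern can be checked numerically for Poisson (a sketch with made-up counts, not from the slides; `math.lgamma(x + 1)` computes log(x!)):

```python
import math

def poisson_ll(lam, xs):
    """Log-likelihood of Poisson(lam): sum of -lam + x*log(lam) - log(x!)."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)

xs = [2, 4, 1, 3, 0, 5, 3]               # hypothetical observed counts
lam_mle = sum(xs) / len(xs)              # closed-form MLE: the sample mean

# The log-likelihood is concave in lambda; nudging away from the MLE lowers it.
assert poisson_ll(lam_mle, xs) > poisson_ll(lam_mle + 0.1, xs)
assert poisson_ll(lam_mle, xs) > poisson_ll(lam_mle - 0.1, xs)
```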
23 It is so general!
24 Maximizing Likelihood with Uniform
Consider I.I.D. random variables X_1, X_2, ..., X_n, X_i ~ Uni(a, b)
PDF: f(X_i | a, b) = 1/(b − a) if a ≤ x_i ≤ b, 0 otherwise
Likelihood: L(θ) = (1/(b − a))^n if a ≤ x_1, x_2, ..., x_n ≤ b, 0 otherwise
o Constraint a ≤ x_1, x_2, ..., x_n ≤ b makes differentiation tricky
o Intuition: want the interval size (b − a) to be as small as possible to maximize the likelihood function for each data point
o But need to make sure all observed data are contained in the interval; if not, L(θ) = 0
Solution: a_MLE = min(x_1, ..., x_n), b_MLE = max(x_1, ..., x_n)
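The uniform MLE needs no calculus at all; a tiny sketch (data values are my own) makes the "tightest covering interval" argument concrete:

```python
xs = [0.15, 0.20, 0.30, 0.40, 0.65, 0.70, 0.75]

a_mle, b_mle = min(xs), max(xs)          # MLE: tightest interval covering the data
width = b_mle - a_mle
n = len(xs)

# Any wider interval that still covers the data has lower likelihood,
# since (1 / width)^n shrinks as width grows; any narrower one gives L = 0.
assert (1 / width) ** n > (1 / (width + 0.1)) ** n
assert all(a_mle <= x <= b_mle for x in xs)
```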
25 Understanding MLE with Uniform
Consider I.I.D. random variables X_1, X_2, ..., X_n, X_i ~ Uni(0, 1)
Observe data: 0.15, 0.20, 0.30, 0.40, 0.65, 0.70, 0.75
[Plots: the likelihood L(a, 1) as a function of a, and L(0, b) as a function of b; each is maximized when the interval endpoint just reaches the observed data]
26 Small Samples = Problems
How do small samples affect MLE? Estimating a Normal:
μ_MLE = (1/n) Σ_{i=1}^n X_i = sample mean
o Unbiased. Not too shabby
σ²_MLE = (1/n) Σ_{i=1}^n (X_i − μ_MLE)²
o Biased. Underestimates σ² for small n (e.g., 0 for n = 1)
As seen with Uniform, a_MLE ≥ a and b_MLE ≤ b
o Biased. Problematic for small n (e.g., a_MLE = b_MLE when n = 1)
Small sample phenomena intuitively make sense:
o Maximum likelihood ⟹ best explain the data we've seen
o Does not attempt to generalize to unseen data
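A small simulation (my own sketch, with assumed μ = 10 and σ² = 4) shows the bias of σ²_MLE for tiny samples: averaged over many draws of n = 3 points, the MLE lands near (n − 1)/n · σ² rather than σ².

```python
import random

random.seed(0)
true_var = 4.0                           # X ~ N(10, 4), so sigma^2 = 4

def var_mle(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)   # divide by n, not n - 1

n, trials = 3, 20000
avg = sum(var_mle([random.gauss(10, 2) for _ in range(n)])
          for _ in range(trials)) / trials

assert avg < true_var                              # biased low...
assert abs(avg - (n - 1) / n * true_var) < 0.15    # ...by roughly a factor (n-1)/n
```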
27 Properties of MLE
Maximum Likelihood Estimators are generally:
Consistent: lim_{n→∞} P(|θ̂ − θ| < ε) = 1 for ε > 0
Potentially biased (though asymptotically less so)
Asymptotically optimal
o Has smallest variance of "good" estimators for large samples
Often used in practice where the sample size is large relative to the parameter space
o But be careful, there are some very large parameter spaces
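Consistency can be illustrated with a quick simulation (my own sketch, not from the slides): with a large sample, the Bernoulli MLE lands very close to the true parameter.

```python
import random

random.seed(1)
p_true = 0.3

def p_hat(n):
    """MLE of a Bernoulli p from n simulated coin flips: the sample mean."""
    return sum(random.random() < p_true for _ in range(n)) / n

# With lots of data the MLE concentrates around the true parameter.
estimate = p_hat(100_000)
assert abs(estimate - p_true) < 0.01
```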
28 Maximum Likelihood Estimation
L(θ) = ∏_{i=1}^n f(X_i | θ)
LL(θ) = Σ_{i=1}^n log f(X_i | θ)
θ̂ = argmax_θ LL(θ)
29
30 Argmax 2: Gradient Ascent
31 Argmax Option #1: Straight optimization
32 Argmax Option #2: Gradient Ascent
33 Gradient Ascent [Plot: p(samples | θ) as a function of θ, with the argmax marked] Walk uphill and you will find a local maximum (if your step size is small enough)
34 Gradient Ascent [Plot: p(samples | θ) as a function of θ, with the argmax marked] Walk uphill and you will find a local maximum (if your step size is small enough)
35 Gradient Ascent [Plot: p(samples | θ) as a function of θ, annotated with two steps uphill] Especially good if the function is convex. Walk uphill and you will find a local maximum (if your step size is small enough)
36 Gradient Ascent
Repeat many times: θ_j^new = θ_j^old + η · ∂LL(θ^old)/∂θ_j^old
This is some profound life philosophy: walk uphill and you will find a local maximum (if your step size is small enough)
37 Gradient ascent is your bread and butter algorithm for optimization (e.g., argmax)
38 Gradient Ascent Initialize: θ_j = 0 for all 0 ≤ j ≤ m. Calculate all θ_j
39 Gradient Ascent
Initialize: θ_j = 0 for all 0 ≤ j ≤ m
Repeat many times:
  gradient[j] = 0 for all 0 ≤ j ≤ m
  Calculate all gradient[j]s based on data
  θ_j += η · gradient[j] for all 0 ≤ j ≤ m
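The pseudocode above can be sketched as runnable Python. This toy example (mine, not from the slides) maximizes the concave function f(θ) = −(θ − 3)², whose gradient is −2(θ − 3):

```python
eta = 0.05                               # step size (must be small enough)
theta = 0.0                              # Initialize: theta = 0

for _ in range(500):                     # Repeat many times:
    gradient = -2 * (theta - 3)          #   calculate the gradient at theta
    theta += eta * gradient              #   walk uphill

assert abs(theta - 3) < 1e-6             # converges to the maximizer theta = 3
```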
40 Review: Maximum Likelihood Algorithm 1. Decide on a model for the likelihood of your samples, typically by specifying a PMF or PDF. 2. Write out the log likelihood function. 3. State that the optimal parameters are the argmax of the log likelihood function. 4. Use an optimization algorithm to calculate the argmax
41 Review: Maximum Likelihood Algorithm 1. Decide on a model for the likelihood of your samples, typically by specifying a PMF or PDF. 2. Write out the log likelihood function. 3. State that the optimal parameters are the argmax of the log likelihood function. 4. Calculate the derivative of LL with respect to theta 5. Use an optimization algorithm to calculate the argmax
42 Linear Regression Lite
43 Predicting Warriors X 1 = Opposing team ELO X 2 = Points in last game X 3 = Curry playing? X 4 = Playing at home? Y = Warriors points
44 Predicting CO2 (simple)
X = CO2 level, Y = Average Global Temperature
N training datapoints: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(n), y^(n))
Linear Regression Lite Model: Y = θX + Z, where Z ~ N(0, σ²), so Y | X ~ N(θX, σ²)
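To make the model concrete, here is a sketch (my own; θ = 2.0 and σ = 0.5 are assumed values for illustration) that samples training pairs from Y = θX + Z with Z ~ N(0, σ²):

```python
import random

random.seed(42)
theta_true, sigma = 2.0, 0.5             # hypothetical parameter values

def sample_pair():
    x = random.uniform(0, 10)
    z = random.gauss(0, sigma)           # Z ~ N(0, sigma^2)
    return x, theta_true * x + z         # Y = theta * X + Z

data = [sample_pair() for _ in range(1000)]

# The residuals y - theta*x are just the noise Z, so they average out near 0.
avg_resid = sum(y - theta_true * x for x, y in data) / len(data)
assert abs(avg_resid) < 0.1
```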
45 1) Write Likelihood Fn
N training datapoints (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(n), y^(n)). Model: Y | X ~ N(θX, σ²)
First, calculate the Likelihood of the data:
L(θ) = ∏_{i=1}^n f(y^{(i)} | x^{(i)}, θ)
Shorthand for: f(y^{(1)}, ..., y^{(n)} | x^{(1)}, ..., x^{(n)}, θ), which factors into the product above because the datapoints are I.I.D.
46 1) Write Likelihood Fn
N training datapoints (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(n), y^(n)). Model: Y | X ~ N(θX, σ²)
First, calculate the Likelihood of the data:
L(θ) = ∏_{i=1}^n (1 / (σ√(2π))) e^{−(y^{(i)} − θx^{(i)})² / (2σ²)}
47 2) Write Log Likelihood Fn
N training datapoints: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(n), y^(n))
Likelihood function: L(θ) = ∏_{i=1}^n (1 / (σ√(2π))) e^{−(y^{(i)} − θx^{(i)})² / (2σ²)}
Second, calculate the Log Likelihood of the data:
LL(θ) = n log(1 / (σ√(2π))) − (1 / (2σ²)) Σ_{i=1}^n (y^{(i)} − θx^{(i)})²
48 3) State MLE as Optimization
N training datapoints: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(n), y^(n))
Log Likelihood: LL(θ) = n log(1 / (σ√(2π))) − (1 / (2σ²)) Σ_{i=1}^n (y^{(i)} − θx^{(i)})²
θ̂_MLE = argmax_θ LL(θ) = argmax_θ −Σ_{i=1}^n (y^{(i)} − θx^{(i)})² (the constants don't change the argmax)
Third, celebrate!
49 4) Find derivative
N training datapoints: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(n), y^(n))
Goal: compute ∂LL(θ)/∂θ. Up to a positive constant, ∂LL(θ)/∂θ = Σ_{i=1}^n 2(y^{(i)} − θx^{(i)}) x^{(i)}
Fourth, optimize!
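One way to sanity-check a derivative like this is to compare it against a finite-difference approximation of the log-likelihood (dropping constants that don't affect the argmax); a sketch with hypothetical data:

```python
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # hypothetical (x, y) pairs

def ll(theta):
    """Log-likelihood up to constants: -sum of (y - theta*x)^2."""
    return -sum((y - theta * x) ** 2 for x, y in data)

def analytic_grad(theta):
    """Derivative of ll: sum of 2*(y - theta*x)*x."""
    return sum(2 * (y - theta * x) * x for x, y in data)

theta, eps = 1.5, 1e-6
numeric = (ll(theta + eps) - ll(theta - eps)) / (2 * eps)
assert abs(numeric - analytic_grad(theta)) < 1e-4
```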
50 5) Run optimization code N training datapoints: (x (1),y (1) ), (x (2),y (2) ),...(x (n),y (n) )
51 Gradient Ascent
Initialize: θ_j = 0 for all 0 ≤ j ≤ m
Repeat many times:
  gradient[j] = 0 for all 0 ≤ j ≤ m
  Calculate all gradient[j]s based on data and current setting of theta
  θ_j += η · gradient[j] for all 0 ≤ j ≤ m
52 Linear Regression (simple)
Initialize: θ = 0
Repeat many times:
  gradient = 0
  Calculate gradient based on data
  θ += η · gradient
53 Linear Regression (simple)
Initialize: θ = 0
Repeat many times:
  gradient = 0
  For each training example (x, y):
    Update gradient for current training example
  θ += η · gradient
54 Linear Regression (simple)
Initialize: θ = 0
Repeat many times:
  gradient = 0
  For each training example (x, y):
    gradient += 2(y − θx)(x)
  θ += η · gradient
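The loop above can be written as runnable Python (a sketch; the data, step size, and iteration count are my assumptions). On near-linear data it recovers a slope close to 2.5:

```python
# Synthetic data lying close to y = 2.5 * x.
data = [(1.0, 2.6), (2.0, 4.9), (3.0, 7.6), (4.0, 9.9)]

eta = 0.001                              # step size
theta = 0.0                              # Initialize: theta = 0
for _ in range(2000):                    # Repeat many times:
    gradient = 0.0                       #   gradient = 0
    for x, y in data:                    #   For each training example (x, y):
        gradient += 2 * (y - theta * x) * x
    theta += eta * gradient              #   theta += eta * gradient

assert abs(theta - 2.5) < 0.1            # recovers a slope near 2.5
```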
55 Linear Regression
56 Predicting CO 2 X 1 = Temperature X 2 = Elevation X 3 = CO 2 level yesterday X 4 = GDP of region X 5 = Acres of forest growth Y = CO 2 levels
57 Linear Regression
Problem: Predict real value Y based on observing variable X
Model: Linearly weight every feature: Ŷ = θ_1 X_1 + ... + θ_m X_m + θ_{m+1} = θᵀX
Training: Gradient ascent to choose the best thetas to describe your data:
θ̂_MLE = argmax_θ −Σ_{i=1}^n (Y^{(i)} − θᵀx^{(i)})²
58 Linear Regression
Initialize: θ_j = 0 for all 0 ≤ j ≤ m
Repeat many times:
  gradient[j] = 0 for all 0 ≤ j ≤ m
  For each training example (x, y):
    For each parameter j:
      gradient[j] += 2(y − θᵀx)(x[j])
  θ_j += η · gradient[j] for all 0 ≤ j ≤ m
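A NumPy sketch of the full loop (my own; the synthetic data, step size, and iteration count are assumptions). Appending a constant-1 feature means the last weight plays the θ_{m+1} intercept role:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([1.5, -2.0, 0.5])       # last entry acts as the intercept

X = rng.uniform(-1, 1, size=(200, 2))
X = np.hstack([X, np.ones((200, 1))])         # constant-1 feature for the intercept
y = X @ theta_true                             # noiseless targets for a clean check

eta = 0.001                                    # step size
theta = np.zeros(3)                            # Initialize: theta_j = 0 for all j
for _ in range(1000):                          # Repeat many times:
    gradient = np.zeros(3)                     #   gradient[j] = 0
    for x_i, y_i in zip(X, y):                 #   For each training example (x, y):
        gradient += 2 * (y_i - theta @ x_i) * x_i
    theta += eta * gradient                    #   theta_j += eta * gradient[j]

assert np.allclose(theta, theta_true, atol=1e-3)
```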
59 Predicting Warriors
Y = Warriors points
Ŷ = θ_1 X_1 + ... + θ_m X_m + θ_{m+1} = θᵀX
X_1 = Opposing team ELO, X_2 = Points in last game, X_3 = Curry playing?, X_4 = Playing at home?
θ_1 = −2.3, θ_2 = +1.2, θ_3 = +10.2, θ_4 = +3.3, θ_5 = +95.4
60 The Machine Learning Process
[Diagram: Training data → Learning algorithm → g(x) (Classifier); a new input X → Output (Class)]
Training data: set of N pre-classified data instances
o N training pairs: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(n), y^(n)); use superscripts to denote the i-th training instance
Learning algorithm: method for determining g(x)
o Given a new input observation x = x_1, x_2, ..., x_m
o Use g(x) to compute a corresponding output (prediction)
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationPractice Problems Section Problems
Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationLinear classifiers: Logistic regression
Linear classifiers: Logistic regression STAT/CSE 416: Machine Learning Emily Fox University of Washington April 19, 2018 How confident is your prediction? The sushi & everything else were awesome! The
More informationDay 5: Generative models, structured classification
Day 5: Generative models, structured classification Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 22 June 2018 Linear regression
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
More informationMachine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber
Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures
More informationApplication: Can we tell what people are looking at from their brain activity (in real time)? Gaussian Spatial Smooth
Application: Can we tell what people are looking at from their brain activity (in real time? Gaussian Spatial Smooth 0 The Data Block Paradigm (six runs per subject Three Categories of Objects (counterbalanced
More information