Lecture 4: Logistic Regression
Dr. Ammar Mohammed
Normal Equation

Hypothesis: hθ(x) = θ0·x0 + θ1·x1 + θ2·x2 + ... + θd·xd

The Normal Equation is a method to find the values of θ analytically using matrix operations. Stack the m training examples (with x0 = 1 for the intercept) next to their labels:

      | x(1)0  x(1)1 ... x(1)d |          | y(1) |
      | x(2)0  x(2)1 ... x(2)d |          | y(2) |
      |   :      :         :   |          |  :   |
      | x(m)0  x(m)1 ... x(m)d |          | y(m) |
Normal Equation

Form the design matrix X (one training example per row) and the output vector y:

      | x(1)0  x(1)1 ... x(1)d |              | y(1) |
X =   | x(2)0  x(2)1 ... x(2)d |        y =   | y(2) |
      |   :      :         :   |              |  :   |
      | x(m)0  x(m)1 ... x(m)d |              | y(m) |
Normal Equation

Now, given the design matrix X and the output vector y, we can get the parameter vector θ as the following operation:

θ = (XᵀX)⁻¹ Xᵀ y

where (XᵀX)⁻¹ is the inverse of the matrix XᵀX (assuming it is invertible) and Xᵀ is the transpose of the matrix X.
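The normal equation above can be sketched in a few lines of numpy. This is a minimal illustration on synthetic data (the variable names and the data itself are made up for the example):

```python
import numpy as np

# Synthetic data: m = 100 examples, d = 2 features (illustrative only).
rng = np.random.default_rng(0)
X_raw = rng.uniform(-1, 1, size=(100, 2))
true_theta = np.array([2.0, -3.0, 0.5])        # [theta_0, theta_1, theta_2]

# Design matrix: prepend a column of ones so x_0 = 1 carries the intercept.
X = np.hstack([np.ones((100, 1)), X_raw])
y = X @ true_theta                             # noiseless targets, for clarity

# Normal equation: theta = (X^T X)^{-1} X^T y (assumes X^T X is invertible).
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)                                   # recovers [2.0, -3.0, 0.5]
```

In practice, `np.linalg.solve(X.T @ X, X.T @ y)` or `np.linalg.lstsq` is preferred over forming the inverse explicitly, for numerical stability.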
Exercise

Show how to derive the equation θ = (XᵀX)⁻¹ Xᵀ y.
Gradient Descent vs. Normal Equation

Normal Equation
Pros:
- Simple; no need to choose a learning rate
- No iterative algorithm is needed
Cons:
- Needs to compute the inverse of XᵀX, which has O(n³) complexity, where n is the dimension of the matrix
- Slow if n is very large (e.g., n on the order of 10⁶ or more)
Logistic Regression
Supervised Learning

- Training data includes the desired outputs.
- Given examples of inputs X and an output function (label) Y = F(X), predict F(X) for new examples X.
  - Discrete F(X): classification
  - Continuous F(X): regression
- Logistic Regression is a classification method. Why is it called "regression" if it performs classification?
Classification Problem

Classification is a function F that assigns a category to each input vector X = (x1, x2, ..., xd): F(x) = y, with y one of k categories. When k = 2, classification is called binary classification.

Examples:
- Email classification: spam / not spam
- Tumor classification: cancer / non-cancer
- Transaction classification: fraudulent / not fraudulent
- Generally: happens (yes) / does not happen (no)

Y = {0, 1}, where (by arbitrary convention):
- 0 is the negative class (absence)
- 1 is the positive class (presence)
Can Linear Regression Classify?

[Figure: malignancy (Yes = 1, No = 0) plotted against tumor size, with a fitted line hθ(x) and a cut-off point where hθ(x) = 0.5.]

Threshold hθ(x) at 0.5:
- If hθ(x) ≥ 0.5, then the output prediction is 1
- If hθ(x) < 0.5, then the output prediction is 0
Problems with Linear Regression for Classification

hθ(x) can take values outside [0, 1]. What can be done to convert the equation so that the output is always a probability between 0 and 1? We would like to find a mapping that takes the linear combination t = θᵀx and squashes t into the range [0, 1].

[Figure: same tumor-size plot as on the previous slide.]

If we succeed, hθ(x) = the estimated probability that y = 1 given the input features x.
Example: hθ(x) = 0.47 means that the probability that the tumor is cancerous (y = 1) is 47%.
Binary Logistic Regression

Given X = {x(1), x(2), ..., x(m)}, where x(i) = (x(i)1, x(i)2, ..., x(i)d), and y = {y(1), y(2), ..., y(m)}, where y(i) ∈ {0, 1}.

We want to model p(y = 1 | x; θ), the probability that y = 1 given x, parametrized by θ.

- The probability p takes values in [0, 1], while the linear combination θᵀx takes any real value.
- Change the probability to odds: odds = p / (1 − p), which can represent any positive value.
- If we take the log of the odds, it represents any real value, the same as the right-hand side:

log( p / (1 − p) ) = θᵀx
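The probability → odds → log-odds chain above can be checked numerically. A small sketch (the sample probabilities are arbitrary):

```python
import numpy as np

p = np.array([0.1, 0.5, 0.9])    # probabilities in (0, 1)
odds = p / (1 - p)               # odds lie in (0, inf)
log_odds = np.log(odds)          # log-odds (logit) span all real numbers
print(odds)                      # [0.111..., 1.0, 9.0]
print(log_odds)                  # [-2.197..., 0.0, 2.197...]
```

Note the symmetry: p = 0.1 and p = 0.9 map to log-odds of equal magnitude and opposite sign, with p = 0.5 mapping to 0.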
Logistic Function / Logistic Regression Model

Take the inverse of the previous equation (log-odds = θᵀx):

p(y = 1 | x; θ) = 1 / (1 + e^(−θᵀx))

g(z) = 1 / (1 + e^(−z)) is the standard logistic (sigmoid) function, so our hypothesis in logistic regression is hθ(x) = g(θᵀx). How do we estimate the parameter θ?
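The sigmoid hypothesis can be written directly in numpy. A minimal sketch (the function names `sigmoid` and `h` are our own choices):

```python
import numpy as np

def sigmoid(z):
    """Standard logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic-regression hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(theta @ x)

print(sigmoid(0.0))     # 0.5: z = 0 maps to probability 0.5
print(sigmoid(10.0))    # approaches 1 for large positive z
print(sigmoid(-10.0))   # approaches 0 for large negative z
```

For production code, `scipy.special.expit` offers a numerically robust implementation of the same function.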
Logistic Function / Logistic Regression Model

In logistic regression, our prediction model is:

p(y = 1 | x; θ) = hθ(x)
p(y = 0 | x; θ) = 1 − hθ(x)
Logistic Function / Logistic Regression Model

Both equations can be combined into one:

p(y | x; θ) = hθ(x)^y · (1 − hθ(x))^(1−y)
Logistic Function / Logistic Regression Model

Example: a bank wants to build a model that predicts which of its customers will default on their loans. For a customer with a high credit score of 8000, the fitted regression model gives:

p(default = 1) = 1 / (1 + e^(0.0634)) ≈ 0.4846

Since 0.4846 < 0.5, the decision is to predict default = 0.
Decision Boundary

When does the hypothesis predict y = 1 and when y = 0?

Example: what is the condition on z to predict 1 or 0, where f(z) is the sigmoid function?

- Predict y = 1 if f(z) ≥ 0.5, which happens whenever z ≥ 0.
- Similarly, predict y = 0 if f(z) < 0.5, which happens whenever z < 0.

In this case we say that z = 0 is the decision boundary.

[Figure: sigmoid curve f(z) crossing 0.5 at z = 0.]
Decision Boundary

Generally, for the hypothesis hθ(x) = g(θᵀx):
- Predict 1 (hθ(x) ≥ 0.5) whenever θᵀx ≥ 0
- Predict 0 (hθ(x) < 0.5) whenever θᵀx < 0
- θᵀx = 0 is the decision boundary.
Decision Boundary

Example with two variables. Let the hypothesis be hθ(x) = g(θ0 + θ1·x1 + θ2·x2), and let θ0 = −3, θ1 = 1, θ2 = 1.

- Predict 1 if −3 + x1 + x2 ≥ 0, i.e., x1 + x2 ≥ 3
- Predict 0 if x1 + x2 < 3

The decision boundary is the line x1 + x2 = 3.

[Figure: the line x1 + x2 = 3 in the (x1, x2) plane, crossing both axes at 3, separating the two classes.]
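A two-variable decision rule of this kind is straightforward to code. The sketch below assumes the parameter choice θ = (−3, 1, 1), a common textbook example, so the boundary is the line x1 + x2 = 3:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed example parameters: [theta_0, theta_1, theta_2] = [-3, 1, 1].
theta = np.array([-3.0, 1.0, 1.0])

def predict(x1, x2):
    # z = theta_0 + theta_1 * x1 + theta_2 * x2
    z = theta @ np.array([1.0, x1, x2])
    return 1 if sigmoid(z) >= 0.5 else 0   # equivalently: 1 iff z >= 0

print(predict(2.5, 2.5))   # x1 + x2 = 5 >= 3 -> predict 1
print(predict(1.0, 1.0))   # x1 + x2 = 2 <  3 -> predict 0
```

Note that the prediction never requires computing the sigmoid: checking the sign of z gives the same answer, since g is monotonic with g(0) = 0.5.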
Parameter Estimation

Maximum Likelihood Estimation (MLE):

L(θ) = L(θ; X, y) = P(y | X; θ)

The likelihood function treats X and y as fixed and θ as the variable. We would like to find the θ that maximizes L(θ). For logistic regression:

L(θ) = ∏_{i=1..m} hθ(x(i))^y(i) · (1 − hθ(x(i)))^(1−y(i))
Likelihood Maximization

This product is difficult to differentiate, so we take the log of the likelihood. Log is a monotonic function, so any maximum of the likelihood function is also a maximum of the log-likelihood:

ℓ(θ) = log L(θ) = Σ_{i=1..m} [ y(i) log hθ(x(i)) + (1 − y(i)) log(1 − hθ(x(i))) ]

Note: log(ab) = log a + log b, and log(a^b) = b log a.

We want to find the maximum, so we compute the gradient and use gradient ascent.
Likelihood Maximization

To find the maximum likelihood, we differentiate the log-likelihood with respect to the parameters θ:

∂ℓ(θ)/∂θj = Σ_{i=1..m} ( y(i) − hθ(x(i)) ) x(i)j

Homework: show how to derive this equation.
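The log-likelihood and its gradient translate directly into numpy. A minimal sketch, where X is a design matrix with one example per row and y a 0/1 label vector (the vectorized gradient Xᵀ(y − h) is the same sum written in matrix form):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]."""
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """dl/dtheta_j = sum_i (y_i - h(x_i)) * x_ij, vectorized as X^T (y - h)."""
    return X.T @ (y - sigmoid(X @ theta))
```

A quick finite-difference check of `gradient` against `log_likelihood` is a good way to verify the homework derivation numerically.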
Gradient Ascent

Repeat until convergence {
    θj := θj + α Σ_{i=1..m} ( y(i) − hθ(x(i)) ) x(i)j    (simultaneously for all j)
}

(Run demo.)
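The full gradient-ascent loop can be sketched end to end on a toy dataset. This is an illustration, not the lecture's demo; the data, learning rate, and iteration count are chosen for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.05, n_iters=2000):
    """Batch gradient ascent on the log-likelihood:
       theta_j := theta_j + alpha * sum_i (y_i - h(x_i)) * x_ij
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta += alpha * (X.T @ (y - sigmoid(X @ theta)))
    return theta

# Toy 1-D data: the label flips from 0 to 1 around a feature value of 3.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0, 5.5])
y = np.array([0,   0,   0,   0,   0,   1,   1,   1,   1,   1  ])
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept

theta = fit_logistic(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
print(preds)   # should match y on this separable toy set
```

On perfectly separable data like this, the likelihood has no finite maximizer (‖θ‖ keeps growing), which is one reason practical implementations add regularization or a convergence tolerance rather than a fixed iteration count.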