CS 145 Discussion 2

Reminders HW1 out, due 10/19/2017 (Thursday) Group formations for course project due today (1 pt) Join Piazza (email: juwood03@ucla.edu)

Overview Linear Regression Z Score Normalization Multidimensional Newton's Method Decision Tree Twitter Crawler Likelihood

Linear Regression A linear model predicts the value of a variable y from features x Least Squares Estimation Closed-form solution

A ball is rolled down a hallway and its position is recorded at five different times. Use the data shown below to calculate the weights, the predicted position at each recorded time, and the predicted position at time 12 seconds.

Step 1: Calculate Weights What are the X and Y variables? Time (X₁) and Position (Y). What are the parameters for our problem? β̂₁ for time and β̂₀ for the intercept. Calculating the parameters:

X = [1 1; 1 2; 1 4; 1 6; 1 8] (semicolons separate rows, each row is [1, tᵢ]),  y = [9; 12; 17; 21; 26]
XᵀX = [5 21; 21 121]
(XᵀX)⁻¹ = (1/164) [121 −21; −21 5]
Xᵀy = [85; 435]
β̂ = (XᵀX)⁻¹ Xᵀy = (1/164) [121 −21; −21 5] [85; 435] = [7.012; 2.378]
β̂₀ = 7.012, β̂₁ = 2.378

Step 2: Predict positions Plug the time values into the fitted regression equation: Position = 2.378 × Time + 7.012. Predicted value at time = 12 s: Position = 2.378 × 12 + 7.012 = 35.548. In matrix form, all training-time predictions at once:
ŷ = X β̂ = [9.39; 11.768; 16.524; 21.28; 26.036]
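For reference, a minimal NumPy sketch of the same closed-form computation (not from the original slides; the variable names are only illustrative):

```python
import numpy as np

# Design matrix with an intercept column, and the recorded positions
X = np.array([[1, 1], [1, 2], [1, 4], [1, 6], [1, 8]], dtype=float)
y = np.array([9, 12, 17, 21, 26], dtype=float)

# Closed-form least-squares estimate: beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)                        # approximately [7.012, 2.378]

print(X @ beta)                    # predictions at the training times
print(np.array([1, 12]) @ beta)    # prediction at time = 12 s, approximately 35.55
```

In practice np.linalg.lstsq or np.linalg.solve is preferred over forming the inverse explicitly, but for this small example the result is the same.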

Plot

Z Score Normalization Why normalize features? Features with very different ranges, such as [-1, 1] and [-100, 100], can hurt algorithm performance: a small change in the wide-range feature can outweigh a large change in the narrow-range one. Z Score (Standard Score): z_ij = (x_ij − μ_j) / σ_j, where z_ij is the standard score of feature j for data point i, x_ij is the value of feature j for data point i, and μ_j and σ_j are the mean and standard deviation of feature j.

Normalize feature Distance Compute the mean: μ_dist = (1/N) Σᵢ xᵢ.dist = (15 + 200 + 290 + 520) / 4 = 256.25
Compute the standard deviation: σ_dist = sqrt( Σᵢ (xᵢ.dist − μ_dist)² / N ) = sqrt( ((15 − 256.25)² + (200 − 256.25)² + (290 − 256.25)² + (520 − 256.25)²) / 4 ) = 181.7063221

μ_dist = 256.25, σ_dist = 181.706. Compute the standard scores:
z_virgo.dist = (x_virgo.dist − μ_dist) / σ_dist = (15 − 256.25) / 181.706 = −1.328
z_ursa.dist = (x_ursa.dist − μ_dist) / σ_dist = (200 − 256.25) / 181.706 = −0.3096
z_corona.dist = (x_corona.dist − μ_dist) / σ_dist = (290 − 256.25) / 181.706 = 0.186
z_bootes.dist = (x_bootes.dist − μ_dist) / σ_dist = (520 − 256.25) / 181.706 = 1.452
Similarly, other features such as velocity can be standardized.
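A small NumPy sketch of the same standardization (not from the slides; note that np.std divides by N by default, which matches the value used above):

```python
import numpy as np

# Distances from the example (virgo, ursa, corona, bootes, as named on the slide)
dist = np.array([15.0, 200.0, 290.0, 520.0])

mu = dist.mean()             # 256.25
sigma = dist.std()           # divides by N, giving ~181.706
z = (dist - mu) / sigma
print(z)                     # approximately [-1.328, -0.310, 0.186, 1.452]
```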

Multidimensional Newton's Method
x(0) = [3, −1, 0]
f(x₁, x₂, x₃) = (x₁ + 10x₂)² + 5(x₁ − x₃)² + (x₂ − 2x₃)⁴
What is f(x(0))? (3 + 10·(−1))² + 5·(3 − 0)² + (−1 − 2·0)⁴ = 49 + 45 + 1 = 95
What is the gradient ∇f(x)?
∂f/∂x₁ = 2(x₁ + 10x₂) + 10(x₁ − x₃)
∂f/∂x₂ = 20(x₁ + 10x₂) + 4(x₂ − 2x₃)³
∂f/∂x₃ = −10(x₁ − x₃) − 8(x₂ − 2x₃)³

Multidimensional Newton's Method What is the Hessian F(x)?
F(x) = [12, 20, −10;
        20, 200 + 12(x₂ − 2x₃)², −24(x₂ − 2x₃)²;
        −10, −24(x₂ − 2x₃)², 10 + 48(x₂ − 2x₃)²]
What is ∇f(x(0))?

With x(0) = [3, −1, 0]:
∇f(x(0)) = (16, −144, −22)

Multidimensional Newton's Method What is F(x(0))? At x(0), (x₂ − 2x₃)² = 1, so
F(x(0)) = [12, 20, −10;
           20, 212, −24;
           −10, −24, 58]

Multidimensional Newton's Method What is F(x(0))⁻¹? Approximately
[ 0.1107, −0.0087,  0.0155;
 −0.0087,  0.0056,  0.0008;
  0.0155,  0.0008,  0.0203]
What is F(x(0))⁻¹ ∇f(x(0))? With ∇f(x(0)) = (16, −144, −22), the Newton step is approximately (2.683, −0.968, −0.317), so the first update is x(1) = x(0) − F(x(0))⁻¹ ∇f(x(0)) ≈ (0.317, −0.032, 0.317).

Multidimensional Newton's Method 1. Guess x(0) 2. Derive ∇f(x) 3. Derive F(x) 4. n = 0 5. Calculate ∇f(x(n)) 6. Calculate F(x(n)) 7. Calculate F(x(n))⁻¹ 8. x(n+1) = x(n) − F(x(n))⁻¹ ∇f(x(n)) 9. n = n + 1 and repeat from step 5 until the step is small enough

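For reference, a minimal NumPy sketch of the iteration on this example (not part of the original slides; the number of iterations shown is an arbitrary choice, and np.linalg.solve is used in place of forming the inverse explicitly):

```python
import numpy as np

def grad(x):
    """Gradient of f(x) = (x1 + 10*x2)**2 + 5*(x1 - x3)**2 + (x2 - 2*x3)**4."""
    x1, x2, x3 = x
    return np.array([
        2 * (x1 + 10 * x2) + 10 * (x1 - x3),
        20 * (x1 + 10 * x2) + 4 * (x2 - 2 * x3) ** 3,
        -10 * (x1 - x3) - 8 * (x2 - 2 * x3) ** 3,
    ])

def hessian(x):
    """Hessian F(x) of the same function."""
    _, x2, x3 = x
    s2 = (x2 - 2 * x3) ** 2
    return np.array([
        [12.0, 20.0, -10.0],
        [20.0, 200.0 + 12.0 * s2, -24.0 * s2],
        [-10.0, -24.0 * s2, 10.0 + 48.0 * s2],
    ])

x = np.array([3.0, -1.0, 0.0])                    # x(0) from the slide
for n in range(5):                                 # a few iterations for illustration
    step = np.linalg.solve(hessian(x), grad(x))    # equivalent to F(x)^(-1) grad f(x)
    x = x - step
    print(n + 1, x)                                # x(1) is approximately [0.317, -0.032, 0.317]
```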

Weather Data: Play or not Play?

Outlook   Temperature  Humidity  Windy  Play?
sunny     hot          high      false  No
sunny     hot          high      true   No
overcast  hot          high      false  Yes
rain      mild         high      false  Yes
rain      cool         normal    false  Yes
rain      cool         normal    true   No
overcast  cool         normal    true   Yes
sunny     mild         high      false  No
sunny     cool         normal    false  Yes
rain      mild         normal    false  Yes
sunny     mild         normal    true   Yes
overcast  mild         high      true   Yes
overcast  hot          normal    false  Yes
rain      mild         high      true   No

Note: Outlook is the weather forecast; no relation to the Microsoft email program.

Example Tree for Play?

Outlook?
  sunny    -> Humidity?
                high   -> No
                normal -> Yes
  overcast -> Yes
  rain     -> Windy?
                false -> Yes
                true  -> No

Choosing the Splitting Attribute At each node, the available attributes are evaluated on how well they separate the classes of the training examples, using a goodness function. Typical goodness functions: information gain (ID3/C4.5), information gain ratio, Gini index.

Which attribute to select?

A criterion for attribute selection Which is the best attribute? The one that results in the smallest tree. Heuristic: choose the attribute that produces the purest child nodes. A popular criterion is information gain: it increases with the average purity of the subsets an attribute produces. Strategy: choose the attribute with the greatest information gain.

Entropy of a split Information in a split with x items of one class and y items of the other:
info([x, y]) = entropy(x/(x+y), y/(x+y)) = −(x/(x+y)) log₂(x/(x+y)) − (y/(x+y)) log₂(y/(x+y))

Example: attribute Outlook Outlook = sunny: a 2/3 split
info([2,3]) = entropy(2/5, 3/5) = −(2/5) log₂(2/5) − (3/5) log₂(3/5) = 0.971 bits

Outlook = overcast: a 4/0 split
info([4,0]) = entropy(1, 0) = −1·log₂(1) − 0·log₂(0) = 0 bits
Note: log(0) is not defined, but we evaluate 0·log(0) as zero.

Outlook = rainy: a 3/2 split
info([3,2]) = entropy(3/5, 2/5) = −(3/5) log₂(3/5) − (2/5) log₂(2/5) = 0.971 bits

Expected Information Expected information for the attribute (weighted average of the branch entropies):
info([3,2],[4,0],[3,2]) = (5/14)·0.971 + (4/14)·0 + (5/14)·0.971 = 0.693 bits

Computing the information gain Information gain = (information before split) − (information after split)
gain(Outlook) = info([9,5]) − info([2,3],[4,0],[3,2]) = 0.940 − 0.693 = 0.247 bits
Information gain for the attributes from the weather data:
gain(Outlook) = 0.247 bits
gain(Temperature) = 0.029 bits
gain(Humidity) = 0.152 bits
gain(Windy) = 0.048 bits
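A short Python sketch (not from the slides) that reproduces these numbers from the weather table above:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Columns in the same row order as the weather table
play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
outlook = ['sunny','sunny','overcast','rain','rain','rain','overcast',
           'sunny','sunny','rain','sunny','overcast','overcast','rain']

def info_gain(feature, labels):
    """(information before split) - (weighted information after split)."""
    n = len(labels)
    after = sum((feature.count(v) / n) *
                entropy([l for f, l in zip(feature, labels) if f == v])
                for v in set(feature))
    return entropy(labels) - after

print(entropy(play))             # ~0.940 bits
print(info_gain(outlook, play))  # ~0.247 bits
```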

Continuing to split (the gains below are computed within the Outlook = sunny subset): gain(Temperature) = 0.571 bits, gain(Humidity) = 0.971 bits, gain(Windy) = 0.020 bits

The final decision tree Note: not all leaves need to be pure; sometimes identical instances have different classes. Splitting stops when the data cannot be split any further.
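For comparison, a sketch of growing a tree on the same data with scikit-learn (not from the slides; it assumes pandas and scikit-learn are available, and since sklearn trees need numeric inputs the categorical features are one-hot encoded; sklearn also grows binary splits, so the printed tree will not match the multiway tree above even though it fits the same data):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [
    ('sunny','hot','high',False,'No'),     ('sunny','hot','high',True,'No'),
    ('overcast','hot','high',False,'Yes'), ('rain','mild','high',False,'Yes'),
    ('rain','cool','normal',False,'Yes'),  ('rain','cool','normal',True,'No'),
    ('overcast','cool','normal',True,'Yes'),('sunny','mild','high',False,'No'),
    ('sunny','cool','normal',False,'Yes'), ('rain','mild','normal',False,'Yes'),
    ('sunny','mild','normal',True,'Yes'),  ('overcast','mild','high',True,'Yes'),
    ('overcast','hot','normal',False,'Yes'),('rain','mild','high',True,'No'),
]
df = pd.DataFrame(rows, columns=['Outlook','Temperature','Humidity','Windy','Play'])

X = pd.get_dummies(df[['Outlook','Temperature','Humidity','Windy']])  # one-hot encode
clf = DecisionTreeClassifier(criterion='entropy')   # entropy-based splits, like ID3
clf.fit(X, df['Play'])
print(export_text(clf, feature_names=list(X.columns)))
```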

Twitter API

Python Get Python (Anaconda recommended): https://www.anaconda.com/ Get an IDE (PyCharm): https://www.jetbrains.com/pycharm/ Set the PyCharm interpreter to Anaconda: File > Settings > Project: <name> > Python Interpreter Make sure the command-line python and pip also point to Anaconda

Python Twitter library pip install tweepy

Twitter Sign up: https://twitter.com/signup (you must add a phone number) Register an app: https://apps.twitter.com/ > Create New App Get the keys and access tokens: Keys and Access Tokens tab > Create my access token

Twitter Rate limits Searching: 24 hours × 4 fifteen-minute windows per hour × 450 requests per window = 43,200 requests per day Streaming: roughly 1% of all public tweets
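A minimal tweepy sketch for searching (not from the slides). The credential strings are placeholders you copy from the app's Keys and Access Tokens tab, and the search method name depends on the tweepy version (api.search in 3.x was renamed api.search_tweets in 4.x):

```python
import tweepy

# Placeholder credentials from the app's "Keys and Access Tokens" tab
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')

# wait_on_rate_limit tells tweepy to sleep instead of erroring at the rate limit
api = tweepy.API(auth, wait_on_rate_limit=True)

for tweet in api.search(q='data mining', count=100):   # api.search_tweets() in tweepy 4.x
    print(tweet.text)
```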

Likelihood Is likelihood a density or probability? No, it is a product of densities (one factor per data point). Densities are often less than 1, so the product shrinks exponentially fast toward the smallest positive value a floating-point representation can hold. Likelihood is used in gradient ascent, and for a complex function the partial derivatives of a product get messy.

Likelihood Solution? Take the log. log(xy) = log x + log y The log-likelihood approaches ±∞ linearly instead of underflowing, and it is easier to differentiate. Density > 1 gives log-density > 0; density < 1 gives log-density < 0.
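A tiny NumPy sketch of the underflow problem and the log fix (not from the slides; the standard-normal sample is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=2000)                  # arbitrary sample for illustration

# Standard normal density of each point: every factor is typically < 1
dens = np.exp(-data ** 2 / 2) / np.sqrt(2 * np.pi)

print(np.prod(dens))        # the raw likelihood underflows to 0.0
print(np.log(dens).sum())   # the log-likelihood stays finite (roughly -2800)
```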