Chicago Crime Category Classification (CCCC)
Sahil Agarwal [1], Shalini Kedlaya [2] and Ujjwal Gulecha [3]

Abstract: Keeping a city safe from crime is crucially important. If the police could be given information about the types of crime that occur at different times and in different locations, they would be better equipped to fight crime. We use Chicago crime data to predict the category of a crime based on its date, time and some geographically relevant features such as latitude and longitude. We tried various models, including Multinomial Naive Bayes, Decision Trees, Multinomial Logistic Regression and the Random Forest classifier, to achieve this classification.

I. INTRODUCTION

We use various statistical classification models to predict the category of a crime based on the date, time and a few geographically relevant features. We used the Chicago crime dataset, which we obtained from Kaggle [1]. Due to limitations of computational capacity, we restricted our analysis and modeling to data for 2015 and . The following sections describe the dataset, our analysis of the data, the various models and the results.

II. THE DATASET

A. Information about Data

The dataset consists of information about crimes that occurred in Chicago. There are roughly 0.5 million data points, each consisting of the 21 fields shown in Table I.

B. Data representation

Our goal is to predict the Primary Type of a crime given a set of features. Description, IUCR and FBI Code are directly indicative of the primary type and cannot be used as features. ID and Case Number are unique to each crime incident and hence add no predictive value. The Arrest and Updated On fields are determined only after a crime has been committed and are not available at prediction time. Location, Latitude and Longitude correspond to the X, Y coordinates as defined for Chicago.
TABLE I: Fields and Descriptions

ID: Unique identifier for the record.
Case Number: The Chicago Police Department RD Number.
Date: Date when the incident occurred, in mm/dd/yyyy format.
Block: The partially redacted address where the incident occurred.
IUCR: The Illinois Uniform Crime Reporting code.
Primary Type: The primary description of the IUCR code.
Description: The secondary description of the IUCR code.
Location Description: Description of the location where the incident occurred.
Arrest: Indicates whether an arrest was made.
Domestic: Indicates whether the incident was domestic-related.
Beat: The beat (the smallest police geographic area) where the incident occurred.
District: The police district where the incident occurred.
Ward: The ward (City Council district) where the incident occurred.
Community Area: The community area where the incident occurred; Chicago has 77 community areas.
FBI Code: The crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS).
X Coordinate: The x coordinate of the incident location in the State Plane Illinois East NAD 1983 projection.
Y Coordinate: The y coordinate of the incident location in the State Plane Illinois East NAD 1983 projection.
Year: Year the incident occurred.
Updated On: Date and time the record was last updated.
Latitude: The latitude of the location where the incident occurred.
Longitude: The longitude of the location where the incident occurred.

[1] saa034@eng.ucsd.edu  [2] skedlaya@eng.ucsd.edu  [3] ugulecha@eng.ucsd.edu
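The feature screening described above (dropping identifiers, leakage fields and post-hoc fields) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the field groupings follow the reasoning in Section II-B.

```python
# Screening fields for use as features: drop identifiers, target-leaking
# fields, and post-hoc fields, keeping only what is known at prediction time.
FIELDS = [
    "ID", "Case Number", "Date", "Block", "IUCR", "Primary Type",
    "Description", "Location Description", "Arrest", "Domestic", "Beat",
    "District", "Ward", "Community Area", "FBI Code", "X Coordinate",
    "Y Coordinate", "Year", "Updated On", "Latitude", "Longitude",
]

TARGET = "Primary Type"
LEAKY = {"Description", "IUCR", "FBI Code"}   # directly reveal the target
IDENTIFIERS = {"ID", "Case Number"}           # unique per incident, no signal
POST_HOC = {"Arrest", "Updated On"}           # determined after the crime

def usable_features(fields):
    """Return the fields that may legitimately be used as predictors."""
    excluded = LEAKY | IDENTIFIERS | POST_HOC | {TARGET}
    return [f for f in fields if f not in excluded]

print(usable_features(FIELDS))
```

Of the 21 fields, this leaves 13 candidate predictors, which is the pool the experiments in Tables III, IV and V draw from.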
TABLE II: Different crimes and their frequency (percentage of total crime)

Theft: 22.42%
Battery: 18.76%
Criminal Damage: 18.76%
Narcotics: 6.86%
Assault: 6.76%
Other Offense: 6.54%
Deceptive Practice: 6.18%
Burglary: 5.19%
Robbery: 4.09%
Motor Vehicle Theft: 4.06%
Criminal Trespass: 2.4%
Weapons Violation: 1.29%
Offense Involving Children: 0.84%
Public Peace Violation: 0.76%
Crim Sexual Assault: 0.53%

Fig. 1: Total crime, Thefts, Battery and Narcotics cases for every month

C. Data Analysis

We performed some basic data analysis to understand trends in the data. Theft is the most common type of crime, comprising 22.42% of all crime, followed by battery at 18.76%, criminal damage at 18.76%, narcotics at 6.86% and assault at 6.76%. The occurrences of the 15 most common categories of crime over the five years are shown in Table II. Figures 1 and 2 show the variation of total crime and of several representative categories over the months of the year and the hours of the day respectively. These follow common-sense patterns: total crime in Chicago dips during the winter months because even criminals feel cold, and it is lowest between 5 am and 6 am because the bad elements of society also need their sleep. The category of crime is not influenced significantly by the month, so we predicted that the month would not play an important role in category prediction. It is, however, influenced by the time of day: for example, battery occurs more frequently than theft in the late hours of the day, as shown in Figure 2. Geographical location also influences the frequency and nature of crime. Figure 3 shows a heat map of crime in Chicago; heat maps for thefts and narcotics are shown in Figure 4. The red concentration in the narcotics map is in the infamous Far West Side of Chicago.
More cases of theft occur in the affluent neighborhoods and shopping districts (the red concentration in Figure 4a is the Near North, a prime shopping and dining area).

Fig. 2: Total crime, Thefts, Battery and Criminal Damage cases for every hour

Fig. 3: Heat map of crime in Chicago
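The monthly and hourly trends behind Figures 1 and 2 come from simple aggregation over the records. A minimal sketch of that counting, using a few hand-made records in place of the roughly 0.5 million real data points:

```python
from collections import Counter

# Each toy record carries the fields used in the trend plots:
# (category, month, hour). These stand in for the real dataset.
records = [
    ("THEFT", 7, 14), ("THEFT", 7, 15), ("BATTERY", 7, 23),
    ("BATTERY", 1, 22), ("NARCOTICS", 1, 16), ("THEFT", 12, 13),
]

by_month = Counter(month for _, month, _ in records)  # total crime per month
by_hour = Counter(hour for _, _, hour in records)     # total crime per hour

# Per-category hourly counts, e.g. to compare theft vs. battery late at night.
theft_by_hour = Counter(h for cat, _, h in records if cat == "THEFT")

print(by_month.most_common(1), dict(by_hour), dict(theft_by_hour))
```

Running the same grouping by month, hour and category over the full dataset yields exactly the curves plotted in Figures 1 and 2.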
(a) Thefts (b) Narcotics
Fig. 4: Heat maps of different types of crimes

III. PREDICTIVE TASK AND IMPLEMENTATION DETAILS

Our task is to predict the primary crime type based on the features described in the previous sections.

A. Dataset

Later in our experimental stage, our poor laptops were unable to handle the more than 1.4 million data points of the full Kaggle dataset, so we decided to restrict ourselves to crime data from 2015 and . This had around 0.5 million samples, which is a good number, and the data is fairly recent and therefore more accurate. We did a random 60:20:20 split into training, validation and test sets.

B. Performance evaluation and Baseline

We chose a very natural performance measure for multi-class classification: the accuracy of predictions against the actual category (the y value). To judge how well our models did, we developed a baseline model: predict the most common crime category in the training set (Theft) for every data point in the test set. As expected, this gives an accuracy of 22%.

C. Theorizing an upper bound on accuracy

We conducted a few initial experiments using a Multinomial Naive Bayes classifier with simple feature selection. We improved on the baseline performance by a few percentage points but peaked around 28%. To find the reason, we looked at the crime categories for every hour of every month in every police beat (274 of them). No single category dominated by far. Looping over all beats, months and hours, we found that on average the dominant crime category accounts for 44% of the crimes in a cell. So if we are given a location, a time of day and a period of time (like a month) and asked to predict a category, we cannot do better than 44%. This became our Holy Grail for category prediction. For comparison we also predicted the top two and top three categories; if any of them matches the actual value, we consider that a success. For top 3, our initial experiments gave an accuracy of 62%, which was quite promising.
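The baseline predictor and the top-k success criterion above can be sketched in a few lines of pure Python. This is an illustrative reconstruction on toy labels, not the authors' code:

```python
from collections import Counter

def most_common_baseline(train_labels, test_labels):
    """Predict the single most common training category for every test point."""
    majority = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for y in test_labels if y == majority)
    return hits / len(test_labels)

def top_k_accuracy(ranked_predictions, test_labels, k):
    """Count a prediction as a success if the true label is in the top k."""
    hits = sum(1 for ranked, y in zip(ranked_predictions, test_labels)
               if y in ranked[:k])
    return hits / len(test_labels)

train = ["THEFT", "THEFT", "THEFT", "BATTERY", "BATTERY", "NARCOTICS"]
test = ["THEFT", "BATTERY", "NARCOTICS", "THEFT"]
print(most_common_baseline(train, test))          # majority class is THEFT

ranked = [["THEFT", "BATTERY"], ["NARCOTICS", "BATTERY"],
          ["THEFT", "NARCOTICS"], ["BATTERY", "THEFT"]]
print(top_k_accuracy(ranked, test, 1), top_k_accuracy(ranked, test, 2))
```

On the real data, the majority-class baseline gives the 22% figure quoted above, and the same top-k scoring produces the top1/top2/top3 numbers reported in Tables III, IV and V.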
D. Pre-processing of data for feature extraction

Since all the features are categorical, we used one-hot encoding to represent them. We also removed the data points that had no location given; there were fewer than 100 such samples. The following features had to be pre-processed for use in our experiments:

Date: the month was extracted from the date to verify that it does not add value to category prediction, as shown in Figure 1.

Time: the hour of the day is indicative of certain kinds of crime, as shown in Figure 2.

Location: a major factor influencing the type of crime. Figures 4a and 4b show the localization of different crimes, so we used various methods for determining the location of a sample:
1) Beat: unique police beats defined by the Chicago police.
2) X-Y coordinate grid: locations represented by x-y coordinates were divided into a 9x9 grid.
3) K-means: k-means clustering (number of clusters = 25) was done on the x-y coordinates and the cluster centers were used as a feature.
4) Block: unique block names.
5) Type of location: the types of locations where crimes occurred.
6) Community Area: 77 unique community areas.
7) Ward: 50 unique wards.
8) District: 23 unique districts.

Domestic crime: True/False.

IV. CLASSIFICATION MODELS AND THEIR PERFORMANCE

We evaluated several models: Multinomial Naive Bayes, Decision Trees and the Random Forest classifier. Other models, such as Multinomial Logistic Regression and the multi-class SVM classifier, were unsuitable for this task and are described in section D.
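The pre-processing described above, one-hot encoding plus the 9x9 coordinate grid, can be sketched as follows. The coordinate bounds here are made up for illustration; in practice they would come from the range of the x-y coordinates in the data.

```python
def one_hot(value, categories):
    """One-hot encode a categorical value over a fixed list of categories."""
    return [1 if value == c else 0 for c in categories]

def grid_cell(x, y, x_min, x_max, y_min, y_max, n=9):
    """Map x-y coordinates into one of n*n grid cells (row-major index)."""
    col = min(int((x - x_min) / (x_max - x_min) * n), n - 1)
    row = min(int((y - y_min) / (y_max - y_min) * n), n - 1)
    return row * n + col

# Example: encode the hour of day and the domestic flag, and bucket a point
# into the grid using toy coordinate bounds.
hours = list(range(24))
features = one_hot(14, hours) + one_hot(True, [False, True])
cell = grid_cell(3.2, 7.9, 0.0, 9.0, 0.0, 9.0)
print(sum(features), cell)
```

The grid cell index (and likewise the beat, block or k-means cluster id) is itself a categorical value, so it too is one-hot encoded before being fed to the classifiers.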
All models were evaluated against the same baseline, which predicts THEFT, the most common crime, and gives an accuracy of 22%. The performance of our classifiers on the training and validation sets is given in Tables III, IV and V. We report the accuracy of our top prediction, top 2 predictions and top 3 predictions, written as top1/top2/top3.

A. Multinomial Naive Bayes Classifier

The Multinomial Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem, with the strong assumption that all features are conditionally independent. The model is computationally light, which worked well with our limited CPU power and memory, and the shorter training time meant that we could evaluate many combinations of features. Our dataset has multiple representations of location. By taking only a limited number of location features and combining them with other features such as time and domestic crime, we could build a model whose features were more or less conditionally independent. Our experimental results show that the feature sets that best satisfied the conditional-independence assumption performed best with the Multinomial Naive Bayes model. Table III lists the feature sets used in 10 different experiments, along with the train and validation accuracies.

B. Decision Tree Classifier

A decision tree is built from the training set considering all features. Decision tree models are robust to noisy data and are capable of learning expressions over features that have no direct connection with one another. This suits our dataset, whose features (time, location, domestic crime) are not directly related. Our dataset is also noisy: the time is approximated in cases where the exact time is unknown, and we had to remove several data points for which information about the location was missing.
Decision trees can also mirror human decision making better than other approaches. A decision tree takes many hyperparameters. We initially performed our experiments with [max depth=50, min samples split=30, min samples leaf=20], which gave us an accuracy of 38.04%. We then performed a grid search and found the best parameters to be [max depth=150, min samples split=70, min samples leaf=40], which improved the accuracy to 38.49%. Table V lists the feature sets used with these two configurations; 5 different experiments were conducted on each, and the train and validation accuracies are listed in the table.

C. Random Forest Classifier

A random forest builds a large number of decision trees. For data containing categorical variables with different numbers of levels, random forests are biased in favor of the attributes with more levels, and categorical variables also increase the computational cost of building the trees. The same properties of the dataset that help decision trees also help the random forest. We initially performed our experiments with [n estimators=70, min samples split=30, bootstrap=True, max depth=50, min samples leaf=25], which gave us an accuracy of 35.7%. We then performed a grid search and found the best parameters to be [n estimators=150, min samples split=60, bootstrap=True, max depth=70, min samples leaf=45], which improved the accuracy to 35.9%. Although we tried to optimize the random forest's parameters, we still observed overfitting: the training accuracy is high at 38.92%, but the test data performs poorly. Table IV lists the feature sets used with these two configurations; 5 different experiments were conducted on each, and the train and validation accuracies are listed in the table.

D. Unsuitable Models

While Multinomial Logistic Regression works well when features are categorical, it is very expensive to train on a dataset with a large number of classes. Given this and the size of our training data, it was not practical to include it in our experiments. We faced a similar problem with the multi-class SVM classifier, which failed to fit the model in a reasonable time frame because of the large number of samples. The Gaussian Naive Bayes model is suited to continuous data with a Gaussian distribution; since our features do not follow this distribution, it performed very poorly, giving an accuracy of 1.6%.

V. RELATED LITERATURE

We used the dataset from Kaggle [1]. This dataset was not used in a Kaggle competition; rather, it is one of the datasets in Kaggle's datasets collection. A similar dataset that has been analyzed and studied
a lot is the San Francisco Crime Dataset [2], which has features very similar to ours. We read some previous years' submissions for CSE 255 and submissions on Kaggle; models such as Multinomial Naive Bayes, Decision Trees and Random Forest regressors were used. Reading these papers inspired us to represent our geographical features in a grid-based system and to use k-means. We also found a book on predictive policing describing various models used for predictive analytics in crime analysis, and we used it as inspiration when designing the models in this report [3]. The conclusion we could draw is that it is hard to predict a single category of crime given the date, time and location, and that this holds independently of the dataset, i.e. San Francisco or Chicago. High Kaggle ranks for San Francisco had accuracies of 23% [4]. This supports our hypothesis that, given only the date, time and location, one cannot achieve good accuracy when predicting a single label.

VI. RESULTS AND CONCLUSION

A. Performance of models

For Multinomial Naive Bayes, rows 1, 2 and 4 of Table III, the accuracies on the test set were 34.66/50.82/61.76%, 37.52/54.93/66.55% and 37.33/54.56/66.59% respectively. For Decision Trees, rows 1, 2 and 4 of Table V, the accuracies on the test set were 37.11/53.21/64.90%, 32.1/46.11/57.05% and 37.96/54.61/66.10% respectively. For Decision Trees, rows 6, 7 and 9 of Table V, the accuracies on the test set were 38.04/54.39/66.12%, 37.85/54.21/66.10% and 38.30/55.22/66.98% respectively. For the Random Forest classifier, rows 1, 2 and 4 of Table IV, the accuracies on the test set were 35.55/52.30/63.55%, 35.50/52.33/64.34% and 35.20/51.12/62.66% respectively.
For Random Forest classifiers, rows 6, 7 and 9 of Table IV, the accuracies on the test set were 35.44/52.22/63.34%, 35.23/51.89/64.05% and 35.65/51.63/63.77% respectively. The Decision Tree model with max depth 150, min samples split 70 and min samples leaf 40 (shown in red in Table V) performed the best among all the models we considered. We think this classifier does better with more depth than the earlier configuration because longer root-to-leaf paths can account for more feature values in a classification. The model did not overfit at this depth, which was significantly smaller than the total number of features in any experiment. We also think the larger min samples split and min samples leaf helped classify the major crimes in the dataset better. From existing documentation, we had expected the random forest classifier to perform better; having a large number of categorical features encoded as one-hot may cause it to find patterns in the training data that do not exist in the test data.

B. Interpretation of features

Adding the day of the month as a feature changed the accuracy only slightly across the different combinations of features. From this we gather that the day does not add useful information to the predictive task; the slight variation in accuracy may come from random shuffling of the data. We do not expect crime to vary with the day of the month. From our data analysis (Fig. 1) we saw that while the total number of crimes changes across months, the category of crime is not influenced by the month; we also saw that the category of crime changes depending on the hour of the day (Fig. 2). We validated this through our experiments: adding month as a feature only reduced the accuracy, while adding hour increased it. Of all the features representing location, adding the beat provided the most information.
Beats are geographical areas defined by the police, and we think the areas were drawn according to the category and concentration of crimes. The other location features we generated (the grid, k-means clusters, block and type of location) improved the accuracy only marginally when used together; they provide little additional information not already covered by the beat. The remaining location features (ward, district and community area) reduced the accuracy. This could be because adding extra categorical features that categorize location in different ways over-complicates the model and violates Occam's razor.

Conclusion (TLDR): Our best model was the Decision Tree classifier, and the best feature representation included one or a few elements each for time, geographic location, type of location and whether the crime was domestic (in keeping with Occam's razor).
The accuracy for top1/top2/top3 category predictions was 38.30/55.22/66.98%. We conclude that it is hard to predict the category of a crime given only the date, time and location, and this mirrors the conclusion drawn in the work on the San Francisco dataset. It further reinforces our hypothesis that accuracy cannot exceed roughly 44%.

REFERENCES

[1] Crimes in Chicago. currie32/crimes-in-chicago.
[2] San Francisco crime classification. com/c/sf-crime.
[3] Walter L. Perry, Brian McInnis, and John S. Hollywood. Predictive Policing: The Role of Crime Forecasting in Law Enforcement Operations. Santa Monica, CA: RAND Corporation.
[4] Shen Ting Ang, Weichen Wang, and Silvia Chyou. San Francisco crime classification. CSE 255 Fall 2015, 2015.
Index | Features | Performance in percent, top1/top2/top3 (training set; validation set). A dash marks a value not available.

1 | month + days + hour + block + location + domestic + beat + district + ward | train -/50.89/-; validation -/50.25/-
2 | month + days + hour + block + location + beat + district + ward + community area + grid(for xy coordinates) + kmeans(for xy coordinates) | train 31.64/47.89/-; validation -/47.56/-
3 | month + hour + block + location + domestic + beat + district + ward + community area + grid(for xy coordinates) + kmeans(for xy coordinates) | train 34.64/50.49/-; validation -/50.23/-
4 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 37.62/55.09/-; validation -/54.65/-
5 | month + hour + location + domestic + beat + district + ward + community area + grid(for xy coordinates) + kmeans(for xy coordinates) | train -/50.01/-; validation -/49.61/61.16
6 | month + hour + days + location + domestic + beat + district + ward + community area + grid(for xy coordinates) + kmeans(for xy coordinates) | train 34.44/49.99/-; validation -/49.54/-
7 | month + hour + block + location + domestic + beat | train 38.01/55.33/-; validation -/54.96/-
8 | month + days + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 37.77/55.01/-; validation -/54.67/66.63
9 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | train 37.66/54.88/-; validation -/54.47/-
10 | month + hour + block + weekday + location + domestic + beat + kmeans(for xy coordinates) | train 34.66/50.55/-; validation -/50.25/61.86

TABLE III: Results of experiments using the Multinomial Naive Bayes classifier
Index | Features | Performance in percent, top1/top2/top3 (training set; validation set). A dash marks a value not available.

Parameters: n estimators=70, min samples split=30, bootstrap=True, max depth=50, min samples leaf=25
1 | month + days + hour + block + location + domestic + beat + district + ward | train -/52.55/-; validation -/52.31/-
2 | month + hour + block + location + domestic + beat | train 35.85/52.65/-; validation -/52.54/-
3 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 35.38/51.27/-; validation -/50.95/-
4 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | train 35.60/51.55/-; validation -/51.17/-
5 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 35.65/50.91/-; validation -/50.78/63.04

Parameters: n estimators=150, min samples split=60, bootstrap=True, max depth=70, min samples leaf=45
6 | month + days + hour + block + location + domestic + beat + district + ward | train -/52.58/-; validation -/52.36/-
7 | month + hour + block + location + domestic + beat | train 35.70/52.57/-; validation -/52.35/-
8 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 35.81/52.57/-; validation -/52.22/-
9 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | train 35.90/52.47/-; validation -/52.16/-
10 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 35.72/52.57/-; validation -/51.57/63.43

TABLE IV: Results of experiments using the Random Forest classifier
Index | Features | Performance in percent, top1/top2/top3 (training set; validation set). A dash marks a value not available.

Parameters: max depth=50, min samples split=30, min samples leaf=20
1 | month + days + hour + block + location + domestic + beat + district + ward | train 39.5/57.2/-; validation -/53.88/-
2 | month + hour + block + location + domestic + beat | train 35.4/49.4/-; validation -/46.30/-
3 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 35/48.4/-; validation -/45.08/-
4 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | train 40.82/59.37/-; validation -/54.74/-
5 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 39.59/59.33/-; validation -/53.91/65.25

Parameters: max depth=150, min samples split=70, min samples leaf=40
6 | month + days + hour + block + location + domestic + beat + district + ward | train -/56.52/-; validation -/54.97/-
7 | month + hour + block + location + domestic + beat | train 40.01/57.94/-; validation -/54.87/-
8 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 39.53/57.51/-; validation -/54.98/-
9 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | train 40.05/58.22/-; validation -/55.34/-
10 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | train 39.98/57.94/-; validation -/54.98/66.65

TABLE V: Results of experiments using the Decision Tree classifier
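The grid searches used to tune the decision tree and random forest (Sections IV-B and IV-C) amount to looping over every parameter combination and keeping the one with the best validation score. A generic sketch, with a made-up scoring function standing in for "train the model, measure validation accuracy":

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every parameter combination; return the best."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # validation accuracy in the paper's setup
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy stand-in for training and validating a tree; peaks at the values the
# paper's search settled on (max depth 150, min samples leaf 40).
def toy_score(p):
    return -abs(p["max_depth"] - 150) - abs(p["min_samples_leaf"] - 40)

grid = {"max_depth": [50, 100, 150], "min_samples_leaf": [20, 40]}
print(grid_search(grid, toy_score))
```

The cost is the product of the grid sizes times one full train-and-validate cycle per combination, which is why the searches were run on small grids around hand-picked starting points.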
More informationAdministration. Chapter 3: Decision Tree Learning (part 2) Measuring Entropy. Entropy Function
Administration Chapter 3: Decision Tree Learning (part 2) Book on reserve in the math library. Questions? CS 536: Machine Learning Littman (Wu, TA) Measuring Entropy Entropy Function S is a sample of training
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement
More informationQuestion of the Day. Machine Learning 2D1431. Decision Tree for PlayTennis. Outline. Lecture 4: Decision Tree Learning
Question of the Day Machine Learning 2D1431 How can you make the following equation true by drawing only one straight line? 5 + 5 + 5 = 550 Lecture 4: Decision Tree Learning Outline Decision Tree for PlayTennis
More informationBayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction
15-0: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive
More informationMachine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20.
10-601 Machine Learning, Midterm Exam: Spring 2008 Please put your name on this cover sheet If you need more room to work out your answer to a question, use the back of the page and clearly mark on the
More informationClassification II: Decision Trees and SVMs
Classification II: Decision Trees and SVMs Digging into Data: Jordan Boyd-Graber February 25, 2013 Slides adapted from Tom Mitchell, Eric Xing, and Lauren Hannah Digging into Data: Jordan Boyd-Graber ()
More informationFinal Exam, Fall 2002
15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work
More informationThe Naïve Bayes Classifier. Machine Learning Fall 2017
The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning
More informationSupervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees!
Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees! Summary! Input Knowledge representation! Preparing data for learning! Input: Concept, Instances, Attributes"
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationDecision Trees: Overfitting
Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationBayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington
Bayesian Classifiers and Probability Estimation Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington 1 Data Space Suppose that we have a classification problem The
More informationChapter 3: Decision Tree Learning (part 2)
Chapter 3: Decision Tree Learning (part 2) CS 536: Machine Learning Littman (Wu, TA) Administration Books? Two on reserve in the math library. icml-03: instructional Conference on Machine Learning mailing
More informationMachine Learning! in just a few minutes. Jan Peters Gerhard Neumann
Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often
More informationDecision Tree Learning
0. Decision Tree Learning Based on Machine Learning, T. Mitchell, McGRAW Hill, 1997, ch. 3 Acknowledgement: The present slides are an adaptation of slides drawn by T. Mitchell PLAN 1. Concept learning:
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationBayesian Classification. Bayesian Classification: Why?
Bayesian Classification http://css.engineering.uiowa.edu/~comp/ Bayesian Classification: Why? Probabilistic learning: Computation of explicit probabilities for hypothesis, among the most practical approaches
More informationDecision Support. Dr. Johan Hagelbäck.
Decision Support Dr. Johan Hagelbäck johan.hagelback@lnu.se http://aiguy.org Decision Support One of the earliest AI problems was decision support The first solution to this problem was expert systems
More informationIntroduction to ML. Two examples of Learners: Naïve Bayesian Classifiers Decision Trees
Introduction to ML Two examples of Learners: Naïve Bayesian Classifiers Decision Trees Why Bayesian learning? Probabilistic learning: Calculate explicit probabilities for hypothesis, among the most practical
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationU.C. Davis FBI Part I & Part II Crime Offenses 2008 to 2010
U.C. Davis FBI Part I & Part II Crime Offenses 2008 to 2010 2010 PART I OFFENSES 2008 Number of Actual Offenses 2009 Number of Actual Offenses 2010 Number of Actual Offenses 2009 to 2010 Percent Change
More informationAlgorithms for Classification: The Basic Methods
Algorithms for Classification: The Basic Methods Outline Simplicity first: 1R Naïve Bayes 2 Classification Task: Given a set of pre-classified examples, build a model or classifier to classify new cases.
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationFBI Part I & Part II Crime Offenses Arrests Miscellaneous Activity Value of Stolen Property Crime Pie Charts Crime Line Charts Crime Rate Charts
U.C. Davis Medical Center Crime Statistics (Medical Center) PDF Version FBI Part I & Part II Crime Offenses Arrests Miscellaneous Activity Value of Stolen Property Crime Pie Charts Crime Line Charts Crime
More informationAlgorithm-Independent Learning Issues
Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning
More informationIntroduction: Name Redacted November 17, 2017 GSP 370 Research Proposal and Literature Review
Name Redacted November 17, 2017 GSP 370 Research Proposal and Literature Review Introduction: The study of spatial patterns and distributions has been conducted for many years. Over the last few decades,
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationAE = q < H(p < ) + (1 q < )H(p > ) H(p) = p lg(p) (1 p) lg(1 p)
1 Decision Trees (13 pts) Data points are: Negative: (-1, 0) (2, 1) (2, -2) Positive: (0, 0) (1, 0) Construct a decision tree using the algorithm described in the notes for the data above. 1. Show the
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationReducing Multiclass to Binary: A Unifying Approach for Margin Classifiers
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning
More informationbrainlinksystem.com $25+ / hr AI Decision Tree Learning Part I Outline Learning 11/9/2010 Carnegie Mellon
I Decision Tree Learning Part I brainlinksystem.com $25+ / hr Illah Nourbakhsh s version Chapter 8, Russell and Norvig Thanks to all past instructors Carnegie Mellon Outline Learning and philosophy Induction
More informationLearning Decision Trees
Learning Decision Trees Machine Learning Fall 2018 Some slides from Tom Mitchell, Dan Roth and others 1 Key issues in machine learning Modeling How to formulate your problem as a machine learning problem?
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationDreem Challenge report (team Bussanati)
Wavelet course, MVA 04-05 Simon Bussy, simon.bussy@gmail.com Antoine Recanati, arecanat@ens-cachan.fr Dreem Challenge report (team Bussanati) Description and specifics of the challenge We worked on the
More informationMulticlass Classification-1
CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationIntroducing GIS analysis
1 Introducing GIS analysis GIS analysis lets you see patterns and relationships in your geographic data. The results of your analysis will give you insight into a place, help you focus your actions, or
More informationModel Averaging With Holdout Estimation of the Posterior Distribution
Model Averaging With Holdout stimation of the Posterior Distribution Alexandre Lacoste alexandre.lacoste.1@ulaval.ca François Laviolette francois.laviolette@ift.ulaval.ca Mario Marchand mario.marchand@ift.ulaval.ca
More informationDECISION TREE LEARNING. [read Chapter 3] [recommended exercises 3.1, 3.4]
1 DECISION TREE LEARNING [read Chapter 3] [recommended exercises 3.1, 3.4] Decision tree representation ID3 learning algorithm Entropy, Information gain Overfitting Decision Tree 2 Representation: Tree-structured
More informationMidterm exam CS 189/289, Fall 2015
Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu October 19, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network
More informationA Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007
Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized
More informationClassification and Prediction
Classification Classification and Prediction Classification: predict categorical class labels Build a model for a set of classes/concepts Classify loan applications (approve/decline) Prediction: model
More informationCrime and Fire Statistics
Crime and Fire Statistics Monmouth University Police Department Crime Statistics Murder Negligent Manslaughter Forcible Sex Offenses Rape Criminal Sexual Contact Non-Forced Sex Offenses Incest Statutory
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationNonlinear Classification
Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationModern Information Retrieval
Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction
More informationCrime Forecasting Using Data Mining Techniques
Crime Forecasting Using Data Mining Techniques Chung-Hsien Yu 1, Max W. Ward 1, Melissa Morabito 2, and Wei Ding 1 1 Department of Computer Science, 2 Department of Sociology, University of Massachusetts
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationMachine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber
Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationLast Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression
CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer
More informationWhat s Cooking? Predicting Cuisines from Recipe Ingredients
What s Cooking? Predicting Cuisines from Recipe Ingredients Kevin K. Do Department of Computer Science Duke University Durham, NC 27708 kevin.kydat.do@gmail.com Abstract Kaggle is an online platform for
More information1 Handling of Continuous Attributes in C4.5. Algorithm
.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous
More informationCombing Open-Source Programming Languages with GIS for Spatial Data Science. Maja Kalinic Master s Thesis
Combing Open-Source Programming Languages with GIS for Spatial Data Science Maja Kalinic Master s Thesis International Master of Science in Cartography 14.09.2017 Outline Introduction and Motivation Research
More informationDecision trees. Special Course in Computer and Information Science II. Adam Gyenge Helsinki University of Technology
Decision trees Special Course in Computer and Information Science II Adam Gyenge Helsinki University of Technology 6.2.2008 Introduction Outline: Definition of decision trees ID3 Pruning methods Bibliography:
More informationTackling the Poor Assumptions of Naive Bayes Text Classifiers
Tackling the Poor Assumptions of Naive Bayes Text Classifiers Jason Rennie MIT Computer Science and Artificial Intelligence Laboratory jrennie@ai.mit.edu Joint work with Lawrence Shih, Jaime Teevan and
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Classification: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu September 21, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network
More information