Chicago Crime Category Classification (CCCC)


Chicago Crime Category Classification (CCCC)

Sahil Agarwal 1, Shalini Kedlaya 2 and Ujjwal Gulecha 3

Abstract

It is crucial to keep a city safe from crime. If the police could be given information about the types of crime that occur over time and in different locations, they would be better equipped to fight crime. We use Chicago crime data to predict the category of a crime based on date, time and some geographically relevant features such as latitude and longitude. We tried various models, such as Multinomial Naive Bayes, Decision Trees, Multinomial Logistic Regression and a Random Forest classifier, to achieve this classification.

I. INTRODUCTION

We use various statistical classification models to predict the category of a crime based on the date, time and a few geographically relevant features. We used the Chicago crime dataset, which we obtained from Kaggle [1]. Due to limitations of computational capacity we restricted our analysis and modeling to data for 2015 and 2016. The following sections describe the dataset, the analysis of the data, the various models and the results.

II. THE DATASET

A. Information about Data

The dataset consists of information about crimes that occurred in Chicago in 2015 and 2016. There are roughly 0.5 million data points, each consisting of the 21 fields shown in Table I.

B. Data representation

Our goal is to predict the Primary Type of a crime given a set of features. Description, IUCR and FBI Code are directly indicative of the primary type and cannot be used as features. ID and Case Number are unique to each crime incident and hence add no value in prediction. The Arrest and Updated On fields are determined only after the crime is committed and are not available at the time of prediction. Location, latitude and longitude correspond to the X, Y coordinates as defined for Chicago.

TABLE I: Fields and Descriptions

ID: Unique identifier for the record.
Case Number: The Chicago Police Department RD Number.
Date: Date when the incident occurred, in mm/dd/yyyy format.
Block: The partially redacted address where the incident occurred.
IUCR: The Illinois Uniform Crime Reporting code.
Primary Type: The primary description of the IUCR code.
Description: The secondary description of the IUCR code.
Location Description: Description of the location where the incident occurred.
Arrest: Indicates whether an arrest was made.
Domestic: Indicates whether the incident was domestic-related.
Beat: Indicates the beat (the smallest police geographic area) where the incident occurred.
District: Indicates the police district where the incident occurred.
Ward: The ward (City Council district) where the incident occurred.
Community Area: Indicates the community area where the incident occurred. Chicago has 77 community areas.
FBI Code: Indicates the crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS).
X Coordinate: The x coordinate of the location where the incident occurred, in State Plane Illinois East NAD 1983 projection.
Y Coordinate: The y coordinate of the location where the incident occurred, in State Plane Illinois East NAD 1983 projection.
Year: Year the incident occurred.
Updated On: Date and time the record was last updated.
Latitude: The latitude of the location where the incident occurred.
Longitude: The longitude of the location where the incident occurred.

1 saa034@eng.ucsd.edu  2 skedlaya@eng.ucsd.edu  3 ugulecha@eng.ucsd.edu
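The feature-selection rules above translate directly into a few lines of preprocessing. The sketch below is only a minimal illustration: the file name and column names are assumptions based on Table I and the usual layout of the Kaggle "Crimes in Chicago" download, and may need adjusting.

```python
import pandas as pd

# Minimal loading sketch; file name and column names are assumptions
# based on Table I and the Kaggle "Crimes in Chicago" dataset layout.
df = pd.read_csv("Chicago_Crimes_2012_to_2017.csv")
df = df[df["Year"].isin([2015, 2016])]          # keep only 2015-2016

# Fields that leak the label (Description, IUCR, FBI Code), are unique per
# incident (ID, Case Number), or are only known after the fact (Arrest,
# Updated On) are dropped and never used as features.
drop_cols = ["ID", "Case Number", "IUCR", "Description",
             "FBI Code", "Arrest", "Updated On"]
df = df.drop(columns=drop_cols)

y = df["Primary Type"]                          # label: primary crime category
X_raw = df.drop(columns=["Primary Type"])       # remaining candidate features
```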

C. Data Analysis

We performed some basic data analysis to understand trends in the data. Theft is the most common type of crime, comprising 22.42% of all crime, followed by battery at 18.76%, criminal damage at 18.76%, narcotics at 6.86% and assault at 6.76%. The shares of the 15 most common categories of crime over the five years are shown in Table II. Figures 1 and 2 show the variation of total crime and of representative categories over the months of the year and the hours of the day respectively. These follow common-sense patterns: total crime in Chicago dips during the winter months because even criminals feel cold, and total crime is lowest between 5 am and 6 am because the bad elements of society also need their sleep. The category of crime is not influenced significantly by the month, so we predicted that the month would not play an important role in category prediction. The category is, however, influenced by the time of day: for example, battery occurs more frequently than theft in the late hours of the day, as shown in Figure 2. Geographical location also influences the frequency and nature of crime. Figure 3 shows a heat map of crime in Chicago, and heat maps for thefts and narcotics are shown in Figure 4. The red concentration in the narcotics map is in the infamous Far West Side of Chicago, while more thefts occur in affluent neighborhoods and shopping districts (the red concentration in Figure 4a is the Near North, a prime shopping and dining area).

TABLE II: Different crimes and their frequency (percentage of total crime)

Theft: 22.42%
Battery: 18.76%
Criminal Damage: 18.76%
Narcotics: 6.86%
Assault: 6.76%
Other Offense: 6.54%
Deceptive Practice: 6.18%
Burglary: 5.19%
Robbery: 4.09%
Motor Vehicle Theft: 4.06%
Criminal Trespass: 2.4%
Weapons Violation: 1.29%
Offense Involving Children: 0.84%
Public Peace Violation: 0.76%
Crim Sexual Assault: 0.53%

Fig. 1: Total crime, thefts, battery and narcotics cases for every month.

Fig. 2: Total crime, thefts, battery and criminal damage cases for every hour.

Fig. 3: Heat map of crime in Chicago.
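The monthly and hourly trends behind Figures 1 and 2 can be reproduced with a simple aggregation. This is a rough sketch, assuming the dataframe from the loading step above; the date-format string and the upper-case category labels (THEFT, BATTERY, and so on) are assumptions about the raw CSV and may need adjusting.

```python
import pandas as pd

# Parse the raw Date strings (assumed mm/dd/yyyy hh:mm:ss AM/PM) and derive
# month and hour of day.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p")
df["Month"] = df["Date"].dt.month
df["Hour"] = df["Date"].dt.hour

# Counts per month and per hour for total crime and a few representative
# categories, mirroring Figures 1 and 2.
categories = ["THEFT", "BATTERY", "NARCOTICS", "CRIMINAL DAMAGE"]
for unit in ["Month", "Hour"]:
    total = df.groupby(unit).size().rename("TOTAL")
    by_cat = (df[df["Primary Type"].isin(categories)]
              .groupby([unit, "Primary Type"]).size()
              .unstack(fill_value=0))
    print(pd.concat([total, by_cat], axis=1))
```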

Fig. 4: Heat maps of different types of crimes: (a) thefts, (b) narcotics.

III. PREDICTIVE TASK AND IMPLEMENTATION DETAILS

Our task is to predict the primary crime type based on the features described in the previous sections.

A. Dataset

Later in our experimental stage, our poor laptops were unable to handle the more than 1.4 million data points of the full Kaggle dataset, so we restricted ourselves to crime data from 2015 and 2016. This leaves around 0.5 million samples, which is a good number and, being fairly recent, gives more accurate data. We used a 60:20:20 random split for training, validation and testing respectively.

B. Performance evaluation and Baseline

We chose a natural performance measure for multi-class classification: the accuracy of the predicted category against the actual category (the y value). To judge how well our models did, we built a baseline model that predicts the most common crime category in the training set (Theft) for every data point in the test set. The baseline obtained an accuracy of 22%, as expected.

C. Theorizing an upper bound on accuracy

We conducted a few initial experiments using a Multinomial Naive Bayes classifier with simple feature selection. We improved on the baseline by a few percentage points but plateaued around 28%. To understand why, we looked at the crime categories for every hour of every month in every police beat (274 of them). No single category dominated by far: looping over all beats, months and hours, we found that the average share of the single most common crime category is 44%. So if we are given a location, a time of day and a period of time (such as a month) and asked to predict a category, we cannot do better than 44%. This became our Holy Grail for category prediction. For comparison, we also predicted the top two and top three categories; if any of them matches the actual value, we count the prediction as a success. For top 3, our initial experiments gave an accuracy of 62%, which was quite promising.

D. Pre-processing data for feature extraction

Since all the features are categorical, we used one-hot encoding to represent them. We also removed the data points with no location given; there were fewer than 100 such samples. The following features had to be pre-processed before use (a sketch of this feature construction is given below):

Date: the month was extracted from the date to verify that it does not add value to category prediction, as suggested by Figure 1.
Time: the hour of the day is indicative of certain kinds of crime, as indicated in Figure 2.
Location: a major factor influencing the type of crime. Figures 4a and 4b show the localization of different crimes, so we used several representations of the location of a sample:
1) Beat: the 274 unique police beats defined by the Chicago police.
2) X-Y coordinate grid: locations represented by x-y coordinates were divided into a 9x9 grid.
3) K-means: k-means clustering (number of clusters = 25) was run on the x-y coordinates and the assigned cluster was used as a feature.
4) Block: unique block names.
5) Type of location: the location-description categories.
6) Community Area: 77 unique community areas.
7) Ward: 50 unique wards.
8) District: 23 unique districts.
Domestic crime: True/False.
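The location encodings listed above (grid cells and k-means clusters) and the one-hot representation can be sketched as follows. This is a minimal illustration assuming the dataframe from the earlier sketches; the helper column names (GridCell, XYCluster) and the exact feature combination shown are invented for the example rather than taken from the paper.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Drop the (fewer than 100) records with missing coordinates.
df = df.dropna(subset=["X Coordinate", "Y Coordinate"]).copy()

# 9x9 grid over the x-y coordinates: each axis is cut into 9 equal-width
# bins and the pair of bin indices is collapsed into a single cell id.
df["GridX"] = pd.cut(df["X Coordinate"], bins=9, labels=False)
df["GridY"] = pd.cut(df["Y Coordinate"], bins=9, labels=False)
df["GridCell"] = df["GridX"] * 9 + df["GridY"]

# K-means with 25 clusters on the coordinates; the assigned cluster id is
# used as a categorical location feature.
km = KMeans(n_clusters=25, n_init=10, random_state=0)
df["XYCluster"] = km.fit_predict(df[["X Coordinate", "Y Coordinate"]])

# One-hot encode one plausible feature combination from Table III
# (hour, block, location description, domestic flag, beat, grid, cluster).
feature_cols = ["Hour", "Block", "Location Description",
                "Domestic", "Beat", "GridCell", "XYCluster"]
X = pd.get_dummies(df[feature_cols].astype(str))
y = df["Primary Type"]
```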

IV. CLASSIFICATION MODELS AND THEIR PERFORMANCE

We evaluated several models: Multinomial Naive Bayes, decision trees and a random forest classifier. Other models, such as multinomial logistic regression and a multi-class SVM classifier, were unsuitable for this task and are described in Section IV-D.

All models were evaluated against the same baseline, which predicts THEFT (the most common crime) and gives an accuracy of 22%. The performance of the classifiers on the training and validation sets is given in Tables III, IV and V. We evaluate the accuracy of our top prediction, top 2 predictions and top 3 predictions, written as top1/top2/top3.

A. Multinomial Naive Bayes Classifier

The Multinomial Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem with the strong assumption that all features are conditionally independent. The model is computationally light, which worked well for us given our limited CPU power and memory, and the shorter training time meant that we could evaluate many combinations of features. Our dataset has multiple representations of location; by taking only a limited number of location features and combining them with other features such as time and the domestic flag, we could build a model that was more or less conditionally independent. Our experimental results also show that the feature combinations that best satisfy the conditional-independence assumption performed best for the Multinomial Naive Bayes model. Table III lists the feature combinations used; 10 different experiments were tried, and the training and validation accuracies are listed in the table.

B. Decision Tree Classifier

A decision tree is built from the training dataset considering all features. Decision tree models are robust to noisy data and are capable of learning disjunctive expressions, which suits our dataset: features like time, location and the domestic flag have no direct connection with one another. Our dataset is also noisy, as the time is approximated when the exact time is unknown, and we had to remove several data points whose location information was missing. Decision trees can also mirror human decision making better than other approaches. A decision tree takes many hyper-parameters. We initially ran our experiments with [max depth = 50, min samples split = 30, min samples leaf = 20], which gave an accuracy of 38.04%. We then performed a grid search and found the best parameters to be [max depth = 150, min samples split = 70, min samples leaf = 40], which improved the accuracy to 38.49%. Table V lists the feature combinations used with these two parameter settings; 5 experiments were conducted with each setting, and the training and validation accuracies are listed in the table.

C. Random Forest Classifier

A random forest builds a large number of decision trees. For data including categorical variables with different numbers of levels, random forests are biased in favor of attributes with more levels, and categorical variables also increase the computational cost of building the trees. The same properties of the dataset that help decision trees also help the random forest. We initially ran our experiments with [n estimators = 70, min samples split = 30, bootstrap = True, max depth = 50, min samples leaf = 25], which gave an accuracy of 35.7%. We then performed a grid search and found the best parameters to be [n estimators = 150, min samples split = 60, bootstrap = True, max depth = 70, min samples leaf = 45], which improved the accuracy to 35.9%. Although we tried to optimize the parameters for the random forest, we still observed overfitting: the training accuracy is high at 38.92%, but the test data performs poorly. Table IV lists the feature combinations used with these two parameter settings; 5 experiments were conducted with each setting, and the training and validation accuracies are listed in the table.

D. Unsuitable Models

While multinomial logistic regression works well when features are categorical, it is very expensive to train on a dataset with a large number of classes. Given this and the size of our training data, it was not practical to include it in our experiments. We faced a similar problem with the multi-class SVM classifier, which failed to fit the model in a reasonable time because of the large number of samples. The Gaussian Naive Bayes model is suitable for continuous data with a Gaussian distribution; since our features do not follow this distribution, it performed horribly, giving an accuracy of 1.6%.
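As a rough illustration of the training and top1/top2/top3 evaluation described above, the sketch below fits the three suitable models with the grid-searched hyper-parameters reported in Sections IV-B and IV-C. It is a minimal sketch, assuming the one-hot feature matrix X and labels y from the pre-processing sketch; the random split and the topk_accuracy helper are illustrative rather than the exact code used for the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# 60:20:20 random split into train / validation / test (Section III-A).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

def topk_accuracy(model, X_eval, y_eval, k):
    """Fraction of samples whose true category is among the k most probable classes."""
    proba = model.predict_proba(X_eval)
    topk_idx = np.argsort(proba, axis=1)[:, -k:]   # indices of the k largest probabilities
    topk_labels = model.classes_[topk_idx]         # map column indices back to class labels
    y_true = np.asarray(y_eval)
    return float(np.mean([y in row for y, row in zip(y_true, topk_labels)]))

models = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(max_depth=150, min_samples_split=70,
                                            min_samples_leaf=40),
    "Random Forest": RandomForestClassifier(n_estimators=150, min_samples_split=60,
                                            bootstrap=True, max_depth=70,
                                            min_samples_leaf=45),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    accs = [topk_accuracy(model, X_val, y_val, k) for k in (1, 2, 3)]
    print(f"{name}: validation top1/top2/top3 = "
          + "/".join(f"{100 * a:.2f}" for a in accs))
```

Because all three classifiers expose predict_proba in scikit-learn, the same helper can be reused unchanged on the held-out test split for numbers like those reported in Section VI.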

V. RELATED LITERATURE

We used the dataset from Kaggle [1]. This dataset was not used in a Kaggle competition; rather, it is one of the datasets in Kaggle's datasets collection. A similar dataset that has been analyzed and studied a lot is the San Francisco crime dataset [2], which has features very similar to ours. We read some previous years' submissions for CSE 255 and submissions on Kaggle; models such as Multinomial Naive Bayes, decision trees and random forest regressors were used. We took inspiration from these papers to represent our geographical features in a grid-based system and to use k-means. We also found a book on predictive policing with descriptions of various models used for predictive analytics pertaining to crime analysis, and we used it as inspiration in designing the models used in this report [3]. The conclusion we could draw was that it is hard to predict a single category of crime given date, time and location, independently of the dataset (San Francisco or Chicago). Highly ranked Kaggle entries for San Francisco had accuracies of 23% [4]. This supports our hypothesis that, given only date, time and location, one cannot achieve good accuracy by predicting a single label.

VI. RESULTS AND CONCLUSION

A. Performance of models

For Multinomial Naive Bayes, rows 1, 2 and 4 of Table III, the accuracies on the test set were 34.66/50.82/61.76%, 37.52/54.93/66.55% and 37.33/54.56/66.59% respectively. For decision trees, rows 1, 2 and 4 of Table V, the test accuracies were 37.11/53.21/64.90%, 32.1/46.11/57.05% and 37.96/54.61/66.10% respectively; for rows 6, 7 and 9 of Table V they were 38.04/54.39/66.12%, 37.85/54.21/66.10% and 38.30/55.22/66.98% respectively. For the random forest classifier, rows 1, 2 and 4 of Table IV, the test accuracies were 35.55/52.30/63.55%, 35.50/52.33/64.34% and 35.20/51.12/62.66% respectively; for rows 6, 7 and 9 of Table IV they were 35.44/52.22/63.34%, 35.23/51.89/64.05% and 35.65/51.63/63.77% respectively.

The decision tree model with max depth 150, min samples split 70 and min samples leaf 40 (row 9 of Table V) performed best among all the models we considered. We think this classifier does better with more depth than the previous setting because it can have longer root-to-leaf paths, which account for more feature values before reaching a classification. The model did not over-fit at this depth, which is significantly less than the total number of features in any experiment. We also think that the larger min samples split and min samples leaf values helped the classification of the major crimes in the dataset. From existing documentation we had expected the random forest classifier to perform better; having a large number of categorical features encoded as one-hot may cause it to find patterns in the training data that do not exist in the test data.

B. Interpretation of features

Adding the day of the month as a feature changed the accuracy only marginally for different combinations of features. From this we gather that the day does not add useful information to the predictive task, and the slight variation in accuracy may come from random shuffling of the data; we do not expect crime to vary with the day of the month. From our data analysis (Figure 1) we saw that while the total number of crimes changes across months, the category of crime is not influenced by the month. We also saw that the category of crime changes with the hour of the day (Figure 2). We validated this through our experiments: adding month as a feature only reduced the accuracy, while adding hour increased it.

Of all the features representing location, adding beat provided the most information. Beats are geographical areas defined by the police, and we think they have divided the city according to the category and concentration of crimes. Other location features that we generated, such as the grid, the k-means clusters, the block and the type of location, improved the accuracy marginally when used together; they provide little additional information not already covered by beat. Other location features such as ward, district and community area reduced the accuracy. This could be because adding further categorical features that categorize location in different ways over-complicates the model and violates Occam's Razor.

Conclusion (TLDR): Our best model was the decision tree classifier, and the best feature representation included one or a few elements each for time, geographic location, type of location and whether the crime was domestic (in keeping with Occam's Razor).

The accuracy for top1/top2/top3 category predictions was 38.30/55.22/66.98%. We conclude that it is hard to predict the category of a crime given only date, time and location, mirroring the conclusion drawn in the work on the San Francisco dataset. This further reinforces our hypothesis that we cannot do better than roughly 44% accuracy for the single top prediction.

REFERENCES

[1] Crimes in Chicago. https://www.kaggle.com/currie32/crimes-in-chicago.
[2] San Francisco Crime Classification. https://www.kaggle.com/c/sf-crime.
[3] Walter L. Perry, Brian McInnis, and John S. Hollywood. Predictive Policing: The Role of Crime Forecasting in Law Enforcement Operations. Santa Monica, CA: RAND Corporation, 2013.
[4] Silvia Chyou, Shen Ting Ang, Weichen Wang. San Francisco Crime Classification. CSE 255, Fall 2015.

Index | Features | Training set | Validation set
1 | month + days + hour + block + location + domestic + beat + district + ward | -/50.89/- | -/50.25/-
2 | month + days + hour + block + location + beat + district + ward + community area + grid(for xy coordinates) + kmeans(for xy coordinates) | 31.64/47.89/- | -/47.56/-
3 | month + hour + block + location + domestic + beat + district + ward + community area + grid(for xy coordinates) + kmeans(for xy coordinates) | 34.64/50.49/- | -/50.23/-
4 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 37.62/55.09/- | -/54.65/-
5 | month + hour + location + domestic + beat + district + ward + community area + grid(for xy coordinates) + kmeans(for xy coordinates) | -/50.01/- | -/49.61/61.16
6 | month + hour + days + location + domestic + beat + district + ward + community area + grid(for xy coordinates) + kmeans(for xy coordinates) | 34.44/49.99/- | -/49.54/-
7 | month + hour + block + location + domestic + beat | 38.01/55.33/- | -/54.96/-
8 | month + days + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 37.77/55.01/- | -/54.67/66.63
9 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | 37.66/54.88/- | -/54.47/-
10 | month + hour + block + weekday + location + domestic + beat + kmeans(for xy coordinates) | 34.66/50.55/- | -/50.25/61.86

TABLE III: Results of experiments using the Multinomial Naive Bayes classifier. Accuracies are top1/top2/top3 in percent; entries not available are marked with a dash.

Index | Features | Training set | Validation set
Rows 1-5: n estimators = 70, min samples split = 30, bootstrap = True, max depth = 50, min samples leaf = 25
1 | month + days + hour + block + location + domestic + beat + district + ward | -/52.55/- | -/52.31/-
2 | month + hour + block + location + domestic + beat | 35.85/52.65/- | -/52.54/-
3 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 35.38/51.27/- | -/50.95/-
4 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | 35.60/51.55/- | -/51.17/-
5 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 35.65/50.91/- | -/50.78/63.04
Rows 6-10: n estimators = 150, min samples split = 60, bootstrap = True, max depth = 70, min samples leaf = 45
6 | month + days + hour + block + location + domestic + beat + district + ward | -/52.58/- | -/52.36/-
7 | month + hour + block + location + domestic + beat | 35.70/52.57/- | -/52.35/-
8 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 35.81/52.57/- | -/52.22/-
9 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | 35.90/52.47/- | -/52.16/-
10 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 35.72/52.57/- | -/51.57/63.43

TABLE IV: Results of experiments using the Random Forest classifier. Accuracies are top1/top2/top3 in percent; entries not available are marked with a dash.

Index | Features | Training set | Validation set
Rows 1-5: max depth = 50, min samples split = 30, min samples leaf = 20
1 | month + days + hour + block + location + domestic + beat + district + ward | 39.5/57.2/- | -/53.88/-
2 | month + hour + block + location + domestic + beat | 35.4/49.4/- | -/46.30/-
3 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 35/48.4/- | -/45.08/-
4 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | 40.82/59.37/- | -/54.74/-
5 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 39.59/59.33/- | -/53.91/65.25
Rows 6-10: max depth = 150, min samples split = 70, min samples leaf = 40
6 | month + days + hour + block + location + domestic + beat + district + ward | -/56.52/- | -/54.97/-
7 | month + hour + block + location + domestic + beat | 40.01/57.94/- | -/54.87/-
8 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 39.53/57.51/- | -/54.98/-
9 | hour + block + location + domestic + beat + grid(for xy coordinates) + kmeans(for xy coordinates) | 40.05/58.22/- | -/55.34/-
10 | month + hour + block + location + domestic + beat + kmeans(for xy coordinates) | 39.98/57.94/- | -/54.98/66.65

TABLE V: Results of experiments using the Decision Tree classifier. Accuracies are top1/top2/top3 in percent; entries not available are marked with a dash.
