CS340 Winter 2010: HW3 Out Wed. 2nd February, due Friday 11th February

1 PageRank

You are given, in the file adjency.mat, a matrix G of size n × n, where n = 1000, such that

    G_{i,j} = \begin{cases} 1 & \text{if there is an outbound link from } i \text{ to } j, \\ 0 & \text{otherwise.} \end{cases}

You will have to construct the associated transition matrix

    T_{i,j} = \begin{cases} p \, \frac{G_{i,j}}{\sum_j G_{i,j}} + (1-p)\frac{1}{n} & \text{if } \sum_j G_{i,j} > 0, \\ \frac{1}{n} & \text{otherwise,} \end{cases}

where p = 0.85.

1. Implement the power method in Matlab to compute the rank vector associated with T. Also write a Matlab script that plots log ‖v_k − v_{k−1}‖ as a function of the iteration index k = 1, 2, ..., 100. This allows us to see how the algorithm is converging and to determine how many iterations are necessary. Hand in the code and the plot, labelling the axes appropriately. Give the indices of the three Web pages having the highest ranks. (A minimal starter sketch in Matlab follows this problem.)

2. Using

    (T^T)^k v_0 = a_1 \pi + \sum_{i=2}^{n} a_i \lambda_i^k u_i,

where u_1 = π, u_2, ..., u_n are the orthonormal eigenvectors associated with the eigenvalues λ_1 = 1 > λ_2 ≥ ... ≥ λ_n, and the definition of v_k, show that for k large enough

    \| v_k - v_{k-1} \| \approx \left| \frac{a_2 (\lambda_2 - 1)}{a_1} \right| \, |\lambda_2|^{k-1}.

Explain whether the plot obtained in part 1 is in agreement with this result. If it is not, provide an explanation for the possible discrepancy you have observed. Finally, suggest a method to estimate λ_2 from the plot.
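As a starting point for part 1 of this problem, here is a minimal Matlab sketch of the power iteration. It assumes the transition matrix T has already been constructed from G as described above; the variable names are illustrative, not required.

    n = size(T, 1);
    v = ones(n, 1) / n;                % start from the uniform distribution
    K = 100;
    diffs = zeros(K, 1);
    for k = 1:K
        vnew = T' * v;                 % one step of v_k = (T^T) v_{k-1}
        vnew = vnew / sum(vnew);       % renormalise to guard against round-off
        diffs(k) = norm(vnew - v);     % ||v_k - v_{k-1}||
        v = vnew;
    end
    plot(1:K, log(diffs));
    xlabel('iteration k');
    ylabel('log ||v_k - v_{k-1}||');
    [ranked, idx] = sort(v, 'descend');
    idx(1:3)                           % indices of the three highest-ranked pages

Since T is row-stochastic, T' * v already sums to one up to round-off; the renormalisation is only a safeguard.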
2 Analysis of a Naive Bayes Classifier

The first question asks you to analyse the following naive Bayes model, which describes the weather in an imaginary country. We have

    Y ∈ {night, day},   X_1 ∈ {cold, hot},   X_2 ∈ {rain, dry}

and

    P(x_1, x_2, y) = P(y) P(x_1 | y) P(x_2 | y),

with P(Y = day) = 0.5, P(X_1 = hot | Y = day) = 0.9, P(X_1 = hot | Y = night) = 0.2, P(X_2 = dry | Y = day) = 0.75 and P(X_2 = dry | Y = night) = 0.4.

For each of the following formulae except the first, write an equation which defines it in terms of formulae that appear earlier in the list. (For example, you should give a formula for P(x_1, x_2) in terms of P(x_1, x_2, y).) Then, given the model above, calculate and write out the value of the formula for each possible combination of values of the variables that appear in it. (A short Matlab sketch for checking these values numerically follows this problem.)

1. P(x_1, x_2, y)
2. P(x_1, x_2)
3. P(y | x_1, x_2)
4. P(x_1)
5. P(x_2)
6. P(x_1 | x_2)
7. P(x_2 | x_1)
8. P(x_1 | x_2, y)
9. P(x_2 | x_1, y)

Are X_1 and X_2 conditionally independent given Y? Are X_1 and X_2 marginally independent, integrating over Y? Provide a short proof for both answers.
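For checking the values you compute by hand in the previous problem, the joint distribution can be tabulated numerically. Below is a minimal Matlab sketch; the value orderings listed in the comments and the variable names are arbitrary choices, not part of the assignment.

    % Index order: X1 = 1 (cold), 2 (hot); X2 = 1 (rain), 2 (dry); Y = 1 (night), 2 (day).
    Py    = [0.5 0.5];                     % P(Y = night), P(Y = day)
    Px1_y = [0.8 0.2; 0.1 0.9];            % row y, column x1: P(x1 | y)
    Px2_y = [0.6 0.4; 0.25 0.75];          % row y, column x2: P(x2 | y)
    Pjoint = zeros(2, 2, 2);               % Pjoint(x1, x2, y) = P(x1, x2, y)
    for y = 1:2
        for x1 = 1:2
            for x2 = 1:2
                Pjoint(x1, x2, y) = Py(y) * Px1_y(y, x1) * Px2_y(y, x2);
            end
        end
    end
    Px1x2 = sum(Pjoint, 3);                               % P(x1, x2)
    Py_given_x1x2 = Pjoint ./ repmat(Px1x2, [1 1 2]);     % P(y | x1, x2)
    Px1 = sum(Px1x2, 2);                                  % P(x1)
    Px2 = sum(Px1x2, 1);                                  % P(x2)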
3 Bayes Classifier for Gaussian Data

Consider the following training set of heights x (in inches) and gender y (male/female) of some college students.

    x   67   79   71   68   67   60
    y    m    m    m    f    f    f

1. Fit a Bayes classifier to this data, i.e. estimate the parameters of the class-conditional densities

    p(x | y = c) = \frac{1}{\sqrt{2 \pi \sigma_c^2}} \exp\left( -\frac{(x - \mu_c)^2}{2 \sigma_c^2} \right)

and the class prior p(y = c) = π_c using Maximum Likelihood (ML) estimation, for c = m, f. What is your ML estimate θ̂ of θ = (µ_m, σ_m, π_m, µ_f, σ_f, π_f)? (A Matlab starter sketch follows this problem.)

2. Compute p(y = m | x = 72, θ̂).

3. What would be a simple way to extend this technique if you had multiple attributes per person, such as height and weight? Write down your proposed model as an equation.
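A minimal Matlab starter sketch for the ML fit in part 1 of the previous problem, writing the Gaussian density by hand so that no toolbox is needed. Note that std(x, 1) normalises by N, which is what the ML estimate of σ requires; the default std(x) normalises by N − 1.

    x = [67 79 71 68 67 60];
    y = ['m' 'm' 'm' 'f' 'f' 'f'];
    xm = x(y == 'm');    xf = x(y == 'f');
    mu_m  = mean(xm);    mu_f  = mean(xf);
    sig_m = std(xm, 1);  sig_f = std(xf, 1);   % ML: divide by N, not N - 1
    pi_m  = numel(xm) / numel(x);
    pi_f  = numel(xf) / numel(x);
    % Plug-in class posterior at a test height, e.g. x = 72 (part 2):
    gauss = @(x, mu, sig) exp(-(x - mu).^2 ./ (2*sig.^2)) ./ sqrt(2*pi*sig.^2);
    pm = pi_m * gauss(72, mu_m, sig_m);
    pf = pi_f * gauss(72, mu_f, sig_f);
    p_m_given_72 = pm / (pm + pf);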
4 Maximum Likelihood

The next question asks you to devise ML and Bayesian MAP estimators for a simple model of an uncalibrated sensor. Let the sensor output, X, be a random variable that ranges over the real numbers. We assume that, when tested over a range of environments, its outputs are uniformly distributed on some unknown interval (0, θ), so that

    p(x | θ) = \begin{cases} 1/\theta & \text{for } x \in (0, \theta), \\ 0 & \text{otherwise.} \end{cases}

We denote this distribution by X ∼ Unif(0, θ). To characterize the sensor's sensitivity, we would like to infer θ.

1. Given N independent observations x_{1:N} = (x_1, x_2, ..., x_N), where X_i ∼ Unif(0, θ), what is the likelihood function L(θ) = p(x_{1:N} | θ)? What is the maximum likelihood (ML) estimate of θ? Give an informal proof that your estimator is in fact the ML estimator.

2. Suppose that we place the following prior distribution on θ:

    p(\theta) = \alpha \beta^{\alpha} \theta^{-(\alpha + 1)} \, \mathbb{I}_{(\beta, \infty)}(\theta).

This is known as a Pareto distribution; we denote it by θ ∼ Pareto(α, β). Plot the three prior probability densities corresponding to the following three hyperparameter choices: (α, β) = (0.1, 0.1), (α, β) = (2.0, 0.1) and (α, β) = (1.0, 1.0). (A Matlab plotting sketch follows this problem.)

3. If θ ∼ Pareto(α, β) and we observe N independent observations X_i ∼ Unif(0, θ), derive the posterior distribution p(θ | x_{1:N}). Is this a member of any standard family?

4. For the posterior derived in part 3, what is the corresponding MAP estimate of θ? How does this compare to the ML estimate?

5. Recall that the quadratic loss is defined as L(θ, θ̂) = (θ − θ̂)^2. For the posterior derived in part 3, what estimator θ̂ of θ minimizes the posterior expected quadratic loss? Simplify your answer as much as possible.

6. Suppose that we observe three observations x_{1:3} = (0.7, 1.3, 1.9). Determine the posterior distribution of θ for each of the priors (α, β) in part 2, and plot the corresponding posterior densities.
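A minimal Matlab sketch for plotting the three Pareto prior densities asked for in part 2 of the previous problem; the grid over θ and the plotting range are arbitrary choices.

    theta = linspace(0.01, 5, 1000);
    hyp = [0.1 0.1; 2.0 0.1; 1.0 1.0];     % rows are (alpha, beta) pairs
    P = zeros(size(hyp, 1), numel(theta));
    for i = 1:size(hyp, 1)
        a = hyp(i, 1);  b = hyp(i, 2);
        % Pareto(alpha, beta) density: alpha * beta^alpha * theta^-(alpha+1) on (beta, inf)
        P(i, :) = a * b^a * theta.^(-(a + 1)) .* (theta > b);
    end
    plot(theta, P);
    xlabel('\theta'); ylabel('p(\theta)');
    legend('\alpha = 0.1, \beta = 0.1', '\alpha = 2.0, \beta = 0.1', '\alpha = 1.0, \beta = 1.0');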
5 Naive Bayes Classifier Implementation

The Nursery database records a series of admission decisions to a nursery in Ljubljana, Slovenia. These data were downloaded from http://archive.ics.uci.edu/ml/datasets/nursery. The database contains one tuple for each admission decision. The features or attributes include the financial status of the parents, the number of other children in the house, etc. See http://archive.ics.uci.edu/ml/machine-learning-databases/nursery/nursery.names for more information about these features. The first three tuples in the dataset are as follows:

    usual,proper,complete,1,convenient,convenient,nonprob,recommended,recommend
    usual,proper,complete,1,convenient,convenient,nonprob,priority,priority
    usual,proper,complete,1,convenient,convenient,nonprob,not_recom,not_recom

where the first 8 values are features or attributes and the 9th value is the class assigned (i.e., the admission recommendation). Your job is to build a Naive Bayes classifier that will make admission recommendations.

The data have been prepared for you by Marcos and can be downloaded from the course Webpage. All of the symbols have been replaced with identifying integers. The first three rows of this matrix are:

    >> load nursery.mat;
    >> nursery(1:3,:)
    ans =
         1     1     1     1     1     1     1     1     2
         1     1     1     1     1     1     1     2     4
         1     1     1     1     1     1     1     3     1

You should divide this data into equal-sized training and testing data sets as follows (the reset ensures that we will all use the same training/test split):

    load nursery.mat;
    reset(RandStream.getDefaultStream)
    nursery = nursery(randperm(size(nursery,1)),:);
    train = nursery(1:size(nursery,1)/2,:);
    test = nursery(size(nursery,1)/2+1:end,:);
1. Estimate a Naive Bayes model using Maximum Likelihood (ML) on train and use it to predict the y labels of the test data. To model the class-conditional distribution of each feature x_i (i = 1, ..., 8) and the class distribution, use multinomial/multinoulli distributions. Report the accuracy of your classifier, and submit your code. (A count-based estimation sketch appears after this question list.)

2. Modify the Nursery data by duplicating the last attribute 20 more times. You can do this by executing

    nursery = [nursery(:,1:end-1), repmat(nursery(:,end-1),1,20), nursery(:,end)];

before splitting into train and test. Then run your ML Naive Bayes estimator on the new training data, and evaluate it on the new testing data. Explain in words why you see the change in accuracy that you observe.

3. Using the original data and the model you estimated in part 1, calculate the (joint) log-likelihood ∑_i log P(x_i, y_i) of the training data. Using this model, is it possible to calculate the (joint) log-likelihood of the test data? Explain your answer.

4. Now use Bayesian MAP estimators with Dirichlet priors, having all their hyperparameters set to α_k = 2, to estimate all of the multinomial/multinoulli distribution parameters in your Naive Bayes model from the original training data. What accuracy do you obtain on the test data using this model? Calculate the (joint) likelihood of the training data under the MAP model. Is it higher or lower than the likelihood of the training data under the ML model? Explain your answer. Now calculate the likelihood of the test data under the MAP model. Explain why you don't run into the same problems.
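Below is a minimal Matlab sketch of the count-based estimation behind parts 1 and 4. It is only a starting point: it assumes class labels and attribute values are coded as consecutive integers starting at 1 (as in the prepared nursery matrix), and the variable names are illustrative.

    X = train(:, 1:end-1);   y = train(:, end);
    [N, D] = size(X);
    C = max(y);                                 % number of classes (assumes labels 1..C)
    prior = zeros(C, 1);
    condp = cell(D, 1);                         % condp{j}(v, c) estimates P(x_j = v | y = c)
    for c = 1:C
        prior(c) = sum(y == c) / N;             % ML class prior
    end
    for j = 1:D
        V = max(X(:, j));                       % number of values of feature j (assumed coded 1..V)
        condp{j} = zeros(V, C);
        for c = 1:C
            for v = 1:V
                % ML estimate from counts; for the MAP version with Dirichlet(alpha_k = 2)
                % priors, add 1 to each count and adjust the denominator accordingly.
                condp{j}(v, c) = sum(X(y == c, j) == v) / sum(y == c);
            end
        end
    end
    % Classifying one test row by accumulating log-probabilities per class:
    xt = test(1, 1:end-1);
    logp = log(prior);
    for j = 1:D
        logp = logp + log(condp{j}(xt(j), :))';
    end
    [maxlogp, yhat] = max(logp);                % predicted class for this row

The cell array condp is used because different features can take different numbers of values.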