
Data Provided: None

DEPARTMENT OF COMPUTER SCIENCE
AUTUMN SEMESTER 2014-2015
MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
1 hour

Please note that the rubric of this paper differs from that of many other papers. Section A consists of THREE questions (Questions 1-3) and is worth 40 marks in total. Section B consists of TWO questions (Questions 4 and 5) worth 60 marks each. Answer ALL Questions 1-3 in Section A, and ONE question from Section B (either Question 4 or Question 5). Figures in square brackets indicate the percentage of available marks allocated to each part of a question.

SECTION A
Answer ALL THREE questions in this section.

1. Multiple Choice Questions: choose EXACTLY ONE answer to each part.

NoseyParker.com Ltd is an online retailer that records the music, film, games and books its users have bought. It uses this information to build a profile for each user, characterising their personality and interests. Wizard Inc sells a massive multiplayer online role-playing game, WasteOfTimecraft (WoT). It wishes to sell these downloads through NoseyParker.com. NoseyParker.com wants to design a machine learning algorithm to target adverts at the users who are most likely to buy its services.

NoseyParker.com has data about users who have bought downloads in the past. Each set of items that a user has bought in the past is represented as a binary vector, x. The elements of this vector are associated with items (either music, film or books). An entry in the vector contains a 1 if the given user has bought the given item.

a) For its historic data, NoseyParker.com also has a binary value, y, which contains a 1 if a user downloaded the WasteOfTimecraft game, and a zero if the user didn't download the game. NoseyParker.com decides to use logistic regression to learn a set of classification weights, w, for the data. Which of the following describes $\pi(\mathbf{x})$, the conditional probability of the positive class given w and x?

(i) $\log(1 + \exp(-\mathbf{w}^\top \mathbf{x}))$
(ii) $\frac{1}{1 + \exp(-\mathbf{w}^\top \mathbf{x})}$
(iii) $\exp(-(y - \mathbf{w}^\top \mathbf{x})^2)$
(iv) $\mathbf{w}^\top \mathbf{w} \exp(-\mathbf{w}^\top \mathbf{x})$

b) NoseyParker.com decides to perform stochastic gradient descent to optimize the logistic regressor. If $\pi(\mathbf{x})$ is the conditional probability of class one being correct, $\eta$ is the learning rate, and the observed class is positive, which of the following gives the correct update for the weights w?

(i) $\mathbf{w} \leftarrow \mathbf{w} - \eta\,(1 - \pi(\mathbf{x}))\,\mathbf{x}$
(ii) $\mathbf{w} \leftarrow \mathbf{w} + \eta\,\pi(\mathbf{x})\,\mathbf{x}$
(iii) $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(1 - \pi(\mathbf{x}))\,\mathbf{x}$
(iv) $\mathbf{w} \leftarrow \mathbf{w} - \eta\,\pi(\mathbf{x})\,\mathbf{x}$
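As an illustrative aside (not part of the original paper), the following minimal numpy sketch sets up the logistic model that parts a) and b) assume, and shows how a candidate per-example update direction could be checked against a numerical gradient of the log-likelihood. The vectors, weights and learning setup below are invented purely for illustration.

import numpy as np

def pi(w, x):
    # Logistic regression probability of the positive class, sigma(w^T x).
    return 1.0 / (1.0 + np.exp(-w @ x))

def log_lik(w, x, y):
    # Per-example Bernoulli log-likelihood for a label y in {0, 1}.
    p = pi(w, x)
    return y * np.log(p) + (1 - y) * np.log(1 - p)

# Hypothetical purchase vector and weights, chosen only for illustration.
x = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.5, -1.0, 0.2, 0.3])
y = 1  # the observed class is positive

# Central-difference numerical gradient of the log-likelihood w.r.t. w.
eps = 1e-6
num_grad = np.array([
    (log_lik(w + eps * e, x, y) - log_lik(w - eps * e, x, y)) / (2 * eps)
    for e in np.eye(len(w))
])

# Any candidate update direction can be compared against num_grad, e.g.:
candidate = (1 - pi(w, x)) * x
print(np.allclose(candidate, num_grad, atol=1e-5))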

c) The logistic regression system is fully trained. User Simon3477 is online and has a purchase history represented by the following vector: x = [ 0 0 0 ]. Which of the following training results would cause NoseyParker.com to predict, with probability greater than 0.5, that Simon3477 will download the game?

(i) w = [ 4 ]
(ii) w = [ 2 2 ]
(iii) w = [ 2 4 ]
(iv) w = [ 4 3 3 2 ]

d) The inner product between the feature vector and the weight vector aims to approximate:

(i) The logarithm of the odds ratio between the positive and negative classes.
(ii) The probability of the positive class being correct.
(iii) The mean of the Gaussian random variable which is most likely to generate the given class-conditional density.
(iv) The logarithm of the probability of the positive class.

e) NoseyParker.com's data scientist suggests naïve Bayes as an alternative approach to modelling. Which one of the following is NOT true of the naïve Bayes classifier?

(i) The features are conditionally independent given the class.
(ii) The features are distributed according to a multivariate Gaussian.
(iii) The data are conditionally independent given the model parameters.
(iv) The class labels are discrete values.
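A small illustrative check of the decision rule that parts c) and d) rely on: under logistic regression the predicted probability exceeds 0.5 exactly when the inner product w⊤x, which models the log-odds, is positive. The purchase vector and weight vectors below are made up for the sketch.

import numpy as np

def predict_proba(w, x):
    # Logistic regression probability of the positive class.
    return 1.0 / (1.0 + np.exp(-w @ x))

# Hypothetical purchase vector and two hypothetical weight vectors.
x = np.array([1.0, 0.0, 0.0, 1.0])
for w in (np.array([0.5, -2.0, 1.0, 1.0]),   # w @ x > 0
          np.array([-1.0, 3.0, 0.0, 0.5])):  # w @ x < 0
    log_odds = w @ x
    p = predict_proba(w, x)
    # The predicted probability exceeds 0.5 exactly when the log-odds are positive.
    print(log_odds, p, (p > 0.5) == (log_odds > 0))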

2. In the answer booklet, state whether each of the following mathematical equalities is True or False.

a) $p(x \mid y) = p(y \mid x)\,p(y)$

b) $p(y) = \dfrac{p(y \mid x)\,p(x)}{p(x \mid y)}$

c) $p(y \mid x) = \dfrac{p(x, y)}{p(x \mid y)\,p(x)}$

d) $\dfrac{p(y)}{p(x)} = \dfrac{p(y \mid x)}{p(x \mid y)}$

e) $\dfrac{p(y, x)}{p(x)} = \dfrac{p(y \mid x)}{p(y)}$
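One way to sanity-check identities of this kind (not part of the paper; a sketch using an arbitrary, invented discrete joint distribution) is to evaluate both sides numerically for every value of x and y. A genuine identity must hold for any valid joint p(x, y).

import numpy as np

# Arbitrary 2x2 joint distribution over binary x (rows) and y (columns).
joint = np.array([[0.10, 0.30],
                  [0.40, 0.20]])

p_x = joint.sum(axis=1)               # marginal p(x)
p_y = joint.sum(axis=0)               # marginal p(y)
p_y_given_x = joint / p_x[:, None]    # p(y | x)
p_x_given_y = joint / p_y[None, :]    # p(x | y)

# Example: evaluate both sides of d) for every (x, y) pair.
lhs = p_y[None, :] / p_x[:, None]
rhs = p_y_given_x / p_x_given_y
print(np.allclose(lhs, rhs))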

3. Consider the following five equalities:

(1) $y = \mathbf{x}^\top \mathrm{diag}(\mathbf{x})\,\mathbf{w}$
(2) $y = \mathbf{x}^\top \mathbf{w}$
(3) $\mathbf{y} = \mathrm{diag}(\mathbf{x})\,\mathbf{w}$
(4) $\mathbf{Y} = \mathbf{x}\,\mathbf{w}^\top$
(5) $y = \mathbf{w}^\top \mathbf{x}\,\mathbf{x}^\top \mathbf{w}$

Find the correct equality that matches each of the following pieces of Python code. Assume that the mathematical operator diag(z) forms a diagonal matrix whose diagonal elements are the elements of z, that in Python the numpy library has been imported as np, and that we are given x and w as one-dimensional numpy arrays.

a) y = np.sum(x*w)

b) y = x*w

c) y = np.outer(x,w)

d) y = np.sum(w*x**2)

e) y = np.sum(x*w)**2
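For orientation (not part of the paper), the five equalities can also be evaluated literally in numpy, so that each result can be compared against the vectorised one-liners in the question. The example values below are made up.

import numpy as np

# Made-up example vectors.
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 2.0])

D = np.diag(x)  # diag(x) as an explicit matrix

# Literal evaluations of equalities (1)-(5).
eq1 = x @ D @ w                # x^T diag(x) w   (a scalar)
eq2 = x @ w                    # x^T w           (a scalar)
eq3 = D @ w                    # diag(x) w       (a vector)
eq4 = np.outer(x, w)           # x w^T           (a matrix)
eq5 = w @ np.outer(x, x) @ w   # w^T x x^T w     (a scalar)

print(eq1, eq2, eq3, eq4, eq5, sep="\n")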

SECTION B
Answer EITHER Question 4 OR Question 5 in this section.

4. This question deals with Bayesian approaches to machine learning problems.

a) What is the difference between epistemic and aleatoric uncertainty? [10%]

b) In a regression problem we are given a vector of real-valued targets, y, consisting of n observations $y_1, \ldots, y_n$ which are associated with multidimensional inputs $\mathbf{x}_1, \ldots, \mathbf{x}_n$. We assume a linear relationship between $y_i$ and $\mathbf{x}_i$ where the data are corrupted by independent Gaussian noise, giving a likelihood function of the form

$$p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2\right).$$

Consider the following Gaussian prior density for the k-dimensional vector of parameters, w,

$$p(\mathbf{w}) = \frac{1}{(2\pi\alpha)^{k/2}} \exp\!\left(-\frac{1}{2\alpha} \mathbf{w}^\top \mathbf{w}\right).$$

Show that the covariance of the posterior density for w is given by

$$\mathbf{C}_w = \left[\frac{1}{\sigma^2} \mathbf{X}^\top \mathbf{X} + \frac{1}{\alpha} \mathbf{I}\right]^{-1},$$

where X is a design matrix of the input data. [35%]

Show that the mean of the posterior density for w is given by

$$\boldsymbol{\mu}_w = \frac{1}{\sigma^2} \mathbf{C}_w \mathbf{X}^\top \mathbf{y}. \quad \text{[15\%]}$$
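For orientation only (not part of the paper), a minimal numpy sketch that evaluates the posterior covariance and mean exactly as stated in Question 4, on invented data and invented hyperparameter values, and cross-checks the posterior mean against the equivalent ridge-regression form.

import numpy as np

rng = np.random.default_rng(0)

# Made-up data and hyperparameters, purely for illustration.
n, k = 20, 3
X = rng.normal(size=(n, k))            # design matrix
w_true = np.array([1.0, -2.0, 0.5])
sigma2, alpha = 0.25, 2.0              # noise variance and prior variance
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Posterior covariance and mean as stated in the question.
C_w = np.linalg.inv(X.T @ X / sigma2 + np.eye(k) / alpha)
mu_w = C_w @ X.T @ y / sigma2

# Equivalent ridge-regression form of the posterior mean.
mu_ridge = np.linalg.solve(X.T @ X + (sigma2 / alpha) * np.eye(k), X.T @ y)
print(np.allclose(mu_w, mu_ridge))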

5. This question concerns regression and maximum likelihood fits of regression models with basis functions.

a) The polynomial basis of degree d computed for a one-dimensional input has the form

$$\boldsymbol{\phi}(x_i) = \begin{bmatrix} 1 & x_i & x_i^2 & x_i^3 & \cdots & x_i^d \end{bmatrix}^\top.$$

Give a disadvantage of the polynomial basis. Suggest a potential solution for this disadvantage and propose an alternative basis. [20%]

b) The likelihood of a single data point in a regression model is given by

$$p(y_i \mid \mathbf{w}, x_i, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \boldsymbol{\phi}(x_i)^\top \mathbf{w})^2}{2\sigma^2}\right).$$

Assuming that each data point is independent and identically distributed, derive a suitable objective function that should be minimized to recover w and σ². Explain your reasoning at each step. [15%]

c) Find the stationary point of this objective function where it is minimized with respect to the vector w. [25%]

END OF QUESTION PAPER
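As a closing illustrative sketch for Question 5 (not part of the paper), the following numpy code builds a polynomial-basis design matrix on invented one-dimensional data and computes the standard least-squares and noise-variance estimates; it is shown only as a reference evaluation, not as the derivation the question asks for.

import numpy as np

rng = np.random.default_rng(1)

# Invented one-dimensional data, purely for illustration.
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(np.pi * x) + rng.normal(scale=0.1, size=x.shape)

d = 3  # polynomial degree
Phi = np.vander(x, N=d + 1, increasing=True)   # columns 1, x, x^2, ..., x^d

# Standard least-squares weights and mean-squared-residual noise estimate.
w_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
sigma2_hat = np.mean((y - Phi @ w_ls) ** 2)
print(w_ls, sigma2_hat)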