Assignment 1 (Sol.)
Introduction to Machine Learning
Prof. B. Ravindran

1. Which of the following tasks can best be solved using clustering?

(a) Predicting the amount of rainfall based on various cues
(b) Detecting fraudulent credit card transactions
(c) Training a robot to solve a maze

Sol. (b)
The task of detecting fraudulent credit card transactions can be cast as representing all credit card transactions using some set of features and performing clustering. The majority of transactions are legitimate and fall into one (or perhaps a few) clusters, so transactions falling outside these clusters indicate possible fraudulent activity. Predicting the amount of rainfall is essentially a supervised learning problem, while training a robot to solve a maze would best be attempted using reinforcement learning algorithms.

2. What would be the ideal complexity of the curve used for separating the two classes shown in the image below?

(a) Linear
(b) Quadratic
(c) Cubic

Sol. (b)
For the 2D data points shown in the figure, the distribution of the data suggests that the ideal complexity of the separating curve is quadratic. A linear boundary would result in a large number of misclassifications, whereas a third-degree curve would do no better than the quadratic one.

3. Pavlov's Experiment

(Image source: http://www.goldiesroom.org/)

Pavlov's experiment is a classic experiment conducted by the Russian physiologist Ivan Pavlov, who experimented on dogs as shown in the image below. Before conditioning, the dog responds with saliva only in the presence of food; after conditioning, it starts salivating at just the sound of the bell. Select the correct option(s) about the experiment.

(a) In this experiment, the dog acts as a reinforcement learning agent
(b) In this experiment, the dog learns in a supervised setting
(c) Comparing this experiment to reinforcement learning theory, the various states are: presence of just the bell, presence of just the food, and presence of both food and bell

Sol. (a) & (c)
As the dog interacts with its environment, it starts to modify its behaviour based on the rewards it receives from the environment (in this case, food). This is clearly a reinforcement learning situation. We also note how the states of the environment change as the experiment progresses: initially just the food is present, causing the dog to salivate; next, the bell is rung whenever the dog is given food, as part of the conditioning process; finally, the desired behaviour (i.e., salivation) is produced when just the bell is rung, without the food.

4. There are n bins, of which the kth contains k − 1 blue balls and n − k red balls. You pick a bin at random and remove two balls at random without replacement. Find the probability that:

(i) the second ball is red;
(ii) the second ball is red, given that the first is red.

(a) 1/3, 2/3
(b) 1/2, 1/3
(c) 1/2, 2/3
(d) 1/3, 1/3
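This experiment is easy to check by direct simulation. The Monte Carlo sketch below is an addition, not part of the original assignment; the bin count n = 10, the trial count, and the seed are arbitrary choices:

```python
import random

def simulate(n, trials, seed=0):
    """Estimate P(second red) and P(second red | first red)
    for n bins, where bin k holds k-1 blue and n-k red balls."""
    rng = random.Random(seed)
    second_red = first_red = both_red = 0
    for _ in range(trials):
        k = rng.randint(1, n)                      # pick a bin uniformly
        balls = ["blue"] * (k - 1) + ["red"] * (n - k)
        first, second = rng.sample(balls, 2)       # two draws, no replacement
        second_red += second == "red"
        first_red += first == "red"
        both_red += first == "red" and second == "red"
    return second_red / trials, both_red / first_red

p2, p2_given_first = simulate(n=10, trials=200_000)
print(round(p2, 3), round(p2_given_first, 3))      # expect roughly 0.5 and 0.667
```

The two estimates can be compared against the options above; they do not depend on the choice of n.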

Sol. (c)
Let C_i denote the colour of the ith ball drawn. Each bin contains a total of (k − 1) + (n − k) = n − 1 balls, and across all bins half the balls are blue and half are red (verify that Σ_{k=1}^{n} (k − 1) = Σ_{k=1}^{n} (n − k)).

For the second ball, condition on the colour of the first. Within bin k,

P(C_2 = red) = [(k − 1)/(n − 1)] [(n − k)/(n − 2)] + [(n − k)/(n − 1)] [(n − k − 1)/(n − 2)] = (n − k)/(n − 1)

Averaging over the uniformly chosen bin,

P(C_2 = red) = Σ_{k=1}^{n} (n − k) / [n(n − 1)] = 1/2

For the conditional probability,

P(C_2 = red | C_1 = red) = P(C_2 = red, C_1 = red) / P(C_1 = red),

where P(C_1 = red) = 1/2. Within bin k,

P(C_2 = red, C_1 = red) = (n − k)(n − k − 1) / [(n − 1)(n − 2)]

Averaging over all bins, and using Σ_{k=1}^{n} (n − k)(n − k − 1) = n(n − 1)(n − 2)/3,

P(C_2 = red | C_1 = red) = [Σ_{k=1}^{n} (n − k)(n − k − 1) / (n(n − 1)(n − 2))] / (1/2) = (1/3) · 2 = 2/3

5. A medical company touts its new test for a certain genetic disorder. The false negative rate is small: if you have the disorder, the probability that the test returns a positive result is 0.999. The false positive rate is also small: if you do not have the disorder, the probability that the test returns a positive result is only 0.005. Assume that 2% of the population has the disorder. If a person chosen uniformly from the population is tested and the result comes back positive, what is the probability that the person has the disorder?

(a) 0.803
(b) 0.976
(c) 0.02
(d) 0.204

Sol. (a)
Let T be the event that the test is positive and D the event that the person has the disorder. From the data provided:

P(T | D) = 0.999
P(T | ¬D) = 0.005
P(D) = 0.02, P(¬D) = 1 − P(D) = 0.98

We want the probability that a person chosen uniformly at random has the disorder given that the test came back positive, i.e., P(D | T). By Bayes' theorem,

P(D | T) = P(T | D) P(D) / P(T) = P(T | D) P(D) / [P(T | D) P(D) + P(T | ¬D) P(¬D)]

Substituting the known values,

P(D | T) = (0.999 × 0.02) / (0.999 × 0.02 + 0.005 × 0.98) ≈ 0.803

6. In an experiment, n coins are tossed, with each one showing up heads with probability p independently of the others. Each of the coins which shows up heads is then tossed again. What is the probability of observing 5 heads in the second round of tosses, if we toss 15 coins in the first round and p = 0.4? (Hint: first find the mass function of the number of heads observed in the second round.)

(a) 0.372
(b) 0.055
(c) 0.0345
(d) 0.0488

Sol. (b)
The same count is obtained if we toss each of the n coins twice and count the coins showing two consecutive heads. A given coin shows two consecutive heads with probability p², so the number of heads X observed after the second round is binomial:

P(X = r) = C(n, r) p^{2r} (1 − p²)^{n−r}

Substituting the given values,

P(X = 5) = C(15, 5) (0.4)^{10} (1 − 0.4²)^{10} ≈ 0.055
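Both of these computations are short enough to verify numerically. A minimal check (values taken from the two questions above; `math.comb` is the standard-library binomial coefficient):

```python
from math import comb

# Question 5: Bayes' theorem for the genetic-disorder test.
p_t_given_d = 0.999       # P(T | D): positive test given disorder
p_t_given_nd = 0.005      # P(T | not D): false positive rate
p_d = 0.02                # prior P(D)
p_t = p_t_given_d * p_d + p_t_given_nd * (1 - p_d)   # total probability
p_d_given_t = p_t_given_d * p_d / p_t
print(round(p_d_given_t, 3))   # 0.803

# Question 6: P(X = 5) for n = 15, p = 0.4; a coin survives both
# rounds as heads with probability p^2.
n, p, r = 15, 0.4, 5
p_x_5 = comb(n, r) * (p**2) ** r * (1 - p**2) ** (n - r)
print(round(p_x_5, 3))         # 0.055
```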

7. Consider two random variables X and Y having joint density function f(x, y) = 2e^{−x−y}, 0 < x < y < ∞. Are X and Y independent? Find the covariance of X and Y.

(a) Yes, 1/4
(b) Yes, 1/2
(c) No, 1/4
(d) No, 1/2

Sol. (c)
From the joint density, we can compute the marginal densities of X and Y:

f_X(x) = ∫_{x}^{∞} 2e^{−x−y} dy = 2e^{−x} · e^{−x} = 2e^{−2x}, x > 0
f_Y(y) = ∫_{0}^{y} 2e^{−x−y} dx = 2e^{−y} (1 − e^{−y}), y > 0

Since f_X(x) f_Y(y) ≠ f(x, y), X and Y are not independent. Now,

E[X] = ∫_{0}^{∞} 2x e^{−2x} dx = 1/2
E[Y] = ∫_{0}^{∞} 2y e^{−y} dy − ∫_{0}^{∞} 2y e^{−2y} dy = 2 − 1/2 = 3/2

For the covariance,

E[XY] = ∫_{0}^{∞} ∫_{x}^{∞} 2xy e^{−x−y} dy dx = ∫_{0}^{∞} 2x e^{−x} ( ∫_{x}^{∞} y e^{−y} dy ) dx

Integrating by parts, ∫_{x}^{∞} y e^{−y} dy = x e^{−x} + ∫_{x}^{∞} e^{−y} dy = (x + 1) e^{−x}. Thus

E[XY] = ∫_{0}^{∞} 2x(x + 1) e^{−2x} dx = ∫_{0}^{∞} 2x² e^{−2x} dx + ∫_{0}^{∞} 2x e^{−2x} dx

The second term is simply 1/2, and integrating the first term by parts gives ∫_{0}^{∞} 2x² e^{−2x} dx = ∫_{0}^{∞} 2x e^{−2x} dx = 1/2, so E[XY] = 1. Combining these results,

Cov(X, Y) = E[XY] − E[X] E[Y] = 1 − 3/4 = 1/4
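These numbers can be checked empirically. One convenient fact (an addition, not part of the original solution): 2e^{−x−y} on 0 < x < y is exactly the joint density of the minimum and maximum of two independent Exp(1) variables (2! · e^{−x} e^{−y} on the ordered region), so we can sample (X, Y) directly. Sample size and seed below are arbitrary choices:

```python
import random

# Sample (X, Y) as (min, max) of two independent Exp(1) draws.
rng = random.Random(42)
N = 400_000
sx = sy = sxy = 0.0
for _ in range(N):
    u, v = rng.expovariate(1.0), rng.expovariate(1.0)
    x, y = min(u, v), max(u, v)    # (x, y) has density 2 e^{-x-y}, 0 < x < y
    sx += x
    sy += y
    sxy += x * y

ex, ey, exy = sx / N, sy / N, sxy / N
cov = exy - ex * ey
print(round(ex, 2), round(ey, 2), round(cov, 2))   # roughly 0.5, 1.5, 0.25
```

The estimates track E[X] = 1/2, E[Y] = 3/2, and Cov(X, Y) = 1/4 from the derivation above.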

8. An airline knows that 5 percent of the people making reservations on a certain flight will not show up. Consequently, their policy is to sell 52 tickets for a flight that can hold only 50 passengers. What is the probability that there will be a seat available for every passenger who shows up?

(a) 0.5101
(b) 0.81
(c) 0.6308
(d) 0.7405

Sol. (d)
There is a seat available for every passenger who shows up exactly when at most 50 of the 52 ticket-holders show up, i.e., 1 minus the probability that exactly 51 or 52 passengers show up. Since each passenger shows up independently with probability 0.95, the required probability is

1 − (0.95)^{52} − 52 (0.95)^{51} (0.05) ≈ 0.7405

9. Let X have mass function

f(x) = 1/(x(x + 1)) if x = 1, 2, ..., and 0 otherwise,

and let α ∈ R. For what values of α is it the case that E(X^α) < ∞?

(a) α < 1/2
(b) α < 1
(c) α > 1
(d) α > 3/4

Sol. (b)
We have

E[X^α] = Σ_{x=1}^{∞} x^α / (x(x + 1))

The summand behaves like x^{α−2} for large x, so the series converges if and only if α − 2 < −1, i.e., α < 1.

10. Is the following a distribution function?

F(x) = e^{−1/x} for x > 0, and 0 otherwise

If so, give the corresponding density function. If not, explain why it is not a distribution function.

(a) No, not a monotonic function
(b) Yes, x^{−2} e^{−1/x}, x > 0
(c) No, not right continuous
(d) Yes, x^{−1} e^{−1/x}, x > 0
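The answers to questions 8 and 9 can be checked numerically. In the sketch below (an addition; the partial-sum cutoffs 10^4 and 10^5 are arbitrary choices), the tail of the series for E[X^α] is nearly zero for α < 1 but keeps growing at α = 1:

```python
# Question 8: P(at most 50 of 52 independent ticket-holders show up).
p_show = 0.95
p_all_seated = 1 - p_show**52 - 52 * p_show**51 * (1 - p_show)
print(round(p_all_seated, 4))   # ~0.7405

# Question 9: partial sums of E[X^alpha] = sum_x x^alpha / (x (x + 1)).
def partial_sum(alpha, terms):
    return sum(x**alpha / (x * (x + 1)) for x in range(1, terms + 1))

for alpha in (0.5, 1.0):
    gap = partial_sum(alpha, 10**5) - partial_sum(alpha, 10**4)
    print(alpha, round(gap, 3))   # small gap for alpha < 1, large at alpha = 1
```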

Sol. (b)
F(x) → 1 as x → ∞, and F(x) = 0 for x ≤ 0 by definition. The monotonicity and continuity of F on (0, ∞) follow from the corresponding properties of e^{−1/x}; moreover, e^{−1/x} → 0 as x ↓ 0, so F is continuous everywhere and in particular right continuous. Thus, the given function is a distribution function. Its corresponding density function is obtained by differentiating:

f(x) = x^{−2} e^{−1/x} for x > 0, and 0 otherwise
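A quick numerical sanity check of this answer (grid bounds and step sizes below are arbitrary choices): F should be non-decreasing, and a Riemann sum of the candidate density over (0, B] should reproduce F(B), since ∫_{0}^{B} x^{−2} e^{−1/x} dx = e^{−1/B}:

```python
import math

def F(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

def f(x):
    return math.exp(-1.0 / x) / x**2   # candidate density, x > 0

# Midpoint Riemann sum of f over (0, B] -- should be close to F(B).
B, n = 100.0, 500_000
h = B / n
riemann = sum(f((i + 0.5) * h) for i in range(n)) * h
print(round(riemann, 4), round(F(B), 4))   # the two should agree
```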