Problem Set 1. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 20

[Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain a reasonable level of consistency and simplicity we ask that you do not use other software tools.]

If you collaborated with other members of the class, please write their names at the top of the assignment. Moreover, you will need to write and sign the following statement: "In preparing my solutions, I did not look at any old homeworks, copy anybody's answers, or let them copy mine."

Problem 1: Why? [5 points]

Limit your answer to this problem to a page.

a. Describe an application of pattern recognition related to your research. What are the features? What is the decision to be made? Speculate on how one might solve the problem.

b. In the same way, describe an application of pattern recognition you would be interested in pursuing for fun in your life outside of work.

Solution: Refer to examples discussed in lecture.

Problem 2: Probability Warm-Up [20 points]

Let x and y be discrete random variables, and let a and b be constant values. Let \mu_x denote the expected value of x and \sigma_x^2 denote the variance of x. Use excruciating detail to answer the following:

a. Show E[ax + by] = a E[x] + b E[y].

b. Show \sigma_x^2 = E[x^2] - \mu_x^2.

c. Show that independent implies uncorrelated.

d. Show that uncorrelated does not imply independent.

e. Let z = ax + by. Show that if x and y are uncorrelated, then \sigma_z^2 = a^2 \sigma_x^2 + b^2 \sigma_y^2.

f. Let x_i, i = 1, ..., n, be random variables independently drawn from the same probability distribution with mean \mu_x and variance \sigma_x^2. For the sample mean \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i, show the following:

   i. E[\bar{x}] = \mu_x.

   ii. Var[\bar{x}] (the variance of the sample mean) = \sigma_x^2 / n. Note that this is different from the sample variance s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2.

g. Let x_1 and x_2 be independent and identically distributed (i.i.d.) continuous random variables. Can Pr[x_1 > x_2] be calculated? If so, find its value. If not, explain. Hint 1: Remember that for a continuous variable Pr[x = k] = 0 for any value of k. Hint 2: Remember the definition of i.i.d. variables.

h. Let x_1 and x_2 be independent and identically distributed discrete random variables. Can Pr[x_1 > x_2] be calculated?

Solution:

a. The following is for discrete random variables; a similar argument holds for continuous random variables.

E[ax + by] = \sum_{x \in X, y \in Y} (ax + by) \, p(x, y)
           = a \sum_{x \in X, y \in Y} x \, p(x, y) + b \sum_{x \in X, y \in Y} y \, p(x, y)
           = a \sum_{x \in X} x \, p(x) + b \sum_{y \in Y} y \, p(y)
           = a E[x] + b E[y]

b. Making use of the definition of variance and the previous part, we have:

\sigma_x^2 = E[(x - \mu_x)^2]
           = E[x^2 - 2 \mu_x x + \mu_x^2]
           = E[x^2] - 2 \mu_x E[x] + \mu_x^2
           = E[x^2] - 2 \mu_x^2 + \mu_x^2
           = E[x^2] - \mu_x^2

c. Two random variables x and y are uncorrelated if \sigma_{xy} = 0 (the same holds for continuous random variables). From the previous part,

\sigma_{xy} = E[xy] - \mu_x \mu_y

If the two variables are independent, then

E[xy] = \sum_{x \in X, y \in Y} x y \, p(x, y)
      = \sum_{x \in X, y \in Y} x y \, p(x) p(y)
      = \sum_{x \in X} x \, p(x) \sum_{y \in Y} y \, p(y)
      = E[x] E[y]

Finally,

\sigma_{xy} = E[x] E[y] - \mu_x \mu_y = 0

d. To prove this, we need to find one case where

p(x, y) \neq p(x) p(y)   (1)

and

\sigma_{xy} = 0   (2)

are both satisfied. One possible solution is as follows. Suppose we have the discrete random variables x and y, and we observed all possibilities (each row equally likely):

 x   y
-1   1
-1   1
 0   0
 0   0
 1   1
 1   1

If we look at the case where x = 1 and y = 1, (1) is satisfied:

p(x = 1, y = 1) = 2/6 = 1/3,   p(x = 1) p(y = 1) = (2/6)(4/6) = 2/9

Now it is easy to verify that (2) is also satisfied:

\sigma_{xy} = E[xy] - \mu_x \mu_y = 0 - 0 \cdot (4/6) = 0
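The counterexample can also be checked numerically. The short Matlab snippet below is not part of the original solution; it simply tabulates the six equally likely outcomes above and confirms that the covariance is zero while the joint probability differs from the product of the marginals.

% Sanity check of part (d): zero covariance, but not independent.
% (Not part of the original solution.)
xy = [-1 1; -1 1; 0 0; 0 0; 1 1; 1 1];      % the six equally likely outcomes
x = xy(:,1);  y = xy(:,2);
cov_xy    = mean(x .* y) - mean(x) * mean(y)   % 0   -> uncorrelated
p_joint   = mean(x == 1 & y == 1)              % 1/3
p_product = mean(x == 1) * mean(y == 1)        % 2/9 -> not independent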

e. Given that z = ax + by and that x and y are uncorrelated, we have

\sigma_z^2 = E[(z - \mu_z)^2] = E[z^2] - \mu_z^2
           = E[(ax + by)^2] - (a \mu_x + b \mu_y)^2
           = E[a^2 x^2 + 2ab \, xy + b^2 y^2] - (a^2 \mu_x^2 + 2ab \, \mu_x \mu_y + b^2 \mu_y^2)
           = a^2 (E[x^2] - \mu_x^2) + 2ab (E[xy] - \mu_x \mu_y) + b^2 (E[y^2] - \mu_y^2)
           = a^2 \sigma_x^2 + 2ab \, \sigma_{xy} + b^2 \sigma_y^2
           = a^2 \sigma_x^2 + b^2 \sigma_y^2,

where only the last equality depends on x and y being uncorrelated.

f. Using the result of (a) and the fact that E[x_i] = \mu_x,

E[\bar{x}] = E\left[ \frac{1}{n} \sum_{i=1}^n x_i \right] = \frac{1}{n} \sum_{i=1}^n E[x_i] = \frac{1}{n} \, n \mu_x = \mu_x

Also, using the result of (e) and the fact that Var[x_i] = \sigma_x^2,

Var[\bar{x}] = Var\left[ \frac{1}{n} \sum_{i=1}^n x_i \right] = \frac{1}{n^2} \sum_{i=1}^n Var[x_i] = \frac{1}{n^2} \, n \sigma_x^2 = \sigma_x^2 / n

g. Given that x_1 and x_2 are continuous random variables, we know that Pr[x_1 = k] = 0 and Pr[x_2 = k] = 0 for any value of k, so the event x_1 = x_2 has probability zero and Pr[x_1 > x_2] + Pr[x_1 < x_2] = 1. Given that x_1 and x_2 are i.i.d., swapping the labels x_1 and x_2 has no effect, so Pr[x_1 < x_2] = Pr[x_1 > x_2]. Hence 2 Pr[x_1 > x_2] = 1, and Pr[x_1 > x_2] = 1/2.

h. For discrete random variables, unlike the continuous case above, we need to know the distributions of x_1 and x_2 in order to find Pr[x_1 = k] and Pr[x_2 = k], so the symmetry argument used above fails. In general, it is not possible to find Pr[x_1 > x_2] without knowledge of the distributions of both x_1 and x_2.
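As a quick numerical sanity check of parts (f) and (g) (not required by the problem set), the following Matlab sketch estimates the variance of the sample mean and Pr[x_1 > x_2] by Monte Carlo, assuming standard normal samples so that \sigma_x^2 = 1.

% Monte Carlo check of parts (f) and (g); not part of the original solution.
n = 25;  trials = 100000;
X = randn(trials, n);            % each row is an independent sample of size n
xbar = mean(X, 2);               % sample mean of each trial
var_of_mean = var(xbar)          % should be close to sigma^2/n = 1/25 = 0.04
x1 = randn(trials, 1);  x2 = randn(trials, 1);
p_greater = mean(x1 > x2)        % should be close to 1/2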

Problem 3: Teatime with Gauss and Bayes [20 points]

Let

p(x, y) = \frac{1}{2\pi\alpha\beta} \exp\left\{ -\left[ \frac{(y-\mu)^2}{2\alpha^2} + \frac{(x-y)^2}{2\beta^2} \right] \right\}

a. Find p(x), p(y), p(x|y), and p(y|x). In addition, give a brief description of each of these distributions.

b. Let \mu = 0, \alpha = 5, and \beta = 3. Plot p(y) and p(y | x = 9) for a reasonable range of y. What is the difference between these two distributions?

Solution:

a. To find p(y), simply factor p(x, y) and then integrate over x:

p(y) = \int p(x, y) \, dx
     = \frac{1}{\sqrt{2\pi}\,\alpha} e^{-\frac{(y-\mu)^2}{2\alpha^2}} \int \frac{1}{\sqrt{2\pi}\,\beta} e^{-\frac{(x-y)^2}{2\beta^2}} \, dx
     = \frac{1}{\sqrt{2\pi}\,\alpha} e^{-\frac{(y-\mu)^2}{2\alpha^2}}
     = N(\mu, \alpha^2)

The integral equals 1 because it has the form of a probability density integrated over its entire domain. To find p(x|y), divide p(x, y) by p(y):

p(x|y) = \frac{p(x, y)}{p(y)} = \frac{1}{\sqrt{2\pi}\,\beta} e^{-\frac{(x-y)^2}{2\beta^2}} = N(y, \beta^2)

Finding p(x) (and then p(y|x)) follows essentially the same procedure, but the algebra is more involved and requires completing the square in the exponent:

p(x) = \int p(x, y) \, dy = \frac{1}{2\pi\alpha\beta} \int \exp\left\{ -\frac{\beta^2 (y-\mu)^2 + \alpha^2 (x-y)^2}{2\alpha^2\beta^2} \right\} dy

Collecting the terms of the exponent's numerator in y,

\beta^2 (y-\mu)^2 + \alpha^2 (x-y)^2
  = (\alpha^2+\beta^2)\, y^2 - 2(\alpha^2 x + \beta^2 \mu)\, y + \beta^2\mu^2 + \alpha^2 x^2
  = (\alpha^2+\beta^2) \left( y - \frac{\alpha^2 x + \beta^2 \mu}{\alpha^2+\beta^2} \right)^2 + \frac{\alpha^2\beta^2 (x-\mu)^2}{\alpha^2+\beta^2},

where the last step uses (\alpha^2+\beta^2)(\beta^2\mu^2 + \alpha^2 x^2) - (\alpha^2 x + \beta^2 \mu)^2 = \alpha^2\beta^2 (x-\mu)^2. Therefore

p(x) = \frac{1}{2\pi\alpha\beta} e^{-\frac{(x-\mu)^2}{2(\alpha^2+\beta^2)}} \int \exp\left\{ -\frac{\left( y - \frac{\alpha^2 x + \beta^2 \mu}{\alpha^2+\beta^2} \right)^2}{2 \, \frac{\alpha^2\beta^2}{\alpha^2+\beta^2}} \right\} dy
     = \frac{1}{2\pi\alpha\beta} e^{-\frac{(x-\mu)^2}{2(\alpha^2+\beta^2)}} \sqrt{2\pi \, \frac{\alpha^2\beta^2}{\alpha^2+\beta^2}}
     = \frac{1}{\sqrt{2\pi(\alpha^2+\beta^2)}} e^{-\frac{(x-\mu)^2}{2(\alpha^2+\beta^2)}}
     = N(\mu, \alpha^2+\beta^2)

To find p(y|x) we simply divide p(x, y) by p(x); the completing-the-square step above already puts p(x, y) in the required form:

p(y|x) = \frac{p(x, y)}{p(x)}
       = \sqrt{\frac{\alpha^2+\beta^2}{2\pi\alpha^2\beta^2}} \exp\left\{ -\frac{\left( y - \frac{\alpha^2 x + \beta^2 \mu}{\alpha^2+\beta^2} \right)^2}{2 \, \frac{\alpha^2\beta^2}{\alpha^2+\beta^2}} \right\}
       = N\left( \frac{\alpha^2 x + \beta^2 \mu}{\alpha^2+\beta^2}, \; \frac{\alpha^2\beta^2}{\alpha^2+\beta^2} \right)

Note that all of the above distributions are Gaussian.

b. The following Matlab code produced Figure 1:

close all; clear all;

% Initialize variables
m = 0.0;     % mu
a = 5.0;     % alpha
b = 3;       % beta
x = 9;

% Compute parameters
y = -100:1:100;
mu_post  = (a^2 * x + b^2 * m) / (a^2 + b^2);
var_post = (a * b)^2 / (a^2 + b^2);
p_y_given_x = 1.0 ./ sqrt(2 * pi * var_post) .* exp(-(y - mu_post).^2 / (2 * var_post));
var_y = a^2;
p_y = 1.0 ./ sqrt(2 * pi * var_y) .* exp(-(y - m).^2 / (2 * var_y));

% Show information
figure;
plot(y, p_y_given_x, 'b'); hold on
plot(y, p_y, 'r');
legend('p(y|x)', 'p(y)');
sy = size(y);
axis([y(1), y(sy(2)), 0, 0.2]);
xlabel('y');
text(-70, 0.14, '\mu = 0');
text(-70, 0.12, '\alpha = 5');
text(-70, 0.10, '\beta = 3');

[Figure 1: The marginal p.d.f. of y and the p.d.f. of y given x, for a specific value of x. Notice how knowing x makes your knowledge of y more certain.]
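A quick Monte Carlo check of the closed-form posterior p(y|x) is also possible (this is not part of the original solution): sample from the joint distribution with \mu = 0, \alpha = 5, \beta = 3, keep the samples whose x falls near 9, and compare the conditional sample moments with the formulas derived above.

% Monte Carlo check of p(y|x); not part of the original solution.
N = 2e6;  mu = 0;  alpha = 5;  beta = 3;  x0 = 9;
y = mu + alpha * randn(N, 1);       % y ~ N(mu, alpha^2)
x = y + beta * randn(N, 1);         % x | y ~ N(y, beta^2)
sel = abs(x - x0) < 0.1;            % keep samples with x close to 9
[mean(y(sel)), var(y(sel))]         % both should be near 225/34 = 6.62
% closed form: alpha^2*x0/(alpha^2+beta^2) and alpha^2*beta^2/(alpha^2+beta^2)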

Problem 4: Covariance Matrix [15 points]

Let

\Sigma = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}

a. Find the eigenvalues and eigenvectors of \Sigma by hand (include all calculations). Verify your computations with the MATLAB function eig.

b. Verify that \Sigma is a valid covariance matrix.

c. We provide 500 data points sampled from the distribution N([0\ 0]^T, \Sigma). Download the dataset from the course website and project the data onto the eigenvectors of the covariance matrix. What is the effect of this projection? Include MATLAB code and plots before and after the projection.

Solution:

a. We can find the eigenvectors and eigenvalues of \Sigma by starting with the definition of an eigenvector. Namely, a vector e is an eigenvector of \Sigma if it satisfies \Sigma e = \lambda e for some constant scalar \lambda, which is called the eigenvalue corresponding to e. This can be rewritten as

(\Sigma - \lambda I) e = 0

which, for a nonzero e, requires that

\det(\Sigma - \lambda I) = (5 - \lambda)^2 - 4^2 = 0

By inspection, this is true when \lambda_1 = 9 and \lambda_2 = 1. To find the eigenvectors, we plug the eigenvalues back into the equation above. For \lambda_1 = 9,

(\Sigma - 9I) e = \begin{bmatrix} -4 & 4 \\ 4 & -4 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

which gives a = b. Normalized, this results in the eigenvector

e_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}

Similarly, \lambda_2 = 1 gives

(\Sigma - I) e = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

which gives a = -b. Normalized, this results in the eigenvector

e_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}

b. The matrix \Sigma is a valid covariance matrix if it is symmetric and positive semi-definite. Clearly, it is symmetric, since \Sigma^T = \Sigma. One way to prove it is positive semi-definite is to show that all its eigenvalues are non-negative. This is indeed the case, as shown in the previous part.
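As requested in part (a), the hand computation can be verified with MATLAB's eig; a minimal check:

% Verify the hand computation with eig, as requested in part (a).
S = [5 4; 4 5];
[V, D] = eig(S)
% D = diag([1 9]); the columns of V are [1 -1]'/sqrt(2) and [1 1]'/sqrt(2)
% (possibly with flipped signs), matching e_2 and e_1 above.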

c. Figure 2 shows how projecting the data onto the eigenvectors of the data's covariance matrix removes (up to sampling error) the correlation between x and y. The MATLAB code is as follows:

clear all; close all;

% Initialize covariance matrix
S = [5 4; 4 5];

% Load data (X is assumed to be the 500 x 2 matrix provided on the course website)
load points;

% Show correlation of original points
corrcoef(X)

% Compute eigenvectors and eigenvalues
[V, D] = eig(S);
D
V

% Project data onto the eigenvectors
px = X * V;

% Show correlation of projected points
corrcoef(px)

% Show data
figure;
subplot(1, 2, 1);
scatter(X(:,1), X(:,2), 'filled');
axis equal
xlabel('x'); ylabel('y');
title('Original Data');
subplot(1, 2, 2);
scatter(px(:,1), px(:,2), 'filled');
axis equal
xlabel('x'); ylabel('y');
title('Projected Data');
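As an optional follow-up check (not in the original solution), the sample covariance of the projected data should be approximately diagonal, with diagonal entries close to the eigenvalues 1 and 9 found in part (a):

% Optional check: the projected coordinates are (approximately) uncorrelated.
C = cov(px)          % px as computed above; roughly diag([1 9])
offdiag = C(1, 2)    % close to 0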

[Figure 2: The original data and the data transformed into the coordinate system defined by the eigenvectors of their covariance matrix.]

Problem 5: Probabilistic Modeling [20 points]

Let x \in \{0, 1\} denote a person's affective state (x = 0 for a positive-feeling state, and x = 1 for a negative-feeling state). The person feels positive with probability \theta_1. Suppose that an affect-tagging system (or a robot) recognizes her feeling state and reports the observed state, y, to you. But this system is unreliable and obtains the correct result with probability \theta_2.

a. Represent the joint probability distribution P(x, y | \theta) for all (x, y) (a 2 x 2 matrix) as a function of the parameters \theta = (\theta_1, \theta_2).

b. The Maximum Likelihood estimation criterion for the parameter \theta is defined as

\theta_{ML} = \arg\max_\theta L(t_1, ..., t_n; \theta) = \arg\max_\theta \prod_{i=1}^n p(t_i | \theta)

where we have assumed that each data point t_i is drawn independently from the same distribution, so that the likelihood of the data is L(t_1, ..., t_n; \theta) = \prod_{i=1}^n p(t_i | \theta). Likelihood is viewed as a function of the parameters, which depends on the data. Since the above expression can be technically challenging, we maximize the log-likelihood \log L(t_1, ..., t_n; \theta) instead of the likelihood. Note that any monotonically increasing function (such as the log function) of the likelihood has the same maxima. Thus,

\theta_{ML} = \arg\max_\theta \log L(t_1, ..., t_n; \theta) = \arg\max_\theta \sum_{i=1}^n \log p(t_i | \theta)

Suppose we get the following joint observations t = (x, y):

 x   y
 0   0
 0   0
 0   0
 0   1
 1   0
 1   1
 1   1
 1   1

What are the maximum-likelihood (ML) values of \theta_1 and \theta_2? Hint: Since P(x, y | \theta) = P(y | x, \theta_2) P(x | \theta_1), the estimation of the two parameters can be done separately in the log-likelihood criterion.

Solution:

a. The probability mass function (pmf) of x \in \{0, 1\} is

P(x) = \begin{cases} \theta_1, & x = 0 \\ 1 - \theta_1, & x = 1 \end{cases}

The conditional pmf of y \in \{0, 1\} given that x = 0 is

P(y | x = 0) = \begin{cases} \theta_2, & y = 0 \\ 1 - \theta_2, & y = 1 \end{cases}

The conditional pmf of y given that x = 1 is

P(y | x = 1) = \begin{cases} 1 - \theta_2, & y = 0 \\ \theta_2, & y = 1 \end{cases}

Using P(x, y) = P(y | x) P(x), the joint pmf of (x, y) is

P(x, y) = \begin{bmatrix} P(0, 0) & P(0, 1) \\ P(1, 0) & P(1, 1) \end{bmatrix} = \begin{bmatrix} \theta_1 \theta_2 & \theta_1 (1 - \theta_2) \\ (1 - \theta_1)(1 - \theta_2) & (1 - \theta_1) \theta_2 \end{bmatrix}

b. We select \theta_1, \theta_2 to maximize the log-likelihood of the samples \{(x_i, y_i), i = 1, ..., n\}, which may be expressed as

J(\theta_1, \theta_2) = \sum_{i=1}^n \log P(x_i, y_i)
                      = \sum_{i=1}^n [\log P(y_i | x_i) + \log P(x_i)]
                      = \sum_{i=1}^n \log P(y_i | x_i) + \sum_{i=1}^n \log P(x_i)
                      = J_2(\theta_2) + J_1(\theta_1)

Hence, we choose \theta_1 to maximize

J_1(\theta_1) = \sum_{i=1}^n \log P(x_i) = N_x \log(1 - \theta_1) + (n - N_x) \log \theta_1

where N_x = \sum_{i=1}^n x_i. Differentiating with respect to \theta_1 gives

\frac{\partial J_1}{\partial \theta_1} = -\frac{N_x}{1 - \theta_1} + \frac{n - N_x}{\theta_1}

We set this derivative to zero and solve for \theta_1 to obtain

\theta_1 = 1 - \frac{N_x}{n}

Similarly, we choose \theta_2 to maximize

J_2(\theta_2) = \sum_{i=1}^n \log P(y_i | x_i) = N_{x \neq y} \log(1 - \theta_2) + (n - N_{x \neq y}) \log \theta_2

where N_{x \neq y} = \sum_{i=1}^n [(1 - x_i) y_i + x_i (1 - y_i)] counts the observations with y_i \neq x_i. Differentiating J_2 with respect to \theta_2, setting the derivative to zero, and solving for \theta_2 gives

\theta_2 = 1 - \frac{N_{x \neq y}}{n}

For the example data, \theta_1 = 4/8 = 1/2 and \theta_2 = 6/8 = 3/4. Thus,

P(x, y | \theta) = \begin{bmatrix} \theta_1 \theta_2 & \theta_1 (1 - \theta_2) \\ (1 - \theta_1)(1 - \theta_2) & (1 - \theta_1) \theta_2 \end{bmatrix} = \begin{bmatrix} 3/8 & 1/8 \\ 1/8 & 3/8 \end{bmatrix}

The maximum likelihood of the data under this model is

\prod_{i=1}^8 P(x_i, y_i) = (3/8)^6 (1/8)^2 \approx 4.34 \times 10^{-5}
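The ML estimates and the likelihood value can also be checked numerically. The following Matlab sketch is not part of the original solution; it assumes the eight observations are stored in vectors x and y consistent with the counts used above (\sum_i x_i = 4 and two observations with y_i \neq x_i).

% Numerical check of the ML estimates; not part of the original solution.
x = [0 0 0 0 1 1 1 1]';  y = [0 0 0 1 1 1 1 0]';   % the eight observations above
n = numel(x);
theta1 = 1 - sum(x) / n                            % 0.5
theta2 = 1 - sum(x ~= y) / n                       % 0.75
P = [theta1*theta2,         theta1*(1-theta2); ...
     (1-theta1)*(1-theta2), (1-theta1)*theta2]     % joint pmf, rows x = 0,1; cols y = 0,1
L = prod(P(sub2ind([2 2], x + 1, y + 1)))          % maximum likelihood, about 4.34e-5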

Problem 6: Ring Problem [20 points]

To get credit for this problem, you must not only write your own correct solution, but also write a computer simulation of the process of playing this game:

Suppose I hide the ring of power in one of three identical boxes while you weren't looking. The other two boxes remain empty. After hiding the ring of power, I ask you to guess which box it's in. I know which box it's in and, after you've made your guess, I deliberately open the lid of an empty box, which is one of the two boxes you did not choose. Thus, the ring of power is either in the box you chose or in the remaining closed box you did not choose. Once you have made your initial choice and I've revealed to you an empty box, I then give you the opportunity to change your mind: you can either stick with your original choice, or choose the unopened box. You get to keep the contents of whichever box you finally decide upon.

What choice should you make in order to maximize your chances of receiving the ring of power? Justify your answer using Bayes rule. Write a simulation. There are two choices for the contestant in this game: (1) the choice of box, and (2) the choice of whether or not to switch. In your simulation, first let the host choose a random box in which to place the ring of power. Show a trace of your program's output for a single game play, as well as the cumulative probability of winning over 1000 rounds for each of the two policies: (1) choose a random box and then switch, and (2) choose a random box and do not switch.

Solution:

Always switch your answer to the box you didn't choose the first time. The reason is as follows. You have a 1/3 chance of initially picking the correct box; that is, there is a 2/3 chance the correct answer is one of the other two boxes. Learning which of the two other boxes is empty does not change these probabilities: your initial choice still has a 1/3 chance of being correct, so there is a 2/3 chance the remaining box is the correct answer. Therefore you should change your choice.

Using Bayes rule, let R be the box with the ring, O the box opened, and S the box selected. Then

P(R | O, S) = \frac{P(O | R, S) \, P(R | S)}{P(O | S)}

where

P(O | S) = \sum_{R=1}^{3} P(O, R | S) = \sum_{R=1}^{3} P(O | R, S) \, P(R | S)

If the ring is in the box you selected, I open either of the other two boxes with probability 1/2; if the ring is in the remaining unopened box, I must open the box I opened; and I never open the box containing the ring. Hence, for the remaining unopened box you did not choose,

P(R = \text{remaining} \mid O, S) = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2} \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} + 0 \cdot \frac{1}{3}} = \frac{2}{3}

so the switch policy wins with probability 2/3, versus 1/3 for staying.

Another way to understand the problem is to extend it to 100 boxes, only one of which has the ring of power. After you make your initial choice, I then open 98 of the 99 remaining boxes and show you that they are empty. Clearly, with very high probability the ring of power resides in the one remaining box you did not initially choose.
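The posterior computation can also be spelled out numerically. The snippet below is only an illustration (not part of the original solution); it assumes you selected box 1 and I opened box 3, and that the host opens an empty, unchosen box uniformly at random when he has a choice.

% Bayes rule for the Ring problem, assuming S = 1 and O = 3.
prior = [1 1 1] / 3;                     % P(R = r)
likelihood = [1/2 1 0];                  % P(O = 3 | R = r, S = 1)
posterior = prior .* likelihood;
posterior = posterior / sum(posterior)   % [1/3 2/3 0]: switch to box 2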

Here are the cumulative results from a sample run of the Ring simulation, 1000 rounds per policy (the program also prints a per-game trace of actual, initial guess, reveal, swap, and final guess for the first three games of each policy):

swap: 0
win: 292
lose: 708
win/(win + lose): 0.292

swap: 1
win: 686
lose: 314
win/(win + lose): 0.686

Here is a Matlab program that produces the Ring simulation output above:

for swap = 0:1
    win = 0;
    lose = 0;
    for i = 1:1000
        actual = floor(rand * 3) + 1;   % host hides the ring in a random box
        guess  = floor(rand * 3) + 1;   % contestant picks a random box
        initial = guess;                % remember the contestant's first pick

        % Host reveals an empty box that the contestant did not choose
        if guess == actual
            reveal = floor(rand * 2) + 1;
            if reveal >= actual
                reveal = reveal + 1;
            end
        elseif guess == 1 && actual == 2
            reveal = 3;
        elseif guess == 1 && actual == 3
            reveal = 2;
        elseif guess == 2 && actual == 1
            reveal = 3;
        elseif guess == 2 && actual == 3
            reveal = 1;
        elseif guess == 3 && actual == 1
            reveal = 2;
        elseif guess == 3 && actual == 2
            reveal = 1;
        end

        % Switching policy: move to the remaining unopened box
        if swap
            if guess == 1 && reveal == 2
                guess = 3;
            elseif guess == 1 && reveal == 3
                guess = 2;
            elseif guess == 2 && reveal == 1
                guess = 3;
            elseif guess == 2 && reveal == 3
                guess = 1;
            elseif guess == 3 && reveal == 1
                guess = 2;
            elseif guess == 3 && reveal == 2
                guess = 1;
            end
        end

        if guess == actual
            win = win + 1;
        else
            lose = lose + 1;
        end

        % Only print a trace for the first 3 games
        if i <= 3
            actual
            initial
            reveal
            swap
            guess
        end
    end

    % Print results for each game-play policy
    swap
    win
    lose
    win / (win + lose)
end
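A more compact, vectorized simulation of the same experiment is sketched below (not part of the original solution). It relies on the fact that, with three boxes, the switch policy wins exactly when the initial guess is wrong, so the revealed box never needs to be drawn explicitly.

% Vectorized sketch of the Ring simulation; not part of the original solution.
rounds = 1000;
actual = randi(3, rounds, 1);           % box containing the ring
guess  = randi(3, rounds, 1);           % contestant's initial pick
p_stay   = mean(guess == actual)        % about 1/3
p_switch = mean(guess ~= actual)        % about 2/3: switching wins when the first pick was wrong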