Problem Set 1
MAS 622J/1.126J: Pattern Recognition and Analysis
Due: 5:00 p.m. on September

[Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain a reasonable level of consistency and simplicity, we ask that you do not use other software tools.]

If you collaborated with other members of the class, please write their names at the top of the assignment. Moreover, you will need to write and sign the following statement: "In preparing my solutions, I did not look at any old homeworks, copy anybody's answers, or let them copy mine."

Problem 1: Why? [5 points]

Limit your answer to this problem to a page.

a. Describe an application of pattern recognition related to your research. What are the features? What is the decision to be made? Speculate on how one might solve the problem.

b. In the same way, describe an application of pattern recognition you would be interested in pursuing for fun in your life outside of work.

Solution: Refer to examples discussed in lecture.

Problem 2: Probability Warm-Up [20 points]

Let x and y be discrete random variables, and let a and b be constants. Let µ_x denote the expected value of x and σ_x^2 denote the variance of x. Use excruciating detail to answer the following:

a. Show E[ax + by] = aE[x] + bE[y].

b. Show σ_x^2 = E[x^2] − µ_x^2.

c. Show that independent implies uncorrelated.

d. Show that uncorrelated does not imply independent.
e. Let z = ax + by. Show that if x and y are uncorrelated, then σ_z^2 = a^2 σ_x^2 + b^2 σ_y^2.

f. Let x_i, i = 1, ..., n, be random variables independently drawn from the same probability distribution with mean µ_x and variance σ_x^2. For the sample mean x̄ = (1/n) Σ_{i=1}^n x_i, show the following:

   i. E[x̄] = µ_x.
   ii. Var[x̄] = σ_x^2 / n, i.e., the variance of the sample mean is σ_x^2/n. Note that this is different from the sample variance s^2 = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)^2.

g. Let x_1 and x_2 be independent and identically distributed (i.i.d.) continuous random variables. Can Pr[x_1 ≥ x_2] be calculated? If so, find its value. If not, explain.
   Hint 1: Remember that for a continuous variable, Pr[x = k] = 0 for any value of k.
   Hint 2: Remember the definition of i.i.d. variables.

h. Let x_1 and x_2 be independent and identically distributed discrete random variables. Can Pr[x_1 ≥ x_2] be calculated? If so, find its value. If not, explain.

Solution:

a. The following is for discrete random variables; a similar argument (with sums replaced by integrals) holds for continuous random variables.

   E[ax + by] = Σ_{x∈X, y∈Y} (ax + by) p(x, y)
              = a Σ_{x∈X, y∈Y} x p(x, y) + b Σ_{x∈X, y∈Y} y p(x, y)
              = a Σ_{x∈X} x p(x) + b Σ_{y∈Y} y p(y)
              = aE[x] + bE[y]

b. Making use of the definition of variance and the previous part, we have:

   σ_x^2 = E[(x − µ_x)^2]
         = E[x^2 − 2µ_x x + µ_x^2]
         = E[x^2] − E[2µ_x x] + E[µ_x^2]
         = E[x^2] − 2µ_x E[x] + µ_x^2
         = E[x^2] − 2µ_x^2 + µ_x^2
         = E[x^2] − µ_x^2

c. To show that two discrete random variables x and y are uncorrelated, we must show σ_xy = 0 (the same holds for continuous random variables). From the previous part:

   σ_xy = E[xy] − µ_x µ_y

   If the two variables are independent:

   E[xy] = Σ_{x∈X, y∈Y} xy p(x, y)
         = Σ_{x∈X, y∈Y} xy p(x) p(y)
         = (Σ_{x∈X} x p(x)) (Σ_{y∈Y} y p(y))
         = E[x] E[y]

   Finally,

   σ_xy = E[x] E[y] − µ_x µ_y = 0

d. To prove this, we need to find one case where p(x, y) ≠ p(x)p(y) and σ_xy = 0 are both satisfied. One possible solution is as follows. Suppose we have the discrete random variables x and y, and we observed all possibilities (each row equally likely):

   x    y
   1    1
   −1   1
   1    1
   −1   1
   0    0
   0    0

   If we look at the case where x = 1 and y = 1, p(x, y) ≠ p(x)p(y) is satisfied:

   p(x = 1, y = 1) = 2/6 = 1/3
   p(x = 1) p(y = 1) = (2/6)(4/6) = 2/9 ≠ 1/3

   Now it is easy to verify that σ_xy = 0 is also satisfied:

   σ_xy = E[xy] − µ_x µ_y = 0 − 0 · (4/6) = 0

e. Given that z = ax + by and that x and y are uncorrelated, we have

   σ_z^2 = E[(z − µ_z)^2]
         = E[z^2] − µ_z^2
         = E[(ax + by)^2] − (aµ_x + bµ_y)^2
         = E[a^2 x^2 + 2abxy + b^2 y^2] − (a^2 µ_x^2 + 2ab µ_x µ_y + b^2 µ_y^2)
         = a^2 (E[x^2] − µ_x^2) + 2ab (E[xy] − µ_x µ_y) + b^2 (E[y^2] − µ_y^2)
         = a^2 σ_x^2 + 2ab σ_xy + b^2 σ_y^2
         = a^2 σ_x^2 + b^2 σ_y^2,

   where only the last equality depends on x and y being uncorrelated (σ_xy = 0).

f. Using the result of (a) and the fact that E[x_i] = µ_x,

   E[x̄] = E[(1/n) Σ_{i=1}^n x_i] = (1/n) Σ_{i=1}^n E[x_i] = (1/n) · n µ_x = µ_x

   Also, using the result of (e) and the fact that Var[x_i] = σ_x^2 with the x_i independent,

   Var[x̄] = Var[(1/n) Σ_{i=1}^n x_i] = (1/n^2) Σ_{i=1}^n Var[x_i] = (1/n^2) · n σ_x^2 = σ_x^2 / n

g. Given that x_1 and x_2 are continuous random variables, we know that Pr[x_1 = k] = 0 and Pr[x_2 = k] = 0 for any value of k. Thus, Pr[x_1 ≥ x_2] = Pr[x_1 > x_2] = Pr[x_2 < x_1]. Given that x_1 and x_2 are i.i.d., swapping the labels x_1 and x_2 has no effect on the world; in particular, Pr[x_1 < x_2] = Pr[x_2 < x_1]. Since ties occur with probability zero, these two probabilities must sum to one: Pr[x_1 < x_2] + Pr[x_2 < x_1] = 1. Thus, Pr[x_1 ≥ x_2] = 1/2.

h. For discrete random variables, unlike the continuous case above, Pr[x_1 = x_2] need not be zero, and its value depends on the distribution. The symmetry argument above now only gives Pr[x_1 ≥ x_2] = (1 + Pr[x_1 = x_2])/2, which cannot be evaluated further. In general, it is not possible to find Pr[x_1 ≥ x_2] without knowledge of the common distribution of x_1 and x_2.

Problem 3: Teatime with Gauss and Bayes [20 points]

Let

   p(x, y) = 1/(2παβ) exp(−[(y − µ)^2/(2α^2) + (x − y)^2/(2β^2)])
a. Find p(x), p(y), p(x|y), and p(y|x). In addition, give a brief description of each of these distributions.

b. Let µ = 0, α = 5, and β = 3. Plot p(y) and p(y|x = 9) for a reasonable range of y. What is the difference between these two distributions?

Solution:

a. To find p(y), simply factor p(x, y) and then integrate over x:

   p(y) = ∫ p(x, y) dx
        = ∫ 1/(2παβ) exp(−(y − µ)^2/(2α^2) − (x − y)^2/(2β^2)) dx
        = 1/(√(2π) α) exp(−(y − µ)^2/(2α^2)) ∫ 1/(√(2π) β) exp(−(x − y)^2/(2β^2)) dx
        = 1/(√(2π) α) exp(−(y − µ)^2/(2α^2))
        = N(µ, α^2)

   The integral goes to 1 because it is of the form of a probability density integrated over its entire domain.

   To find p(x|y), divide p(x, y) by p(y):

   p(x|y) = p(x, y)/p(y) = 1/(√(2π) β) exp(−(x − y)^2/(2β^2)) = N(y, β^2)

   Finding p(x) and p(y|x) follows essentially the same procedure, but the algebra is more involved and requires completing the square in the exponent:

   p(x) = ∫ p(x, y) dy
        = ∫ 1/(2παβ) exp(−[β^2 (y − µ)^2 + α^2 (x − y)^2]/(2 α^2 β^2)) dy

   Collecting the bracketed term in powers of y,

   β^2 (y − µ)^2 + α^2 (x − y)^2 = (α^2 + β^2) y^2 − 2(α^2 x + β^2 µ) y + (β^2 µ^2 + α^2 x^2)

   and completing the square in y,

   = (α^2 + β^2) (y − (α^2 x + β^2 µ)/(α^2 + β^2))^2 + (β^2 µ^2 + α^2 x^2) − (α^2 x + β^2 µ)^2/(α^2 + β^2)

   The y-independent remainder simplifies (expand and cancel; everything but α^2 β^2 (x^2 − 2µx + µ^2) drops out):

   (β^2 µ^2 + α^2 x^2) − (α^2 x + β^2 µ)^2/(α^2 + β^2) = α^2 β^2 (x − µ)^2/(α^2 + β^2)

   Therefore

   p(x) = 1/(2παβ) exp(−(x − µ)^2/(2(α^2 + β^2))) ∫ exp(−(y − (α^2 x + β^2 µ)/(α^2 + β^2))^2 / (2 α^2 β^2/(α^2 + β^2))) dy

   The remaining integral is a Gaussian integral with variance α^2 β^2/(α^2 + β^2), so it equals √(2π α^2 β^2/(α^2 + β^2)). Hence

   p(x) = 1/(√(2π) √(α^2 + β^2)) exp(−(x − µ)^2/(2(α^2 + β^2))) = N(µ, α^2 + β^2)

   To find p(y|x) we simply divide p(x, y) by p(x); the completed square above already gives its form:

   p(y|x) = p(x, y)/p(x)
          = √((α^2 + β^2)/(2π α^2 β^2)) exp(−(y − (α^2 x + β^2 µ)/(α^2 + β^2))^2 / (2 α^2 β^2/(α^2 + β^2)))
          = N((α^2 x + β^2 µ)/(α^2 + β^2), α^2 β^2/(α^2 + β^2))

   Note that all of the above distributions are Gaussian: p(y) and p(x) are marginals, while p(x|y) and p(y|x) are conditionals.

b. The following Matlab code produced Figure 1:

   close all;
   clear all;

   % Initialize variables
   m = 0.0;
   a = 5.0;
   b = 3;
   x = 9;

   % Compute parameters
   y = -100:1:100;
   mu = (a^2*x + b^2*m) / (a^2 + b^2);
   v = (a*b)^2 / (a^2 + b^2);
   p_y_given_x = 1.0 ./ sqrt(2*pi*v) .* exp(-(y - mu).^2 ./ (2*v));
   v = a^2;
   p_y = 1.0 ./ sqrt(2*pi*v) .* exp(-(y - m).^2 ./ (2*v));

   % Show information
   figure;
   plot(y, p_y_given_x, 'b'); hold on
   plot(y, p_y, 'r');
   legend('p(y|x)', 'p(y)');
   sy = size(y);
   axis([y(1), y(sy(2)), 0, 0.2]);
   xlabel('y');
   text(-70, 0.14, '\mu = 0');
   text(-70, 0.12, '\alpha = 5');
   text(-70, 0.10, '\beta = 3');

Figure 1: The marginal p.d.f. of y and the p.d.f. of y given x for a specific value of x. Notice how knowing x makes your knowledge of y more certain.

Problem 4: Covariance Matrix [15 points]

Let

   Σ = [5 4; 4 5]

a. Find the eigenvalues and eigenvectors of Σ by hand (include all calculations). Verify your computations with the MATLAB function eig.

b. Verify that Σ is a valid covariance matrix.
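The assignment requires Matlab, but the closed forms derived in Problem 3 are easy to sanity-check numerically. The following is a minimal NumPy sketch (not part of the required solution) that simulates the generative model y ~ N(µ, α²), x|y ~ N(y, β²) and compares Monte Carlo estimates against the derived p(x) and p(y|x):

```python
import numpy as np

# Model from Problem 3: y ~ N(mu, alpha^2), x | y ~ N(y, beta^2).
rng = np.random.default_rng(0)
mu, alpha, beta = 0.0, 5.0, 3.0

y = rng.normal(mu, alpha, 1_000_000)
x = rng.normal(y, beta)

# Marginal of x should be N(mu, alpha^2 + beta^2) = N(0, 34).
assert abs(x.mean() - mu) < 0.05
assert abs(x.var() - (alpha**2 + beta**2)) < 0.5

# Closed-form conditional p(y|x): mean (alpha^2 x + beta^2 mu)/(alpha^2+beta^2),
# variance alpha^2 beta^2/(alpha^2+beta^2). For x = 9 both equal 225/34.
x0 = 9.0
post_mean = (alpha**2 * x0 + beta**2 * mu) / (alpha**2 + beta**2)
post_var = (alpha**2 * beta**2) / (alpha**2 + beta**2)

# Monte Carlo estimate: restrict attention to samples with x near x0.
sel = np.abs(x - x0) < 0.25
assert abs(y[sel].mean() - post_mean) < 0.15
assert abs(y[sel].var() - post_var) < 0.5
print(post_mean, post_var)
```

The conditioning step is a crude kernel (a hard window around x0), which is enough here because the conditional mean and variance vary slowly over the window width.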
c. We provide 500 data points sampled from the distribution N([0 0]', Σ). Download the dataset from the course website and project the data onto the eigenvectors of the covariance matrix. What is the effect of this projection? Include MATLAB code and plots before and after the projection.

Solution:

a. We can find the eigenvectors and eigenvalues of Σ by starting with the definition of an eigenvector. Namely, a vector e is an eigenvector of Σ if it satisfies Σe = λe for some constant scalar λ, which is called the eigenvalue corresponding to e. This can be rewritten as

   (Σ − λI) e = 0

   This has a nonzero solution e only if

   det(Σ − λI) = 0

   Thus, we require that

   (5 − λ)^2 − 4^2 = 0

   By inspection, this is true when λ_1 = 9 and λ_2 = 1. To find the eigenvectors, we plug the eigenvalues back into the equation above. For λ_1 = 9,

   (Σ − 9I) e = [5−9 4; 4 5−9][a; b] = [−4 4; 4 −4][a; b] = [0; 0]

   which gives a = b. Normalized, this results in the eigenvector

   e_1 = (1/√2) [1; 1]

   Similarly, λ_2 = 1 gives

   (Σ − I) e = [5−1 4; 4 5−1][a; b] = [4 4; 4 4][a; b] = [0; 0]

   which gives a = −b. Normalized, this results in the eigenvector

   e_2 = (1/√2) [1; −1]

b. The matrix Σ is a valid covariance matrix if it is symmetric and positive semi-definite. Clearly it is symmetric, since Σ^T = Σ. One way to prove it is positive semi-definite is to show that all its eigenvalues are non-negative. This is indeed the case, as shown in the previous part.

c. Figure 2 shows how projecting the data onto the eigenvectors of the data's covariance matrix removes the correlation between x and y. The MATLAB code is as follows:

   clear all;
   close all;

   % Initialize covariance matrix
   S = [5 4; 4 5];

   % Load data
   load points;

   % Show correlation of original points
   corrcoef(X)

   % Compute eigenvectors and eigenvalues
   [V, D] = eig(S);
   D
   V

   % Project data onto the eigenvectors
   px = X * V;

   % Show correlation of projected points
   corrcoef(px)

   % Showing data
   figure;
   subplot(1, 2, 1);
   scatter(X(:,1), X(:,2), 'filled');
   axis equal
   xlabel('x'); ylabel('y');
   title('Original Data');
   subplot(1, 2, 2);
   scatter(px(:,1), px(:,2), 'filled');
   axis equal
   xlabel('x'); ylabel('y');
   title('Projected Data');
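The hand computation in part (a) and the decorrelation effect in part (c) can be cross-checked outside Matlab; here is a short NumPy sketch (an illustration, not the graded Matlab code) that regenerates 500 synthetic points in place of the course dataset:

```python
import numpy as np

# Sigma from Problem 4 and its eigendecomposition.
rng = np.random.default_rng(1)
Sigma = np.array([[5.0, 4.0], [4.0, 5.0]])

# eigh returns eigenvalues in ascending order: 1 and 9 here.
vals, vecs = np.linalg.eigh(Sigma)
assert np.allclose(vals, [1.0, 9.0])
# Columns are the normalized eigenvectors; every entry has magnitude
# 1/sqrt(2), matching (1/sqrt(2))[1; -1] and (1/sqrt(2))[1; 1] up to sign.
assert np.allclose(np.abs(vecs), 1 / np.sqrt(2))

# Sample 500 points from N([0 0], Sigma), as in part (c), and project.
X = rng.multivariate_normal([0, 0], Sigma, size=500)
P = X @ vecs

# The projected coordinates are (nearly) uncorrelated, with variances
# close to the eigenvalues of Sigma.
C = np.cov(P.T)
assert abs(C[0, 1]) < 0.5
print(np.round(C, 2))
```

The sample covariance of the projected data is approximately diagonal, with diagonal entries near the eigenvalues 1 and 9, which is exactly the decorrelation the problem asks you to observe.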
Figure 2: The original data and the data transformed into the coordinate system defined by the eigenvectors of their covariance matrix.

Problem 5: Probabilistic Modeling [20 points]

Let x ∈ {0, 1} denote a person's affective state (x = 0 for a positive-feeling state, and x = 1 for a negative-feeling state). The person feels positive with probability 1 − θ_1. Suppose that an affect-tagging system (or a robot) recognizes her feeling state and reports the observed state, y, to you. But this system is unreliable and obtains the correct result with probability θ_2.

a. Represent the joint probability distribution P(x, y | θ) for all (x, y) as a 2×2 matrix, as a function of the parameters θ = (θ_1, θ_2).

b. The Maximum Likelihood estimation criterion for the parameter θ is defined as:

   θ_ML = argmax_θ L(t_1, ..., t_n; θ) = argmax_θ Π_{i=1}^n p(t_i | θ)

   where we have assumed that each data point t_i is drawn independently from the same distribution, so that the likelihood of the data is L(t_1, ..., t_n; θ) = Π_{i=1}^n p(t_i | θ). Likelihood is viewed as a function of the parameters, which depends on the data. Since the above expression can be technically challenging to maximize directly, we maximize the log-likelihood log L(t_1, ..., t_n; θ) instead of the likelihood. Note that any monotonically increasing function (e.g., the log function) of the likelihood has the same maxima. Thus,

   θ_ML = argmax_θ log L(t_1, ..., t_n; θ) = argmax_θ Σ_{i=1}^n log p(t_i | θ)

   Suppose we get the following joint observations t = (x, y):

   x y
   0 0
   0 0
   0 0
   0 1
   1 0
   1 1
   1 1
   1 1

   What are the maximum-likelihood (ML) values of θ_1 and θ_2?

   Hint: Since P(x, y | θ) = P(y | x, θ_2) P(x | θ_1), the estimation of the two parameters can be done separately in the log-likelihood criterion.

Solution:

a. The probability mass function (pmf) of x ∈ {0, 1} is

   P(x) = { 1 − θ_1,  x = 0
          { θ_1,      x = 1

   The conditional pmf of y ∈ {0, 1} given that x = 0 is

   P(y | x = 0) = { θ_2,      y = 0
                 { 1 − θ_2,  y = 1

   The conditional pmf of y given that x = 1 is

   P(y | x = 1) = { 1 − θ_2,  y = 0
                 { θ_2,      y = 1

   Using P(x, y) = P(y | x) P(x), we tabulate the joint pmf of (x, y):

   P(x, y) = [ P(0,0)  P(0,1) ] = [ (1 − θ_1) θ_2   (1 − θ_1)(1 − θ_2) ]
             [ P(1,0)  P(1,1) ]   [ θ_1 (1 − θ_2)   θ_1 θ_2            ]

b. We select θ_1, θ_2 to maximize the log-likelihood of the samples {(x_i, y_i), i = 1, ..., n}, which may be expressed as

   J(θ_1, θ_2) = Σ_i log P(x_i, y_i)
               = Σ_i [log P(y_i | x_i) + log P(x_i)]
               = Σ_i log P(y_i | x_i) + Σ_i log P(x_i)
               = J_2(θ_2) + J_1(θ_1)

   Hence, we choose θ_1 to maximize

   J_1(θ_1) = Σ_i log P(x_i) = N_x log θ_1 + (n − N_x) log(1 − θ_1)

   where N_x = Σ_i x_i. Differentiating with respect to θ_1 gives

   dJ_1/dθ_1 = N_x/θ_1 − (n − N_x)/(1 − θ_1)

   We set this derivative to zero and solve for θ_1 to obtain

   θ_1 = N_x/n

   Similarly, we choose θ_2 to maximize

   J_2(θ_2) = Σ_i log P(y_i | x_i) = N_{x=y} log θ_2 + (n − N_{x=y}) log(1 − θ_2)

   where N_{x=y} = Σ_i [x_i y_i + (1 − x_i)(1 − y_i)] counts the observations with y_i = x_i. Differentiating J_2 with respect to θ_2, setting to zero, and solving for θ_2 gives

   θ_2 = N_{x=y}/n

   For the example data, θ_1 = 4/8 = 1/2 and θ_2 = 6/8 = 3/4. Thus,

   P(x, y | θ) = [ (1 − θ_1) θ_2   (1 − θ_1)(1 − θ_2) ] = [ 3/8  1/8 ]
                 [ θ_1 (1 − θ_2)   θ_1 θ_2            ]   [ 1/8  3/8 ]

   The maximum likelihood of the data under this model is

   Π_{i=1}^n P(x_i, y_i) = (3/8)^6 (1/8)^2 ≈ 4.34 × 10^−5
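The closed-form ML estimates above can be verified by brute force; the following Python sketch (an illustrative cross-check, using the observation table as given above) recomputes the estimates and confirms they maximize the likelihood over a parameter grid:

```python
import numpy as np

# The eight (x, y) observations from Problem 5.
data = np.array([(0, 0), (0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (1, 1)])
x, y = data[:, 0], data[:, 1]
n = len(data)

# Closed-form ML estimates: theta1 = N_x/n, theta2 = N_{x=y}/n.
theta1 = x.sum() / n
theta2 = (x == y).sum() / n
assert theta1 == 0.5 and theta2 == 0.75

# Joint pmf under the fitted parameters, and the likelihood of the data.
P = np.array([[(1 - theta1) * theta2, (1 - theta1) * (1 - theta2)],
              [theta1 * (1 - theta2), theta1 * theta2]])
lik = np.prod([P[xi, yi] for xi, yi in data])
print(lik)  # (3/8)^6 * (1/8)^2

# Brute-force check that (theta1, theta2) maximizes the log-likelihood.
grid = np.linspace(0.01, 0.99, 99)
def loglik(t1, t2):
    Q = np.array([[(1 - t1) * t2, (1 - t1) * (1 - t2)],
                  [t1 * (1 - t2), t1 * t2]])
    return sum(np.log(Q[xi, yi]) for xi, yi in data)
best = max((loglik(a, b), a, b) for a in grid for b in grid)
assert abs(best[1] - theta1) < 0.01 and abs(best[2] - theta2) < 0.01
```

Because the log-likelihood separates into J_1(θ_1) + J_2(θ_2), the grid search necessarily lands on the same (1/2, 3/4) that the derivative calculation produced.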
Problem 6: Ring Problem [0 points] To get credit for this problem, you must not only write your own correct solution, but also write a computer simulation of the process of playing this game: Suppose I hide the ring of power in one of three identical boxes while you weren t looking. The other two boxes remain empty. After hiding he ring of power, I ask you to guess which box it s in. I know which box it s in and, after you ve made your guess, I deliberately open the lid of an empty box, which is one of the two boxes you did not choose. Thus, the ring of power is either in the box you chose or the remaining closed box you did not choose. Once you have made your initial choice and I ve revealed to you an empty box, I then give you the opportunity to change your mind you can either stick with your original choice, or choose the unopened box. You get to keep the contents of whichever box you finally decide upon. What choice should you make in order to maximize your chances of receiving the ring of power? Justify your answer using Bayes rule. Write a simulation. There are two choices in this game for the contestant in this game: choice of box, choice of whether or not to switch. In your simulation, first let the host choose a random box to place the ring of power. Show a trace of your program s output for a single game play, as well as a cumulative probability of winning for 000 rounds of the two policies to choose a random box and then switch and to choose a random box and not switch. Solution: Always switch your answer to the box you didn t choose the first time. This reason is as follows. You have a /3 chance of initially picking the correct box. That is, there is a /3 chance the correct answer is one of the other two boxes. Learning which of the two other boxes is empty does not change these probabilities; your initial choice still has a /3 chance of being correct. That is, there is a /3 chance the remaining box is the correct answer. 
Therefore you should change your choice. Using Bayes' rule, let:

   R = box with the ring
   O = box opened (by the host)
   S = box selected (initially)

Suppose you select box 1 (S = 1) and the host opens box 3 (O = 3). Then

   P(R = 2 | O = 3, S = 1) = P(O = 3 | R = 2, S = 1) P(R = 2 | S = 1) / P(O = 3 | S = 1)

where

   P(O = 3 | S = 1) = Σ_{R=1}^{3} P(O = 3, R | S = 1) = Σ_{R=1}^{3} P(O = 3 | R, S = 1) P(R | S = 1)

so

   P(R = 2 | O = 3, S = 1) = (1 · 1/3) / ((1/2)(1/3) + (1)(1/3) + (0)(1/3)) = (1/3)/(1/2) = 2/3

Another way to understand the problem is to extend it to 100 boxes, only one of which has the ring of power. After you make your initial choice, I then open 98 of the 99 remaining boxes and show you that they are empty. Clearly, with very high probability the ring of power resides in the one remaining box you did not initially choose.

Here is sample simulation output for the Ring problem (the first three games of each policy are traced; each trace shows the actual box, the initial guess, the revealed box, the swap policy, and the final guess):

   actual: 1
   guess: 2
   reveal: 3
   swap: 0
   guess: 2

   actual: 3
   guess: 3
   reveal: 1
   swap: 0
   guess: 3

   actual: 2
   guess: 3
   reveal: 1
   swap: 0
   guess: 3

   swap: 0
   win: 292
   lose: 708
   win / (win + lose): 0.292

   actual: 3
   guess: 1
   reveal: 2
   swap: 1
   guess: 3

   actual: 1
   guess: 1
   reveal: 2
   swap: 1
   guess: 3

   actual: 3
   guess: 2
   reveal: 1
   swap: 1
   guess: 3

   swap: 1
   win: 686
   lose: 314
   win / (win + lose): 0.686

Here is a Matlab program that simulates the Ring game and produced the output above (the initial guess is saved in a separate variable so that the trace can show both the initial and the final guess):

   for swap = 0:1
       win = 0;
       lose = 0;
       for i = 1:1000
           actual = floor(rand*3) + 1;
           guess = floor(rand*3) + 1;
           firstguess = guess;
           if guess == actual
               % Host opens one of the two other boxes at random.
               reveal = floor(rand*2) + 1;
               if reveal >= actual
                   reveal = reveal + 1;
               end
           elseif guess == 1 && actual == 2
               reveal = 3;
           elseif guess == 1 && actual == 3
               reveal = 2;
           elseif guess == 2 && actual == 1
               reveal = 3;
           elseif guess == 2 && actual == 3
               reveal = 1;
           elseif guess == 3 && actual == 1
               reveal = 2;
           elseif guess == 3 && actual == 2
               reveal = 1;
           end
           if swap == 1
               % Switch to the remaining unopened box.
               if guess == 1 && reveal == 2
                   guess = 3;
               elseif guess == 1 && reveal == 3
                   guess = 2;
               elseif guess == 2 && reveal == 1
                   guess = 3;
               elseif guess == 2 && reveal == 3
                   guess = 1;
               elseif guess == 3 && reveal == 1
                   guess = 2;
               elseif guess == 3 && reveal == 2
                   guess = 1;
               end
           end
           if guess == actual
               win = win + 1;
           else
               lose = lose + 1;
           end
           % Only print trace for first 3 games
           if i <= 3
               actual
               firstguess
               reveal
               swap
               guess
           end
       end
       % Print results for each game play policy
       swap
       win / (win + lose)
   end
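As an extra cross-check of the two policies (the graded simulation must be in Matlab), the same game can be simulated in a few lines of Python:

```python
import random

# Monty Hall / Ring simulation: returns True if the contestant wins.
def play(swap, rng):
    actual = rng.randrange(1, 4)   # host hides the ring in box 1, 2, or 3
    guess = rng.randrange(1, 4)    # contestant's initial guess
    # Host opens an empty box that the contestant did not choose.
    reveal = rng.choice([b for b in (1, 2, 3) if b != actual and b != guess])
    if swap:
        # Switch to the remaining unopened box.
        guess = next(b for b in (1, 2, 3) if b != guess and b != reveal)
    return guess == actual

rng = random.Random(0)
rounds = 100_000
stick = sum(play(False, rng) for _ in range(rounds)) / rounds
switch = sum(play(True, rng) for _ in range(rounds)) / rounds
print(stick, switch)
assert abs(stick - 1/3) < 0.01
assert abs(switch - 2/3) < 0.01
```

The estimated win rates settle near 1/3 for the stick policy and 2/3 for the switch policy, matching both the Bayes' rule calculation and the Matlab simulation output above.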