Machine Learning Assignment-1


University of Utah, School of Computing
Chandramouli, Shridhara (sdhara@cs.utah.edu, 00873255)
Singla, Sumedha (sumedha.singla@utah.edu, 00877456)
September 10, 2013

1 Linear Regression

a) Create a Matlab function that draws a random number from the univariate normal distribution N(m, σ²) for any m, σ ∈ R. How do you test whether you have done this correctly?

Ans: Goal: to create a function rand(n, m, sd) which takes 3 parameters as input, where
n is the number of random numbers to be generated,
m is the required mean of the normal distribution,
sd is the required standard deviation of the normal distribution,
and returns an array of n random numbers.

Approach 1

Step 1. Generate uniformly distributed random numbers.
Step 2. Transform the uniformly distributed numbers to a standard normal distribution using the Box-Muller transformation.

Box-Muller transformation: suppose U1 and U2 are independent random variables that are uniformly distributed in the interval (0, 1]. Then, according to the Box-Muller transformation, the corresponding independent random variables Z0 and Z1 with a standard normal distribution are given by

Z0 = sqrt(−2 ln U1) cos(2π U2)
Z1 = sqrt(−2 ln U1) sin(2π U2)

Step 3. Scale the resulting standard normal draw SND to a normal distribution ND with mean m and standard deviation sd:

ND = m + sd · SND    (1.1)

To test this function we plot a histogram for n = 1000. It comes out as a bell curve which is symmetric about the mean m; please check figure 1.1 for the curve.

Approach 2

By the central limit theorem, we know that the mean of n uniformly distributed random numbers is approximately normally distributed. To create a uniform random number generator, we use a pseudo-random generator of the form

x_{i+1} = a · x_i mod b

We chose the values a = 40692, x_0 = 1321349, and b = 2^31 − 249.

In order to get a normally distributed curve, we draw n × n random numbers from the pseudo-random number generator and take the arithmetic mean of the values in each row, giving n random numbers. The seed values for the PRNG are made to persist, thereby producing distinct numbers on each run. We then scale the results to the specified mean and variance as per equation (1.1). The histogram of the values generated using this method is shown in figure 1.2.

Figure 1.1: Histogram of random numbers generated using the Box-Muller transformation
Figure 1.2: Histogram of random numbers generated using the central limit theorem

b) Choose an arbitrary non-linear univariate mathematical function f : R → R. Make your function complex enough such that it is different from the answers of other groups with a high likelihood.

Ans: For this exercise, we are asked to choose a univariate function f : R → R. We decided to use the asteroid curve, which is defined as

y = ±(a^{2/3} − x^{2/3})^{3/2},  where a ∈ R_{>0} and −a ≤ x ≤ a

The above function is interesting for our purpose of linear regression, as y is defined at two points for each value of x, one positive and the other negative. We believe such a function will serve as a diagnostic tool for our problem of linear regression, as we can use it in two ways, viz., considering only the positive values of y, or considering both the positive and negative values of y.

c) Generate random data (x_i, y_i) such that y_i = f(x_i) + ξ_i, ξ_i ~ N(0, σ²), for a range of x-values and a value of σ chosen such that the data is interesting. Plot the data.

Approach: As mentioned above, the asteroid curve can be plotted either by considering both the negative and positive signs of y, or by using just one of the signs. We wrote a Matlab function generate_data.m which generates the y values from a range of user-specified x values. Plots of the curve for various parameters are shown in figures 1.3, 1.4, 1.5 and 1.6.

Figure 1.3: Plot of positive asteroid without noise
Figure 1.4: Plot of positive asteroid with added noise N(0, 3)
Figure 1.5: Plot of full asteroid without noise
Figure 1.6: Plot of full asteroid with added noise N(0, 3)
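The Box-Muller recipe of Approach 1 can be sketched in a few lines. The assignment used Matlab; the version below is an illustrative Python translation (the function name box_muller_normal is ours, not the original code):

```python
import math
import random

def box_muller_normal(n, m, sd):
    """Draw n samples from N(m, sd^2) via the Box-Muller transformation."""
    out = []
    while len(out) < n:
        u1 = random.random() or 1e-12  # keep U1 in (0, 1] so log() is defined
        u2 = random.random()
        r = math.sqrt(-2.0 * math.log(u1))
        # Each (U1, U2) pair yields two independent standard normal draws,
        # which are then scaled as ND = m + sd * SND (equation 1.1).
        out.append(m + sd * r * math.cos(2 * math.pi * u2))
        out.append(m + sd * r * math.sin(2 * math.pi * u2))
    return out[:n]

samples = box_muller_normal(200000, 5.0, 2.0)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # ≈ 5.0 and ≈ 4.0
```

Beyond the histogram check described above, the sample mean and sample variance should converge to m and sd² as n grows.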

The above figures were drawn using a = 30, for every −a ≤ x ≤ a in intervals of 0.1. Altering the value of a does not change the shape of the curve, but acts as a scaling factor, increasing the size of the curve as well as the range of values over which x is defined. We used a moderate value of σ = 3, as too much noise changes the shape of the curve, while too little does not reflect a real-life scenario.

d) Use linear least-squares to fit a polynomial curve through your generated data and compute the mean-squared error of your estimate. Give your derivation. Vary the number of data points you use and the degree of the polynomial. Plot the results. Describe in words what you see.

Approach: Let us assume that the generated data satisfy the polynomial equation

y = a_0 + a_1 x + a_2 x² + a_3 x³ + ... + a_m x^m

where m is the degree of the polynomial. Equivalently, Y = A^T X, where A^T = [a_0 a_1 a_2 ... a_m] and X^T = [1 x x² ... x^m].

The values of the coefficients A are determined by fitting the polynomial to the training data (the noisy data generated from the asteroid curve). This can be done by minimizing an error function that measures the misfit between y(x, A), for any given value of A, and the training set data points. Let our error function be the squared-error function

E(A) = ∑_i (Y_i − A^T X_i)²

Expanding,

E(A) = ∑_i (Y_i − A^T X_i)(Y_i − A^T X_i)^T
     = ∑_i (Y_i Y_i^T − 2 Y_i X_i^T A + A^T X_i X_i^T A)
     = A^T (∑_i X_i X_i^T) A − 2 (∑_i Y_i X_i^T) A + ∑_i Y_i Y_i^T

Taking the trace,

tr(E(A)) = tr(A^T (∑_i X_i X_i^T) A) − tr(2 (∑_i Y_i X_i^T) A) + tr(∑_i Y_i Y_i^T)

The minimum of E(A) is at the point where ∂tr(E(A))/∂A = 0. Taking the derivative,

∂tr(E(A))/∂A = 2 (∑_i X_i X_i^T) A − 2 ∑_i X_i Y_i^T = 0

⇒ (∑_i X_i X_i^T) A = ∑_i X_i Y_i^T
⇒ A = (∑_i X_i X_i^T)^{-1} (∑_i X_i Y_i^T)

Fitting a polynomial to the asteroid curve

We performed experiments on linear regression on both the positive and the full asteroid curves. For all the experiments, the data were generated by choosing the function parameter a = 30, sampling x at all points in [−a, a] at intervals of 0.1, and adding Gaussian noise to the sample set using the Gaussian random number generator defined above. The data generation is implemented in Matlab as generate_data.m. The function takes as arguments the value of a, the sampling frequency, the mean and the variance for the random number generator, and a flag which specifies whether we require a positive or a full asteroid curve; it returns a set of noisy observations of the asteroid curve, where the noise component is drawn from the normally distributed random number generator. The resulting data was then fed to fit_polynomial.m, which takes the data points from generate_data.m and the polynomial degree, and computes the value of A which minimizes the mean squared error on the data. We then plotted the predicted values of y against −a ≤ x ≤ a in intervals of 0.1 to get an idea of how good the estimates are.

Positive Asteroid

For the positive asteroid, the data was generated using a noise variance of 3 and a mean of 0. The data set contained 601 (x, y) pairs corresponding to the values of the curve from −a to a. We then tried varying the degree of the polynomial from n = 2 to n = 10. Plots of the data according to the predicted model, together with the original data, are shown below.

Figure 1.7: Plot of original data with noise
Figure 1.8: Plot of predicted values taking n = 2
Figure 1.9: Plot of predicted values taking n = 3

From the figures, it is clear that choosing a smaller value of the degree gives the best fit for the data. On increasing the degree of the polynomial to n = 10, the model tries to overfit the data, and many of the local minima and maxima of the fitted curve correspond to the noise deviations. A lower-degree polynomial of n = 3 or n = 4 provides the best fit for the curve in our experiments.

Figure 1.10: Plot of predicted values taking n = 4
Figure 1.11: Plot of predicted values taking n = 10

Full Asteroid

For the full asteroid, the data was generated using a noise variance of 3 and a mean of 0, just as was done for the positive asteroid. The data set contained 1201 (x, y) pairs corresponding to the values of the curve from −a to a. We then tried varying the degree of the polynomial in the range n = 2 to n = 20. Plots of the data according to the predicted model, together with the original data, are shown below.

Figure 1.12: Plot of original data with noise

As we can see from figures 1.13 to 1.18, the linear model fails to model the full asteroid curve effectively, unlike the positive asteroid curve from the previous section. We find that while a lower degree of the polynomial (such as n = 2 and n = 4) avoids overfitting, it fails to provide a suitable prediction for x > 0. On the other hand, polynomials of higher order simply oscillate between positive and negative values and do not provide a good model for the full asteroid. This example therefore shows the limitations of a linear model in making predictions of values which follow a complex non-linear function.

Figure 1.13: Plot of predicted values taking n = 2
Figure 1.14: Plot of predicted values taking n = 4
Figure 1.15: Plot of predicted values taking n = 6
Figure 1.16: Plot of predicted values taking n = 10
Figure 1.17: Plot of predicted values taking n = 15
Figure 1.18: Plot of predicted values taking n = 20

While it provides a very good approximation for the positive asteroid, the linear model fails to model the full asteroid well.

2 Linear Multivariate Regression

The attached file data.txt has two numbers d and n on the first line; d is the dimension of the data, and n is the number of data points. Then, n lines of d numbers each follow, constituting a set of vectors {x_t}, t = 1, ..., n, with x_t ∈ R^d. The data {x_t} is hypothesized to be generated by the model

x_{t+1} = A x_t + m_t,  m_t ~ N(0, Σ)

where A ∈ R^{d×d}, m_t ∈ R^d and Σ ∈ R^{d×d}.

a) Compute best estimates for A and Σ. Give your derivation.

Sol: To calculate the values of A and Σ, let us assume we have data (X, Y), where n is the number of data points. The given hypothesis is

X_{t+1} = A X_t + M_t

For the sake of convenience, let us replace X_{t+1} by Y. The equation now becomes

Y = A X_t + M_t,  where M_t ~ N(0, Σ)

This can be rewritten as Y ~ N(A X, Σ), which represents a normal distribution with mean A X_t. In order to get the values of A and Σ, let us consider the MLE for this distribution. The likelihood of a single point is

P(x) = (2π)^{-d/2} |Σ|^{-1/2} exp(−½ (x − x̂)^T Σ^{-1} (x − x̂))

and for our function,

l(A, Σ) = ∏_i (2π)^{-d/2} |Σ|^{-1/2} exp(−½ (y_i − A x_i)^T Σ^{-1} (y_i − A x_i))

We need to find the values of A and Σ which maximize the above function. Therefore, let us take the natural logarithm on both sides, then take the derivative and equate it to zero.

ll(A, Σ) = −(nd/2) log(2π) + (n/2) log(|Σ^{-1}|) − ½ ∑_i (y_i − A x_i)^T Σ^{-1} (y_i − A x_i)

The −(nd/2) log(2π) term is a constant; let us denote it by c. Since a scalar equals its own trace, and tr(ABC) = tr(BCA) = tr(CAB),

ll(A, Σ) = c + (n/2) log(|Σ^{-1}|) − ½ ∑_i tr(Σ^{-1} (y_i − A x_i)(y_i − A x_i)^T)

Expanding the last term, we get

ll(A, Σ) = c + (n/2) log(|Σ^{-1}|) − ½ tr(Σ^{-1} ∑_i y_i y_i^T) + ½ tr(Σ^{-1} ∑_i y_i x_i^T A^T) + ½ tr(Σ^{-1} A ∑_i x_i y_i^T) − ½ tr(Σ^{-1} A (∑_i x_i x_i^T) A^T)

Here tr(Σ^{-1} ∑ y_i x_i^T A^T) = tr(Σ^{-1} A ∑ x_i y_i^T), since tr(A) = tr(A^T) and Σ^{-1} is symmetric. Therefore, on simplification,

ll(A, Σ) = c + (n/2) log(|Σ^{-1}|) − ½ tr(Σ^{-1} ∑_i y_i y_i^T) + tr(Σ^{-1} ∑_i y_i x_i^T A^T) − ½ tr(Σ^{-1} A (∑_i x_i x_i^T) A^T)

To find A, take the derivative with respect to A and equate it to zero, using ∂tr(A^T B)/∂A = B and ∂tr(Σ^{-1} A S A^T)/∂A = 2 Σ^{-1} A S for symmetric Σ^{-1} and S:

∂ll/∂A = Σ^{-1} ∑_i y_i x_i^T − Σ^{-1} A ∑_i x_i x_i^T = 0

⇒ A = (∑_i y_i x_i^T)(∑_i x_i x_i^T)^{-1}

Maximizing with respect to Σ then gives

Σ = (1/n) ∑_i (y_i − A x_i)(y_i − A x_i)^T

Putting in the values of x and y from data.txt and solving for A and Σ, we get

A =
[ 1.0000       4.7027e-05   0.4939   0.0021 ]
[ 2.5025e-06   1.0000       0.0011   0.4989 ]
[ 0.0090       0.0061       1.7839   0.6576 ]
[ 0.0089       0.0060       0.8028   1.6608 ]

Σ =
[ 1.4959e-04   1.0289e-05   0.0011       4.5981e-04 ]
[ 1.0289e-05   1.5179e-04   9.8889e-04   3.4004e-04 ]
[ 0.0011       9.8889e-04   33.9355      33.8797    ]
[ 4.5981e-04   3.4004e-04   33.8797      33.8291    ]

b) Based on the results, do you have a guess of what the model above actually models?

The system models a time-dependent process, where the predicted output of an observation or experiment depends on the current state of the system and a random event which acts as noise. One example of such a system could be a motion-planning robot, where the position at time t + 1 is derived from its position at time t, which corresponds to the state of the robot at the current time. The noise component can be considered as the effect of the robot's environment, which prevents the robot from following a perfect, expected trajectory. Such effects could be wind changes, terrain changes or other naturally occurring events, which are hard to model and predict, and are therefore taken as a random term, simplifying our system. Considering the motion planner as the input, the d-dimensional feature vector x would be some spatial position of the object or robot represented in d components.

3 Linear Multivariate Regression with Inputs

The given hypothesis is

X_{t+1} = A X_t + B U_t + M_t

For the sake of convenience, let us replace X_{t+1} by Y. The equation now becomes

Y = A X_t + B U_t + M_t,  where M_t ~ N(0, Σ)

This can be rewritten as Y ~ N(A X + B U, Σ), which represents a normal distribution with mean A X_t + B U_t. In order to get the values of A, B and Σ, let us consider the MLE for this distribution. The likelihood of a single point is

P(x) = (2π)^{-d/2} |Σ|^{-1/2} exp(−½ (x − x̂)^T Σ^{-1} (x − x̂))

and for our function,

l(A, B, Σ) = ∏_i (2π)^{-d/2} |Σ|^{-1/2} exp(−½ (y_i − A x_i − B u_i)^T Σ^{-1} (y_i − A x_i − B u_i))

We need to find the values of A, B and Σ which maximize the above function. Therefore, let us take the natural logarithm on both sides, then take the derivative and equate it to zero.

ll(A, B, Σ) = −(nd/2) log(2π) + (n/2) log(|Σ^{-1}|) − ½ ∑_i (y_i − A x_i − B u_i)^T Σ^{-1} (y_i − A x_i − B u_i)

The −(nd/2) log(2π) term is a constant; let us denote it by c. Taking the trace as before,

ll(A, B, Σ) = c + (n/2) log(|Σ^{-1}|) − ½ ∑_i tr(Σ^{-1} (y_i − A x_i − B u_i)(y_i − A x_i − B u_i)^T)

Expanding the last term, we get

ll(A, B, Σ) = c + (n/2) log(|Σ^{-1}|)
  − ½ tr(Σ^{-1} ∑_i y_i y_i^T) + ½ tr(Σ^{-1} ∑_i y_i x_i^T A^T) + ½ tr(Σ^{-1} ∑_i y_i u_i^T B^T)
  + ½ tr(Σ^{-1} A ∑_i x_i y_i^T) − ½ tr(Σ^{-1} A (∑_i x_i x_i^T) A^T) − ½ tr(Σ^{-1} B (∑_i u_i u_i^T) B^T)
  − ½ tr(Σ^{-1} A (∑_i x_i u_i^T) B^T) + ½ tr(Σ^{-1} B ∑_i u_i y_i^T) − ½ tr(Σ^{-1} B (∑_i u_i x_i^T) A^T)

Here tr(Σ^{-1} ∑ y_i x_i^T A^T) = tr(Σ^{-1} A ∑ x_i y_i^T), since tr(A) = tr(A^T) and tr(ABC) = tr(BCA) = tr(CAB). Similarly, tr(Σ^{-1} ∑ y_i u_i^T B^T) = tr(Σ^{-1} B ∑ u_i y_i^T) and tr(Σ^{-1} A ∑ x_i u_i^T B^T) = tr(Σ^{-1} B ∑ u_i x_i^T A^T). Therefore, on simplification we get

ll(A, B, Σ) = c + (n/2) log(|Σ^{-1}|) − ½ tr(Σ^{-1} ∑_i y_i y_i^T) + tr(Σ^{-1} ∑_i y_i x_i^T A^T) + tr(Σ^{-1} ∑_i y_i u_i^T B^T) − ½ tr(Σ^{-1} A (∑_i x_i x_i^T) A^T) − ½ tr(Σ^{-1} B (∑_i u_i u_i^T) B^T) − tr(Σ^{-1} A (∑_i x_i u_i^T) B^T)

In order to find the values of A, B and Σ, let us take the derivatives and equate them to zero, using ∂tr(AB)/∂A = B^T and ∂tr(C A B A^T)/∂A = C^T A B^T + C A B:

∂ll/∂A = 0 = Σ^{-1} ∑_i y_i x_i^T − Σ^{-1} A ∑_i x_i x_i^T − Σ^{-1} B ∑_i u_i x_i^T

⇒ A ∑_i x_i x_i^T = ∑_i y_i x_i^T − B ∑_i u_i x_i^T
⇒ A = (∑_i y_i x_i^T − B ∑_i u_i x_i^T)(∑_i x_i x_i^T)^{-1}

Similarly, deriving for B,

∂ll/∂B = 0 = Σ^{-1} ∑_i y_i u_i^T − Σ^{-1} B ∑_i u_i u_i^T − Σ^{-1} A ∑_i x_i u_i^T

⇒ B = (∑_i y_i u_i^T − A ∑_i x_i u_i^T)(∑_i u_i u_i^T)^{-1}

We now have two equations in terms of A and B, and can solve them to obtain the exact values of A and B. Writing YX^T for ∑_i y_i x_i^T, XX^T for ∑_i x_i x_i^T, and so on (here Y, X and U denote the matrices collecting all the data vectors), substituting one equation into the other gives

A = [ (YX^T)(XX^T)^{-1} − (YU^T)(UU^T)^{-1}(UX^T)(XX^T)^{-1} ] [ I − (XU^T)(UU^T)^{-1}(UX^T)(XX^T)^{-1} ]^{-1}

B = [ (YU^T)(UU^T)^{-1} − (YX^T)(XX^T)^{-1}(XU^T)(UU^T)^{-1} ] [ I − (UX^T)(XX^T)^{-1}(XU^T)(UU^T)^{-1} ]^{-1}

Σ = (1/n) ∑_i (y_i − A x_i − B u_i)(y_i − A x_i − B u_i)^T

Putting in the values of x, u and y from data2.txt and solving for A, B and Σ, we get

A =
[ 1.0000       9.1820e-07   0.5000       1.3454e-05 ]
[ 4.1199e-07   1.0000       2.2804e-05   0.5000     ]
[ 3.5668e-07   4.1072e-07   1.0000       1.8959e-06 ]
[ 6.2351e-07   5.8397e-07   6.7223e-06   1.0000     ]

B =
[ 0.1250       4.3179e-04 ]
[ 9.2692e-07   0.1250     ]
[ 0.5000       9.4947e-06 ]
[ 1.2225e-04   0.5000     ]

Σ =
[ 0.0049       1.8349e-04   5.5485e-07   9.4324e-05 ]
[ 1.8349e-04   0.0039       3.9692e-05   1.2200e-04 ]
[ 5.5485e-07   3.9692e-05   9.8998e-04   2.8346e-05 ]
[ 9.4324e-05   1.2200e-04   2.8346e-05   9.6041e-04 ]

b) Based on the results, do you have a guess of what the model above actually models?

This system also models a time-dependent process, as in question 2. In this system, however, the additional u_t vector can be considered as a control input which, along with the current state of the system and the noise component, is used to predict the new position of the system at time t + 1. Considering the motion-planning robot, where the position at time t + 1 is derived from its position at time t (the state of the robot at the current time), the additional u_t vector could be the user input given to the robot. Such an input may have different effects on the robot depending on the terrain and other environmental factors, and this models such a system. The noise component can once again be considered as the effect of the robot's environment, which prevents the robot from following a perfect, expected trajectory. Such effects could be wind changes, terrain changes or other naturally occurring events, which are hard to model and predict, and are therefore taken as a random term, simplifying our system.
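Both closed-form estimators, question 2's A = (∑ y_i x_i^T)(∑ x_i x_i^T)^{-1} and question 3's coupled equations for A and B, can be checked on synthetic data. Below is a hedged Python sketch (the original solution was Matlab); for the joint case it stacks x_t and u_t into one regressor, which is algebraically equivalent to the coupled normal equations above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 4, 2, 5000

# Ground-truth dynamics: y = A x + B u + m, m ~ N(0, sigma^2 I).
A_true = np.eye(d) + 0.05 * rng.standard_normal((d, d))
B_true = rng.standard_normal((d, k))
X = rng.standard_normal((d, n))  # columns x_t (i.i.d. here for simplicity;
U = rng.standard_normal((k, n))  # the assignment's data is a trajectory)
Y = A_true @ X + B_true @ U + 0.01 * rng.standard_normal((d, n))

# Stack the regressors: Y = [A B] Z with Z = [X; U],
# so [A B] = (Y Z^T)(Z Z^T)^{-1}, matching the derivation above.
Z = np.vstack([X, U])
AB = (Y @ Z.T) @ np.linalg.inv(Z @ Z.T)
A_hat, B_hat = AB[:, :d], AB[:, d:]

# Residual covariance: Sigma = (1/n) sum (y - A x - B u)(y - A x - B u)^T.
R = Y - A_hat @ X - B_hat @ U
Sigma_hat = (R @ R.T) / n

print(np.abs(A_hat - A_true).max(), np.abs(B_hat - B_true).max())  # both tiny
```

Dropping U (and the B block) from Z recovers question 2's estimator as a special case; with enough data, A_hat and B_hat approach the true matrices and Sigma_hat approaches the true noise covariance.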