Suggestions - Problem Set 3


Suggestions - Problem Set 3

4.2 (a) Show the discriminant condition (1) takes the form

$$x^T \hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) > \tfrac12 \hat\mu_2^T \hat\Sigma^{-1}\hat\mu_2 - \tfrac12 \hat\mu_1^T \hat\Sigma^{-1}\hat\mu_1 + \ln(N_1/N) - \ln(N_2/N),$$

as desired. We then replace the quantities $\mu_i, \Sigma$ by their estimates $\hat\mu_i, \hat\Sigma$ to get the proper form for this discriminant.

(b) Here, using the output notation $y = -N/N_1$, $y = N/N_2$ for classes 1 and 2 respectively, you want to minimize

$$\sum_{i=1}^N (y_i - \beta_0 - x_i^T\beta)^2 = \|\mathbf{y} - \tilde{\mathbf{X}}\tilde\beta\|^2,$$

where $\tilde\beta = \binom{\beta_0}{\beta}$, letting $\tilde{\mathbf{X}} = [\mathbf{1}\ \mathbf{X}]$ be the matrix whose $i$th row is $(1, x_i^T)$. In general, vectors/matrices with a ~ on them represent vectors augmented with 1's (and in some cases $\beta_0$'s). Use the usual least squares, so that $\tilde\beta = (\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}^T\mathbf{y}$; thus

$$(\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})\,\tilde\beta = \tilde{\mathbf{X}}^T\mathbf{y}. \qquad (1)$$

First consider the right-hand side, $\tilde{\mathbf{X}}^T\mathbf{y}$. Without loss of generality you can arrange the data so the first $N_1$ examples $(x_i, y_i)$ are in the first class and the last $N_2$ are in the second. Thus show the right side of (1) becomes

$$\tilde{\mathbf{X}}^T\mathbf{y} = \begin{pmatrix}\mathbf{1}^T\mathbf{y}\\ \mathbf{X}^T\mathbf{y}\end{pmatrix} = \begin{pmatrix}0\\ \mathbf{X}^T\mathbf{y}\end{pmatrix}.$$

Meantime, show

$$\mathbf{X}^T\mathbf{y} = -\frac{N}{N_1}\sum_{i=1}^{N_1} x_i + \frac{N}{N_2}\sum_{i=N_1+1}^{N} x_i = N(\hat\mu_2 - \hat\mu_1).$$
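The two identities just derived, $\mathbf{1}^T\mathbf{y} = 0$ and $\mathbf{X}^T\mathbf{y} = N(\hat\mu_2 - \hat\mu_1)$ under the $\pm N/N_k$ target coding, can be checked numerically. This is only a sanity-check sketch with synthetic data; the variable names and the data-generating choices are illustrative, not part of the problem set.

```python
import numpy as np

rng = np.random.default_rng(0)
N1, N2, p = 30, 50, 4
N = N1 + N2
X = np.vstack([rng.normal(0.0, 1.0, (N1, p)),   # class-1 rows first
               rng.normal(1.0, 1.0, (N2, p))])  # then class-2 rows
y = np.concatenate([np.full(N1, -N / N1),       # class-1 target -N/N1
                    np.full(N2, N / N2)])       # class-2 target +N/N2

mu1, mu2 = X[:N1].mean(axis=0), X[N1:].mean(axis=0)

assert abs(y.sum()) < 1e-8                      # 1^T y = 0
assert np.allclose(X.T @ y, N * (mu2 - mu1))    # X^T y = N(mu2 - mu1)
```

Both identities are exact algebra (not statistical facts), so they hold to machine precision for any data.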

So

$$\tilde{\mathbf{X}}^T\mathbf{y} = \begin{pmatrix}0\\ N(\hat\mu_2 - \hat\mu_1)\end{pmatrix}. \qquad (2)$$

To calculate the left side of (1), you can write

$$\tilde{\mathbf{X}} = [\mathbf{1}\ \mathbf{X}], \qquad \mathbf{X} = \begin{pmatrix}x_1^T\\ \vdots\\ x_N^T\end{pmatrix}.$$

Let $\tilde{\mathbf{M}} = [\mathbf{1}\ \mathbf{M}]$, i.e. $\mathbf{M}$ is the matrix whose first $N_1$ rows are copies of $\hat\mu_1^T$ and whose last $N_2$ rows are copies of $\hat\mu_2^T$. Here $\mathbf{1}$ is always a column vector of length $N$ with all 1's. Then show

$$\tilde{\mathbf{X}}^T\tilde{\mathbf{X}} = \begin{pmatrix}\mathbf{1}^T\\ \mathbf{X}^T\end{pmatrix}[\mathbf{1}\ \mathbf{X}] = \begin{pmatrix}N & \mathbf{1}^T\mathbf{X}\\ \mathbf{X}^T\mathbf{1} & \mathbf{X}^T\mathbf{X}\end{pmatrix}, \qquad \mathbf{1}^T\mathbf{X} = N_1\hat\mu_1^T + N_2\hat\mu_2^T.$$

Now, from relationship (1), the first row of the normal equations reads $N\beta_0 + (N_1\hat\mu_1 + N_2\hat\mu_2)^T\beta = \mathbf{1}^T\mathbf{y} = 0$; equivalently, if you average over the entries $y_i$ of $\mathbf{y}$, show

$$\beta_0 = \frac1N\mathbf{1}^T\mathbf{y} - \frac1N(N_1\hat\mu_1 + N_2\hat\mu_2)^T\beta = -\frac1N(N_1\hat\mu_1 + N_2\hat\mu_2)^T\beta,$$

so now

$$\beta_0 = -\frac1N\bigl(N_1\hat\mu_1 + N_2\hat\mu_2\bigr)^T\beta. \qquad (3)$$

You can write

$$\mathbf{X}^T\mathbf{X} = (\mathbf{X}-\mathbf{M})^T(\mathbf{X}-\mathbf{M}) + \mathbf{M}^T(\mathbf{X}-\mathbf{M}) + (\mathbf{X}-\mathbf{M})^T\mathbf{M} + \mathbf{M}^T\mathbf{M}.$$

But show

$$(\mathbf{X}-\mathbf{M})^T(\mathbf{X}-\mathbf{M}) = (N-2)\hat\Sigma, \qquad \mathbf{M}^T(\mathbf{X}-\mathbf{M}) = \mathbf{0},$$

since $\sum_{j \in \text{class } k}(x_j - \hat\mu_k) = 0$ within each class, and

$$\mathbf{M}^T\mathbf{M} = N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T.$$

Thus

$$\mathbf{X}^T\mathbf{X} = (N-2)\hat\Sigma + N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T.$$

So by (3) above, show the second block row of (1) becomes

$$\Bigl[(N-2)\hat\Sigma + N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T - \frac1N(N_1\hat\mu_1 + N_2\hat\mu_2)(N_1\hat\mu_1 + N_2\hat\mu_2)^T\Bigr]\beta = N(\hat\mu_2 - \hat\mu_1). \qquad (4)$$

Now show the bottom term's coefficient satisfies

$$N_1\hat\mu_1\hat\mu_1^T + N_2\hat\mu_2\hat\mu_2^T - \frac1N(N_1\hat\mu_1 + N_2\hat\mu_2)(N_1\hat\mu_1 + N_2\hat\mu_2)^T = \frac{N_1 N_2}{N}(\hat\mu_2-\hat\mu_1)(\hat\mu_2-\hat\mu_1)^T.$$
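The resulting system, $\bigl[(N-2)\hat\Sigma + \tfrac{N_1N_2}{N}(\hat\mu_2-\hat\mu_1)(\hat\mu_2-\hat\mu_1)^T\bigr]\hat\beta = N(\hat\mu_2-\hat\mu_1)$ together with the intercept relation (3), can be sanity-checked against an off-the-shelf least-squares fit. A hedged NumPy sketch on synthetic data (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2, p = 40, 60, 3
N = N1 + N2
X = np.vstack([rng.normal(0.0, 1.0, (N1, p)),
               rng.normal(0.5, 1.0, (N2, p))])
y = np.concatenate([np.full(N1, -N / N1), np.full(N2, N / N2)])

Xt = np.hstack([np.ones((N, 1)), X])               # X-tilde = [1 X]
beta0, *beta = np.linalg.lstsq(Xt, y, rcond=None)[0]
beta = np.array(beta)                              # slope part of beta-tilde

mu1, mu2 = X[:N1].mean(axis=0), X[N1:].mean(axis=0)
Sigma = ((X[:N1] - mu1).T @ (X[:N1] - mu1)         # pooled covariance estimate
         + (X[N1:] - mu2).T @ (X[N1:] - mu2)) / (N - 2)
Sigma_B = np.outer(mu2 - mu1, mu2 - mu1)           # between-class matrix

lhs = ((N - 2) * Sigma + (N1 * N2 / N) * Sigma_B) @ beta
assert np.allclose(lhs, N * (mu2 - mu1))           # the derived system holds
# intercept relation (3): beta_0 = -(N1 mu1 + N2 mu2)^T beta / N
assert np.isclose(beta0, -(N1 * mu1 + N2 * mu2) @ beta / N)
```

Because both relations are algebraic consequences of the normal equations, they hold to machine precision whatever the data.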

Combining (1), (2) and (4) with this identity, show that (4) becomes

$$\Bigl[(N-2)\hat\Sigma + \frac{N_1 N_2}{N}\hat\Sigma_B\Bigr]\beta = N(\hat\mu_2 - \hat\mu_1), \qquad (5)$$

where $\hat\Sigma_B = (\hat\mu_2-\hat\mu_1)(\hat\mu_2-\hat\mu_1)^T$.

(c) It follows that

$$\hat\Sigma_B\,\beta = (\hat\mu_2-\hat\mu_1)\bigl[(\hat\mu_2-\hat\mu_1)^T\beta\bigr] = \bigl[(\hat\mu_2-\hat\mu_1)^T\beta\bigr](\hat\mu_2-\hat\mu_1),$$

which is clearly in the direction of $(\hat\mu_2-\hat\mu_1)$, since $(\hat\mu_2-\hat\mu_1)^T\beta$ is a scalar (why?). Finally, from (4.56),

$$\beta = \bigl((N-2)\hat\Sigma\bigr)^{-1}\Bigl[N(\hat\mu_2-\hat\mu_1) - \frac{N_1N_2}{N}\bigl[(\hat\mu_2-\hat\mu_1)^T\beta\bigr](\hat\mu_2-\hat\mu_1)\Bigr] = (\text{scalar})\cdot\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1).$$

(d) Changing the coding for the two $y$ values transforms the pair of numbers $-N/N_1$ and $N/N_2$ respectively into another pair $a$ and $b$ of possible $y$ values. Show that there is a linear scalar transformation $y' = cy + d = f(y)$ such that $f(-N/N_1) = a$ and $f(N/N_2) = b$. What are $c$ and $d$? Now show that if $\mathbf{y}$ has only entries $-N/N_1$ and $N/N_2$, then in their place the vector $\mathbf{y}' = c\mathbf{y} + d\mathbf{1}$ will have $a$ and $b$ respectively. Further show that if we replace $\mathbf{y}$ by $\mathbf{y}'$ in the dataset $\{(x_i, y_i)\}_{i=1}^N$, then we will have a new

$$\hat{\mathbf{y}}' = \tilde{\mathbf{X}}(\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}^T\mathbf{y}' = \mathbf{H}\mathbf{y}' = \mathbf{H}(c\mathbf{y} + d\mathbf{1}) = c\hat{\mathbf{y}} + d\mathbf{1}.$$

(Why is $\mathbf{H}\mathbf{1} = \mathbf{1}$? Recall $\mathbf{H}$ is a projection.) Thus the transformation to $\hat{\mathbf{y}}$ is exactly the same as the transformation to $\mathbf{y}$ above. Show in fact that the transformation acts in exactly the same way on each component of $\hat{\mathbf{y}}$. Now show that the final selection of classes based on the new $\hat{\mathbf{y}}' = (\hat y'_1, \ldots, \hat y'_N)^T$ will be based on each entry $\hat y'_i$, and whether it is closer to $a$ (choose class 1) or to $b$ (choose class 2). Show that $\hat y'_i$ is closer to $a$ iff $\hat y_i$ is closer to $-N/N_1$.

(e) Now you have $\hat\beta$ and $\hat\beta_0$ and the regression function $\hat f(x) = \hat\beta_0 + \hat\beta^T x$.
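Before completing part (e), the conclusions of (c) and (d) can be verified numerically: the least-squares slope is a scalar multiple of the LDA direction $\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1)$, for the $\pm N/N_k$ coding and for an arbitrary coding $(a, b)$ alike. A hedged sketch with synthetic data (names and the coding $(-3, 7)$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N1, N2, p = 35, 55, 3
N = N1 + N2
X = np.vstack([rng.normal(0.0, 1.0, (N1, p)),
               rng.normal(1.0, 1.0, (N2, p))])
Xt = np.hstack([np.ones((N, 1)), X])

def slope(y):
    """Least-squares slope for targets y (intercept entry dropped)."""
    return np.linalg.lstsq(Xt, y, rcond=None)[0][1:]

mu1, mu2 = X[:N1].mean(axis=0), X[N1:].mean(axis=0)
Sigma = ((X[:N1] - mu1).T @ (X[:N1] - mu1)
         + (X[N1:] - mu2).T @ (X[N1:] - mu2)) / (N - 2)
lda_dir = np.linalg.solve(Sigma, mu2 - mu1)        # Sigma^{-1}(mu2 - mu1)

# (c): the regression slope is a scalar multiple of the LDA direction
beta = slope(np.concatenate([np.full(N1, -N / N1), np.full(N2, N / N2)]))
r = beta / lda_dir                                  # should be constant
assert np.allclose(r, r[0])

# (d): an arbitrary distinct coding (a, b) gives the same direction
beta_ab = slope(np.concatenate([np.full(N1, -3.0), np.full(N2, 7.0)]))
r_ab = beta_ab / lda_dir
assert np.allclose(r_ab, r_ab[0])
```

Only the direction is coding-invariant; the scalar multiple (and the intercept) do change with the coding, which is exactly what part (e) exploits.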

From part (c), $\hat\beta = k\,\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1)$ for some scalar $k$. Thus show from above that

$$\hat\beta_0 = -\frac1N(N_1\hat\mu_1 + N_2\hat\mu_2)^T\, k\,\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1).$$

Recall the group targets ($y$-values) on which we have trained the regression are: class 1: $y = -N/N_1$; class 2: $y = N/N_2$. For an input test vector $x$, show that the corresponding $y$ will be in class 1 if $\hat f(x)$ is closer to $-N/N_1$ than to $N/N_2$, and otherwise class 2. Show $y$ should be assigned to class 2 if

$$\hat f(x) > \frac12\Bigl(\frac{N}{N_2} - \frac{N}{N_1}\Bigr).$$

Show from above that the criterion for class 2 assignment is

$$\hat f(x) = \hat\beta_0 + x^T\hat\beta > \frac12\Bigl(\frac{N}{N_2} - \frac{N}{N_1}\Bigr),$$

or

$$x^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) > \frac1{2k}\Bigl(\frac{N}{N_2} - \frac{N}{N_1}\Bigr) + \frac1N(N_1\hat\mu_1 + N_2\hat\mu_2)^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1).$$

Is this the same as the LDA criterion in (a)? Now assume $N_1 = N_2 = N/2$ - what happens then?

4.3 Recall the LDA criterion for choosing the group $l$ out of groups $1, \ldots, K$, given a test feature vector $x$, is

$$l = \arg\max_k \delta_k(x),$$

i.e., finding the $l = k$ which makes $\delta_k(x)$ the largest. Here, as usual,

$$\delta_k(x) = x^T\hat\Sigma^{-1}\hat\mu_k - \tfrac12\hat\mu_k^T\hat\Sigma^{-1}\hat\mu_k + \ln\hat\pi_k.$$

This problem is related to the discussion in Section 4.2 involving the use of a regression approach to distinguish among the $K$ predictions. This works by choosing targets (representatives of the $K$ classes, to be set equal to the response variable $y$) as follows. For a vector $x$ whose class is $k$, we choose the response variable to be $y = (0, \ldots, 0, 1, 0, \ldots, 0)$ (a row vector), with a 1 only in the $k$th position. Then if we are given a training set $\mathcal{T} = \{(x_i, y_i)\}_{i=1}^N$, the responses are no longer scalars $y_i = 0$ or $1$, but vectors $y_i$ with a 1 in the $k$th position if the class assigned to $x_i$ is group $k$.

As shown in the text, the appropriate regression here works exactly as in the case where the responses are scalars, except that the usual vector

$$\mathbf{y} = \begin{pmatrix}y_1\\ y_2\\ \vdots\\ y_N\end{pmatrix},$$

with each row a scalar (0 or 1) representing the class of $x_i$, is replaced by a matrix

$$\mathbf{Y} = \begin{pmatrix}y_1\\ y_2\\ \vdots\\ y_N\end{pmatrix},$$

with each class indicator $y_i$ indicating the class through the position of its only entry 1 (note again that each $y_i$ is a row vector). By adding the usual column of 1's we form $\tilde{\mathbf{X}} = [\mathbf{1}\ \mathbf{X}]$ as before. Otherwise the regression process is the same, with the vector $\mathbf{y}$ replaced by the matrix $\mathbf{Y}$. Now, following the regression discussion in the text, the usual estimated value $\hat{\mathbf{y}}$ of $\mathbf{y}$ is replaced, using the same formula, by an estimated value $\hat{\mathbf{Y}}$ of $\mathbf{Y}$:

$$\hat{\mathbf{Y}} = \tilde{\mathbf{X}}(\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}^T\mathbf{Y}, \qquad (7)$$

which has exactly the same form as standard regression, with $\mathbf{y}$ replaced by $\mathbf{Y}$. Notice that with

$$\hat{\mathbf{Y}} = \tilde{\mathbf{X}}\hat{\mathbf{B}}, \qquad (8a)$$

we have

$$\hat{\mathbf{B}} = (\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}^T\mathbf{Y}. \qquad (8c)$$

Note that, as in our general regression discussion, the matrix $\tilde{\mathbf{X}}$ is assumed to already contain an initial column of 1's. Row-wise, defining $\hat y_i$ to be the $i$th row of

$$\hat{\mathbf{Y}} = \begin{pmatrix}\hat y_1\\ \vdots\\ \hat y_N\end{pmatrix},$$

equation (8a) is equivalent to $\hat y_i = \tilde x_i^T\hat{\mathbf{B}}$; here and elsewhere the tilde ~ on a vector means we have added a 1 in the initial position: $\tilde x_i = \binom{1}{x_i}$.
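The construction of the indicator matrix $\mathbf{Y}$ and of $\hat{\mathbf{B}}$, (8c), and $\hat{\mathbf{Y}}$, (8a), can be sketched in a few lines of NumPy. This is an illustrative sketch on synthetic data; one easy consequence worth checking is that every row of $\hat{\mathbf{Y}}$ sums to 1, because the columns of $\mathbf{Y}$ sum to the vector $\mathbf{1}$, which lies in the column space of $\tilde{\mathbf{X}}$ and is therefore reproduced by the hat projection.

```python
import numpy as np

rng = np.random.default_rng(3)
K, p, N = 3, 4, 90
g = rng.integers(0, K, N)                  # g(i): class of sample i
X = rng.normal(0.0, 1.0, (N, p)) + g[:, None]

Y = np.zeros((N, K))
Y[np.arange(N), g] = 1.0                   # indicator response matrix

Xt = np.hstack([np.ones((N, 1)), X])       # X-tilde with leading column of 1's
B = np.linalg.solve(Xt.T @ Xt, Xt.T @ Y)   # (8c): B-hat, a (p+1) x K matrix
Y_hat = Xt @ B                             # (8a), equivalently (7)

# rows of Y sum to 1 and the hat projection preserves the 1 vector,
# so every row of Y-hat sums to 1 as well
assert np.allclose(Y_hat.sum(axis=1), 1.0)
```

This row-sum fact is the reason, noted below, that the fitted responses $\hat y_i$ live on an affine hyperplane in $\mathbb{R}^K$.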

Note that whereas previously $\tilde\beta$ was given by the same formula (8c), now $\hat{\mathbf{B}}$ is a matrix instead of a vector. Whereas previously we had $\hat y_i = \tilde x_i^T\tilde\beta$ as the estimated value of $y_i$ within the dataset, we now have instead

$$\hat y_i = \tilde x_i^T\hat{\mathbf{B}}, \qquad (9)$$

where $\hat y_i$ is a (row) vector, the $i$th row of $\hat{\mathbf{Y}}$.

We are asking what would happen if we simply replace this training set $\mathcal{T} = \{(x_i, y_i)\}$ with a new training set in which the input vectors $x_i$ are replaced by the corresponding estimates $\hat y_i$, so that the training set now looks like $\mathcal{T}' = \{(\hat y_i^T, y_i)\}_{i=1}^N$. Note that we are using the transpose $\hat y_i^T$ because we want it to be a column vector (why?), replacing the original column input vector $x_i$. Equivalently, we are replacing the training matrix

$$\mathbf{X} = \begin{pmatrix}x_1^T\\ \vdots\\ x_N^T\end{pmatrix} \quad\text{by}\quad \hat{\mathbf{Y}} = \begin{pmatrix}\hat y_1\\ \vdots\\ \hat y_N\end{pmatrix}.$$

We wish to show that if we use this new dataset in both training and testing, then we will still get the same class predictions for a new test vector $x$, using LDA (not regression) throughout. Given that $l = \arg\max_k\delta_k(x)$, show we just need to check how the computation of the $\delta_k(x)$ changes using the new dataset. Note the training data now have the form

$$\mathcal{T}' = \{(\hat y_i^T, y_i)\}_{i=1}^N = \{(\hat{\mathbf{B}}^T\tilde x_i, y_i)\}_{i=1}^N.$$

The original discriminant function has the form

$$\delta_k(x) = x^T\hat\Sigma^{-1}\hat\mu_k - \tfrac12\hat\mu_k^T\hat\Sigma^{-1}\hat\mu_k + \ln\hat\pi_k, \qquad (10)$$

where $\hat\pi_k = N_k/N$. Show the new discriminant function $\delta'_k$ has the form

$$\delta'_k(\hat y) = \hat y^T\hat\Sigma'^{-1}\hat\mu'_k - \tfrac12\hat\mu'^T_k\hat\Sigma'^{-1}\hat\mu'_k + \ln\hat\pi_k,$$

where

$$\hat\mu'_k = \frac1{N_k}\sum_{j:\, g(j)=k}\hat y_j^T = \frac1{N_k}\sum_{j:\, g(j)=k}\hat{\mathbf{B}}^T\tilde x_j = \hat{\mathbf{B}}^T\tilde{\hat\mu}_k,$$

$$\hat\Sigma' = \frac1{N-K}\sum_{j=1}^N(\hat y_j^T - \hat\mu'_{g(j)})(\hat y_j^T - \hat\mu'_{g(j)})^T,$$

where, as usual, this estimator represents the pooled estimate of the variance of the

vectors of interest based on their individual groups, but with $x_i$ replaced by $\hat y_i^T$. Here $g(j)$ represents the group (out of the $K$ total) of the $j$th sample.

You wish to show that the modified discriminant function makes the same decisions as the original one, i.e., that whenever $\hat y = \hat{\mathbf{B}}^T\tilde x$,

$$\delta'_k(\hat y) \ge \delta'_l(\hat y) \quad\text{iff}\quad \delta_k(x) \ge \delta_l(x).$$

But note that $\delta'_k(\hat y)$ has the form (10), with $\hat\mu'_k = \hat{\mathbf{B}}^T\tilde{\hat\mu}_k$, and show

$$\hat\Sigma' = \frac1{N-K}\sum_{j=1}^N(\hat y_j^T - \hat\mu'_{g(j)})(\hat y_j^T - \hat\mu'_{g(j)})^T = \frac1{N-K}\sum_{j=1}^N\hat{\mathbf{B}}^T(\tilde x_j - \tilde{\hat\mu}_{g(j)})(\tilde x_j - \tilde{\hat\mu}_{g(j)})^T\hat{\mathbf{B}} = \hat{\mathbf{B}}^T\tilde{\hat\Sigma}\hat{\mathbf{B}},$$

where, because our vectors are augmented to have a 1 in the first position (and thus are of length $p+1$), we must also augment the covariance matrix $\hat\Sigma$ to be $(p+1)\times(p+1)$, by adding a first row and first column of 0's. That is, we define

$$\tilde{\hat\Sigma} = \begin{pmatrix}0 & \mathbf{0}_p^T\\ \mathbf{0}_p & \hat\Sigma\end{pmatrix},$$

where $\mathbf{0}_p$ is a column vector of length $p$ with all zeroes, and the upper left corner is $0$. Of course $\hat\Sigma$ is the estimator of $\Sigma$. Thus show we can write

$$\delta'_k(\hat y) = \hat y^T\hat\Sigma'^{-1}\hat\mu'_k - \tfrac12\hat\mu'^T_k\hat\Sigma'^{-1}\hat\mu'_k + \ln\hat\pi_k = \tilde x^T\hat{\mathbf{B}}(\hat{\mathbf{B}}^T\tilde{\hat\Sigma}\hat{\mathbf{B}})^{-1}\hat{\mathbf{B}}^T\tilde{\hat\mu}_k - \tfrac12\tilde{\hat\mu}_k^T\hat{\mathbf{B}}(\hat{\mathbf{B}}^T\tilde{\hat\Sigma}\hat{\mathbf{B}})^{-1}\hat{\mathbf{B}}^T\tilde{\hat\mu}_k + \ln\hat\pi_k. \qquad (10a)$$

Now we define the square root of a matrix. For any $p\times p$ square symmetric invertible matrix $\mathbf{A}$, assume that $\{\lambda_i, \mathbf{a}_i\}_{i=1}^p$ are its eigenvalues and corresponding eigenvectors. For a function $f(x)$, define $f(\mathbf{A})$ to be the matrix with the same eigenvectors $\mathbf{a}_i$, but eigenvalues $f(\lambda_i)$. Thus $\mathbf{A}^{1/2}$ would have $\{\lambda_i^{1/2}, \mathbf{a}_i\}_{i=1}^p$ as its eigenvalue-eigenvector pairs.
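The matrix-function definition just given translates directly into an eigendecomposition: diagonalize, apply $f$ to the eigenvalues, and reassemble. A minimal sketch (the helper name `sym_fun` is ours, not from the text), checking that $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$ and that $\mathbf{A}^{-1/2}$ whitens $\mathbf{A}$:

```python
import numpy as np

def sym_fun(A, f):
    """f(A) for symmetric A: keep the eigenvectors, apply f to eigenvalues."""
    lam, V = np.linalg.eigh(A)              # symmetric eigendecomposition
    return (V * f(lam)) @ V.T               # V diag(f(lam)) V^T

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4.0 * np.eye(4)               # symmetric positive definite

R = sym_fun(A, np.sqrt)                     # A^{1/2}
assert np.allclose(R @ R, A)

W = sym_fun(A, lambda lam: lam ** -0.5)     # A^{-1/2}, the whitening map
assert np.allclose(W @ A @ W, np.eye(4))
```

The second assertion is exactly the property used next: replacing each $x_i$ by $\hat\Sigma^{-1/2}x_i$ turns the pooled covariance into the identity.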

Now we replace our dataset via $x_i \to \hat\Sigma^{-1/2}x_i$. This leads to the replacement $\mathbf{X} \to \mathbf{X}\hat\Sigma^{-1/2}$. Clearly $\hat{\mathbf{Y}}$ in (7) does not change under this transformation (why?). However, now with the transformed values, show we have

$$V(\mathbf{X}) = V(\mathbf{X}\hat\Sigma^{-1/2}) = \hat\Sigma^{-1/2}\hat\Sigma\,\hat\Sigma^{-1/2} = \mathbf{I}.$$

This transformed dataset thus leads to the same function $\hat{\mathbf{Y}}$, but gives a linear discriminant which has changed to (now $\hat\Sigma = \mathbf{I}$)

$$\delta_k(x) = x^T\hat\mu_k - \tfrac12\hat\mu_k^T\hat\mu_k + \ln\hat\pi_k. \qquad (11)$$

Show this is actually identical to that before - remember that the new $x$ equals the old $x$ times $\hat\Sigma^{-1/2}$, and we have also computed the $\hat\mu_k$ from the new dataset; thus the classes obtained from using the discriminant in (11) (using the new transformed dataset) will be identical to what the predicted classes were before. Also show the transformed discriminant function (now changing these $x$ into the unchanged $\hat y$ and forming the resulting discriminant) must be exactly the same as when we replaced $x$ by $\hat y$ before, since the dataset $\{(\hat y_i^T, y_i)\}_{i=1}^N$ is identical (see (9)). Thus we need only show the result of this problem for the new (transformed) dataset $\{(x_i, y_i)\}$, where the new $x_i$ are defined as above.

Show, using the same arguments, that we may again replace our dataset so that each current datapoint $x_i$ is replaced by $x_i - \hat\mu$, with $\hat\mu$ the overall current mean of all the $x_i$ (regardless of class). Show this does not change the covariance $\hat\Sigma$, and in terms of the new dataset the identical discriminant function will now be

$$\delta_k(x) = (x-\hat\mu)^T(\hat\mu_k-\hat\mu) - \tfrac12(\hat\mu_k-\hat\mu)^T(\hat\mu_k-\hat\mu) + \ln\hat\pi_k \qquad (13)$$

(again with $x$ obtained from the new mean-subtracted dataset). Show that translating all data by the same amount $\hat\mu$ will not change the relative sizes of the discriminant functions, and so if we replace the old discriminant function (13) by

$$\delta_k(x) = x^T\hat\mu_k - \tfrac12\hat\mu_k^T\hat\mu_k + \ln\hat\pi_k, \qquad (15)$$

then clearly this will not affect whether $\delta_k(x) \ge \delta_l(x)$ or not. Furthermore, since $\hat{\mathbf{Y}}$ is a standard regression estimator (just with multiple columns), show a translation of the dataset will not affect the predictions, so that with this new dataset the $\hat{\mathbf{Y}}$ we obtain is identical to the previous one.

Thus at this point you have reduced the problem to a dataset with empirical mean $0$ and identity covariance, and we still have the same discriminant function (15) and estimator $\hat{\mathbf{Y}}$, derived in the same way from the new data and the outcome matrix $\mathbf{Y}$. This means we are using the discriminant function (11) above; show equation (10a) becomes

$$\delta'_k(\hat y) = \tilde x^T\hat{\mathbf{B}}(\hat{\mathbf{B}}^T\hat{\mathbf{B}})^{-1}\hat{\mathbf{B}}^T\tilde{\hat\mu}_k - \tfrac12\tilde{\hat\mu}_k^T\hat{\mathbf{B}}(\hat{\mathbf{B}}^T\hat{\mathbf{B}})^{-1}\hat{\mathbf{B}}^T\tilde{\hat\mu}_k + \ln\hat\pi_k. \qquad (17)$$

But show that $\mathbf{H} = \hat{\mathbf{B}}(\hat{\mathbf{B}}^T\hat{\mathbf{B}})^{-1}\hat{\mathbf{B}}^T$ is just the projection onto the column space of $\hat{\mathbf{B}}$ (see the discussion on p. 46 of the 'hat' function, which projects $\mathbf{y}$ onto the column space of $\mathbf{X}$, giving $\hat{\mathbf{y}}$). Now show that $\tilde{\hat\mu}_k$ is in the column space of $\hat{\mathbf{B}}$. By our assumption of covariance $\mathbf{I}$ and mean $0$ for the $x_i$, it follows that $\tilde{\mathbf{X}}^T\tilde{\mathbf{X}} = N\mathbf{I}$, so $\hat{\mathbf{B}} = (\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}^T\mathbf{Y} = \frac1N\tilde{\mathbf{X}}^T\mathbf{Y}$. Thus the $k$th column $\mathbf{b}_k$ of $\hat{\mathbf{B}}$ is just $\mathbf{b}_k = \frac1N\tilde{\mathbf{X}}^T\mathbf{Y}_k$, where $\mathbf{Y}_k$ denotes the $k$th column (not row) of $\mathbf{Y}$. But from the definition of $\tilde{\mathbf{X}}^T = [\tilde x_1\ \tilde x_2\ \ldots\ \tilde x_N]$ and of $\mathbf{Y}_k$ (whose $i$th entry is 0 unless $x_i$ is in class $k$), the column $\mathbf{b}_k$ must be just $\mathbf{b}_k = \frac1N\sum_{i:\, g(i)=k}\tilde x_i = \frac{N_k}{N}\tilde{\hat\mu}_k$, i.e., a multiple of $\tilde{\hat\mu}_k$. Thus, clearly all the $\tilde{\hat\mu}_k$ are in the column space of $\hat{\mathbf{B}}$, and hence $\mathbf{H}\tilde{\hat\mu}_k = \tilde{\hat\mu}_k$ for all $k$. Thus by (17) show

$$\delta'_k(\hat y) = \tilde x^T\mathbf{H}\tilde{\hat\mu}_k - \tfrac12\tilde{\hat\mu}_k^T\mathbf{H}\tilde{\hat\mu}_k + \ln\hat\pi_k = \tilde x^T\tilde{\hat\mu}_k - \tfrac12\tilde{\hat\mu}_k^T\tilde{\hat\mu}_k + \ln\hat\pi_k,$$

that is, the $\hat{\mathbf{Y}}$-based discriminant function gives values identical to the discriminant (15), which we have shown gives identical choices to the discriminant based on the original dataset, as desired.
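The end-to-end claim of 4.3, that LDA trained on the fitted responses $\hat y_i = \hat{\mathbf{B}}^T\tilde x_i$ predicts the same classes as LDA trained on the original $x_i$, can be checked numerically. A hedged sketch with synthetic data (helper names are ours). One caveat the sketch must handle: since the coordinates of each $\hat y_i$ sum to 1, the pooled covariance in $\hat y$-space is singular along the all-ones direction, so we drop the redundant last coordinate before inverting; this is an invertible affine change of the inputs and leaves LDA decisions unchanged.

```python
import numpy as np

def lda_fit(Z, g, K):
    """Pooled-covariance LDA estimates: class means, Sigma^{-1}, log priors."""
    N = len(g)
    mus = np.stack([Z[g == k].mean(axis=0) for k in range(K)])
    S = sum((Z[g == k] - mus[k]).T @ (Z[g == k] - mus[k])
            for k in range(K)) / (N - K)
    return mus, np.linalg.inv(S), np.log(np.bincount(g, minlength=K) / N)

def lda_predict(Z, mus, Si, logp):
    # delta_k(z) = z^T Si mu_k - (1/2) mu_k^T Si mu_k + log pi_k
    quad = 0.5 * np.einsum('kd,de,ke->k', mus, Si, mus)
    return (Z @ Si @ mus.T - quad + logp).argmax(axis=1)

rng = np.random.default_rng(5)
K, p, N = 3, 4, 120
g = rng.integers(0, K, N)
X = rng.normal(0.0, 1.0, (N, p)) + 1.5 * g[:, None]

Xt = np.hstack([np.ones((N, 1)), X])
Y = np.zeros((N, K)); Y[np.arange(N), g] = 1.0
B = np.linalg.solve(Xt.T @ Xt, Xt.T @ Y)    # (8c)
Yhat = Xt @ B                               # rows are yhat_i = x~_i^T B

# drop the redundant last coordinate (rows of Yhat sum to 1)
Z = Yhat[:, :K - 1]

pred_x = lda_predict(X, *lda_fit(X, g, K))
pred_y = lda_predict(Z, *lda_fit(Z, g, K))
assert np.array_equal(pred_x, pred_y)       # identical class predictions
```

Since the equivalence proved above is an exact finite-sample identity, the two prediction vectors agree point by point, not merely on average.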