
SOLUTIONS FOR PATTERN CLASSIFICATION, CH. 2

Building C303, Harbin Institute of Technology, Shenzhen University Town, Xili, Shenzhen, Guangdong Province, P.R. China

Yonghui Wu, Yaoyun Zhang

Problem

a) Suppose that $P(\omega_{\max}|x) < \frac{1}{c}$. Since $P(\omega_i|x) \le P(\omega_{\max}|x) < \frac{1}{c}$ for every $i \in \{1, \dots, c\}$, we obtain

$$\sum_{i=1}^{c} P(\omega_i|x) < c \cdot \frac{1}{c} = 1,$$

which contradicts $\sum_{i=1}^{c} P(\omega_i|x) = 1$. Hence $P(\omega_{\max}|x) \ge \frac{1}{c}$.

b) According to the minimum-error-rate rule:

$$P(\text{error}) = \int P(\text{error}|x)\,p(x)\,dx = \int \sum_{i=1,\ \omega_i \ne \omega_{\max}}^{c} P(\omega_i|x)\,p(x)\,dx = \int \big(1 - P(\omega_{\max}|x)\big)\,p(x)\,dx = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx.$$

c) Combining a) and b):

$$P(\text{error}) = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx \le 1 - \frac{1}{c}\int p(x)\,dx = \frac{c-1}{c}.$$

d) When $P(\omega_1|x) = P(\omega_2|x) = \cdots = P(\omega_c|x)$, that is, when $P(\omega_{\max}|x) = \frac{1}{c}$, the inequality in c) holds with equality and $P(\text{error}) = \frac{c-1}{c}$.
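As a quick numerical illustration of a) and c) (an added check, not part of the original solution), the following Python sketch samples random posterior vectors and verifies that $P(\omega_{\max}|x) \ge 1/c$, so the pointwise Bayes error $1 - P(\omega_{\max}|x)$ never exceeds $(c-1)/c$:

    import numpy as np

    rng = np.random.default_rng(0)
    c = 5  # number of classes

    # Random posterior vectors: Dirichlet samples are non-negative and sum to 1.
    posteriors = rng.dirichlet(np.ones(c), size=100000)

    p_max = posteriors.max(axis=1)        # P(w_max | x) for each sampled x
    pointwise_error = 1.0 - p_max         # error of deciding w_max at x

    assert np.all(p_max >= 1.0 / c)                  # part a)
    assert np.all(pointwise_error <= (c - 1) / c)    # part c)
    print(pointwise_error.max(), (c - 1) / c)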

Problem 4

a) Case 1: decide $\omega_i$, i.e. $g_i(x) > g_j(x)$ for all $j \ne i$, $1 \le j \le c$, and $g_i(x) > g_{c+1}(x)$. Equivalently,

$$p(x|\omega_i)P(\omega_i) > p(x|\omega_j)P(\omega_j) \quad\text{and}\quad p(x|\omega_i)P(\omega_i) > \frac{\lambda_s - \lambda_r}{\lambda_s}\sum_{k=1}^{c} p(x|\omega_k)P(\omega_k),$$

that is,

$$P(\omega_i|x) > P(\omega_j|x) \quad\text{and}\quad P(\omega_i|x) > 1 - \frac{\lambda_r}{\lambda_s}.$$

This coincides with the decision rule of Problem 3, so in Case 1 the discriminant functions achieve the minimum risk; they are optimal.

Case 2: decide rejection, i.e. $g_{c+1}(x) > g_i(x)$ for all $1 \le i \le c$. Equivalently,

$$\frac{\lambda_s - \lambda_r}{\lambda_s}\sum_{k=1}^{c} p(x|\omega_k)P(\omega_k) > p(x|\omega_i)P(\omega_i),$$

that is,

$$1 - \frac{\lambda_r}{\lambda_s} > \frac{p(x|\omega_i)P(\omega_i)}{\sum_{k=1}^{c} p(x|\omega_k)P(\omega_k)} = P(\omega_i|x).$$

This coincides with the rejection branch of the rule of Problem 3, so in Case 2 the discriminant functions also achieve the minimum risk; they are optimal.

b)
$$g_1(x) = \frac{1}{2}p(x|\omega_1), \qquad g_2(x) = \frac{1}{2}p(x|\omega_2), \qquad g_3(x) = \frac{3}{8}\big(p(x|\omega_1) + p(x|\omega_2)\big).$$

c) $R_3$ is changed from $(-\infty, +\infty)$ to …

d)
$$g_1(x) = \frac{1}{2\sqrt{2\pi}}\,e^{-(x-1)^2/2}, \qquad g_2(x) = \frac{1}{2\sqrt{2\pi}}\,e^{-x^2/2}, \qquad g_3(x) = \frac{3}{8\sqrt{2\pi}}\big(e^{-(x-1)^2/2} + e^{-x^2/2}\big).$$

The decision regions are

$$R_1:\ g_1(x) \ge g_2(x) \text{ and } g_1(x) \ge g_3(x); \qquad R_2:\ g_2(x) \ge g_1(x) \text{ and } g_2(x) \ge g_3(x); \qquad R_3:\ g_3(x) \ge g_1(x) \text{ and } g_3(x) \ge g_2(x).$$

Solving these inequalities:

$$R_1:\ x > \tfrac{1}{2} + \ln 3, \qquad R_2:\ x < \tfrac{1}{2} - \ln 3, \qquad R_3:\ \tfrac{1}{2} - \ln 3 \le x \le \tfrac{1}{2} + \ln 3.$$
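To make the Case 1 / Case 2 rule concrete, here is a small Python sketch (an added illustration; the function names and the test point are my own choices, while the densities and $\lambda_r/\lambda_s = 1/4$ follow part d)):

    import numpy as np

    def decide_with_reject(likelihoods, priors, lam_r, lam_s):
        # g_i(x) = p(x|w_i) P(w_i) for i = 1..c, and
        # g_{c+1}(x) = (lam_s - lam_r)/lam_s * sum_i p(x|w_i) P(w_i)
        g = likelihoods * priors
        g_reject = (lam_s - lam_r) / lam_s * g.sum()
        i = int(np.argmax(g))
        return "reject" if g_reject > g[i] else i

    def normal_pdf(x, mu):
        return np.exp(-(x - mu) ** 2 / 2) / np.sqrt(2 * np.pi)

    # Densities of part d): N(1,1) and N(0,1), equal priors, lam_r/lam_s = 1/4.
    x = 0.5
    likelihoods = np.array([normal_pdf(x, 1.0), normal_pdf(x, 0.0)])
    print(decide_with_reject(likelihoods, np.array([0.5, 0.5]), lam_r=1.0, lam_s=4.0))

At $x = 0.5$, which lies inside $R_3 = [\tfrac{1}{2} - \ln 3,\ \tfrac{1}{2} + \ln 3]$, the sketch prints "reject", matching the regions derived above.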

Problem

$$H(p(x)) = -\int p(x)\ln p(x)\,dx = E\Big[\ln\frac{1}{p(x)}\Big] = \frac{d}{2}\ln(2\pi) + \frac{1}{2}\ln|\Sigma| + \frac{1}{2}E\big[(x-\mu)^t\Sigma^{-1}(x-\mu)\big].$$

When the $x_i$ are independent, we have

$$(x-\mu)^t\Sigma^{-1}(x-\mu) = \sum_{i=1}^{d}\Big(\frac{x_i-\mu_i}{\sigma_i}\Big)^2, \qquad \frac{x_i-\mu_i}{\sigma_i} \sim N(0,1),$$

therefore

$$E\big[(x-\mu)^t\Sigma^{-1}(x-\mu)\big] = E\Big[\sum_{i=1}^{d}\Big(\frac{x_i-\mu_i}{\sigma_i}\Big)^2\Big] = \sum_{i=1}^{d}\frac{E\big[(x_i-\mu_i)^2\big]}{\sigma_i^2} = d,$$

so

$$H(p(x)) = \frac{d}{2}\ln(2\pi) + \frac{1}{2}\ln|\Sigma| + \frac{d}{2}.$$
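The closed form $H = \frac{d}{2}\ln(2\pi) + \frac{1}{2}\ln|\Sigma| + \frac{d}{2}$ is easy to sanity-check; the Python sketch below (my addition, using an arbitrary diagonal $\Sigma$ so the components are independent, as in the derivation) compares it with a Monte-Carlo estimate of $E[-\ln p(x)]$:

    import numpy as np

    rng = np.random.default_rng(1)
    d = 3
    mu = np.zeros(d)
    Sigma = np.diag([1.0, 4.0, 9.0])   # diagonal: independent components

    # Closed form: H = d/2 ln(2 pi) + 1/2 ln|Sigma| + d/2
    H_formula = 0.5 * d * np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma)) + 0.5 * d

    # Monte-Carlo estimate of E[-ln p(x)]
    x = rng.multivariate_normal(mu, Sigma, size=200000)
    diff = x - mu
    quad = np.einsum('nd,dk,nk->n', diff, np.linalg.inv(Sigma), diff)
    log_p = -0.5 * d * np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(Sigma)) - 0.5 * quad
    print(H_formula, -log_p.mean())    # the two values should agree closely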

Problem 3

a) With $x = (0.5, 0, 1)^t$ and $\mu = (1, 1, 1)^t$:

$$p(x|\omega) = \frac{1}{(2\pi)^{3/2}|\Sigma|^{1/2}}\exp\Big[-\frac{1}{2}(x-\mu)^t\Sigma^{-1}(x-\mu)\Big],$$

$$\Sigma = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 2 \\ 0 & 2 & 5 \end{pmatrix}, \qquad \Sigma^{-1} = \frac{1}{21}\begin{pmatrix} 21 & 0 & 0 \\ 0 & 5 & -2 \\ 0 & -2 & 5 \end{pmatrix}, \qquad |\Sigma| = 21,$$

$$(x-\mu)^t\Sigma^{-1}(x-\mu) = (-0.5, -1, 0)\,\Sigma^{-1}\,(-0.5, -1, 0)^t = \frac{1}{4} + \frac{5}{21} = \frac{41}{84},$$

$$p(x|\omega) = \frac{1}{(2\pi)^{3/2}\sqrt{21}}\,e^{-41/168} \approx 0.011.$$

b) We calculate the eigenvalues from $|\Sigma - \lambda I| = 0$:

$$\begin{vmatrix} 1-\lambda & 0 & 0 \\ 0 & 5-\lambda & 2 \\ 0 & 2 & 5-\lambda \end{vmatrix} = (1-\lambda)\big[(5-\lambda)^2 - 4\big] = 0,$$

so $\lambda_1 = 1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and

$$\Lambda = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 7 \end{pmatrix}.$$

Then we calculate the eigenvectors from $\Sigma e = \lambda e$. For $\lambda_1 = 1$:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 2 \\ 0 & 2 & 5 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \;\Longrightarrow\; \begin{cases} 5x_2 + 2x_3 = x_2 \\ 2x_2 + 5x_3 = x_3 \end{cases} \;\Longrightarrow\; x_2 = x_3 = 0.$$

Let $x_1 = 1$; then we get

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$
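The eigendecomposition in b) can be verified in a few lines of Python (a check added for illustration; np.linalg.eigh may return the eigenvectors with opposite signs, which are equally valid):

    import numpy as np

    Sigma = np.array([[1.0, 0.0, 0.0],
                      [0.0, 5.0, 2.0],
                      [0.0, 2.0, 5.0]])

    eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues, orthonormal vectors
    print(eigvals)    # [1. 3. 7.] -> Lambda = diag(1, 3, 7)
    print(eigvecs)    # columns match e_1, e_2, e_3 up to sign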

Similarly, for $\lambda_2 = 3$:

$$\begin{cases} 5x_2 + 2x_3 = 3x_2 \\ 2x_2 + 5x_3 = 3x_3 \end{cases} \;\Longrightarrow\; x_1 = 0,\; x_2 = -x_3.$$

Let $x_2 = 1$; then, by normalization,

$$e_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}.$$

For $\lambda_3 = 7$:

$$\begin{cases} 5x_2 + 2x_3 = 7x_2 \\ 2x_2 + 5x_3 = 7x_3 \end{cases} \;\Longrightarrow\; x_1 = 0,\; x_2 = x_3.$$

Let $x_2 = 1$; then, by normalization,

$$e_3 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.$$

Hence

$$\Phi = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}, \qquad A_w = \Phi\Lambda^{-1/2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{14}} \\ 0 & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{14}} \end{pmatrix}.$$

c)

$$x_w = A_w^t(x-\mu) = A_w^t\begin{pmatrix} -0.5 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} -0.5 \\ -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{14}} \end{pmatrix} \approx \begin{pmatrix} -0.5 \\ -0.41 \\ -0.27 \end{pmatrix}.$$
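Putting b) and c) together, the following sketch (an added illustration, assuming $\mu = (1,1,1)^t$ as in part a)) builds $A_w = \Phi\Lambda^{-1/2}$ numerically and applies it to $x - \mu$:

    import numpy as np

    Sigma = np.array([[1.0, 0.0, 0.0],
                      [0.0, 5.0, 2.0],
                      [0.0, 2.0, 5.0]])
    mu = np.array([1.0, 1.0, 1.0])     # assumed mean, as in part a)
    x = np.array([0.5, 0.0, 1.0])

    eigvals, Phi = np.linalg.eigh(Sigma)     # columns of Phi are e_1, e_2, e_3
    A_w = Phi @ np.diag(eigvals ** -0.5)     # A_w = Phi Lambda^{-1/2}

    print(A_w.T @ (x - mu))        # x_w, ~ (-0.5, -0.408, -0.267) up to signs
    print(A_w.T @ Sigma @ A_w)     # ~ identity matrix: the data are whitened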

d) The square of the Mahalanobis distance from $x$ to $\mu$ is $r^2 = (x-\mu)^t\Sigma^{-1}(x-\mu)$. The square of the Mahalanobis distance from $x_w$ to $0$ is

$$r_w^2 = x_w^t x_w = (x-\mu)^t A_w A_w^t (x-\mu) = (x-\mu)^t \Phi\Lambda^{-1}\Phi^t (x-\mu) = (x-\mu)^t\Sigma^{-1}(x-\mu).$$

Thus $r^2 = r_w^2$.

e) $p(x) \sim N(\mu, \Sigma)$, i.e.

$$p(x) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\Big[-\frac{1}{2}(x-\mu)^t\Sigma^{-1}(x-\mu)\Big].$$

If $x' = T^t x$, then with $\mu = \frac{1}{n}\sum_{k=1}^{n} x_k$ and $\Sigma = \frac{1}{n}\sum_{k=1}^{n}(x_k-\mu)(x_k-\mu)^t$ we have

$$\mu' = \frac{1}{n}\sum_{k=1}^{n} T^t x_k = T^t\mu,$$

$$\Sigma' = \frac{1}{n}\sum_{k=1}^{n}(T^t x_k - T^t\mu)(T^t x_k - T^t\mu)^t = T^t\Big[\frac{1}{n}\sum_{k=1}^{n}(x_k-\mu)(x_k-\mu)^t\Big]T = T^t\Sigma T.$$

Thus $T^t x \sim N(T^t\mu, T^t\Sigma T)$, and

$$p\big(T^t x \,|\, N(T^t\mu, T^t\Sigma T)\big) = \frac{1}{(2\pi)^{d/2}|T^t\Sigma T|^{1/2}}\exp\Big[-\frac{1}{2}(T^t x - T^t\mu)^t(T^t\Sigma T)^{-1}(T^t x - T^t\mu)\Big]$$
$$= \frac{1}{(2\pi)^{d/2}|T^t\Sigma T|^{1/2}}\exp\Big[-\frac{1}{2}(x-\mu)^t\,T\,T^{-1}\Sigma^{-1}(T^t)^{-1}\,T^t\,(x-\mu)\Big]$$
$$= \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}\,|T|}\exp\Big[-\frac{1}{2}(x-\mu)^t\Sigma^{-1}(x-\mu)\Big].$$

Thus for any nonsingular $T$ with $|T| = 1$ we have

$$p(x \,|\, \mu, \Sigma) = p\big(T^t x \,|\, N(T^t\mu, T^t\Sigma T)\big).$$
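Part d) rests on the identity $A_w A_w^t = \Phi\Lambda^{-1}\Phi^t = \Sigma^{-1}$; the short check below (added for illustration, reusing the numbers of this problem, including the assumed $\mu = (1,1,1)^t$) confirms $r^2 = r_w^2 = 41/84$ numerically:

    import numpy as np

    Sigma = np.array([[1.0, 0.0, 0.0],
                      [0.0, 5.0, 2.0],
                      [0.0, 2.0, 5.0]])
    mu = np.array([1.0, 1.0, 1.0])   # assumed mean, as in part a)
    x = np.array([0.5, 0.0, 1.0])

    r2 = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)   # squared Mahalanobis distance

    eigvals, Phi = np.linalg.eigh(Sigma)
    A_w = Phi @ np.diag(eigvals ** -0.5)
    x_w = A_w.T @ (x - mu)
    r2_w = x_w @ x_w                                   # squared distance after whitening

    print(r2, r2_w, 41 / 84)    # all three agree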

f) Since $\Sigma\Phi = \Phi\Lambda$, we have $\Sigma = \Phi\Lambda\Phi^{-1}$; meanwhile $\Phi$ is an orthogonal matrix, so $\Phi^{-1} = \Phi^t$. Hence

$$A_w^t\Sigma A_w = (\Phi\Lambda^{-1/2})^t\,\Sigma\,(\Phi\Lambda^{-1/2}) = \Lambda^{-1/2}\Phi^t(\Phi\Lambda\Phi^t)\Phi\Lambda^{-1/2} = \Lambda^{-1/2}\Lambda\Lambda^{-1/2} = I.$$

Problem 5

From Eq. 59,

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\Sigma^{-1}(x-\mu_i) + \ln P(\omega_i)$$
$$= -\frac{1}{2}x^t\Sigma^{-1}x + \frac{1}{2}x^t\Sigma^{-1}\mu_i + \frac{1}{2}\mu_i^t\Sigma^{-1}x - \frac{1}{2}\mu_i^t\Sigma^{-1}\mu_i + \ln P(\omega_i).$$

The quadratic term $-\frac{1}{2}x^t\Sigma^{-1}x$ is independent of $i$, and since $\Sigma^{-1}$ is a symmetric matrix, $x^t\Sigma^{-1}\mu_i = (\mu_i^t\Sigma^{-1}x)^t = \mu_i^t\Sigma^{-1}x$, so the discriminant can be rewritten as

$$g_i(x) = \mu_i^t\Sigma^{-1}x - \frac{1}{2}\mu_i^t\Sigma^{-1}\mu_i + \ln P(\omega_i) = (\Sigma^{-1}\mu_i)^t x - \frac{1}{2}\mu_i^t\Sigma^{-1}\mu_i + \ln P(\omega_i) = w_i^t x + w_{i0},$$

where $w_i = \Sigma^{-1}\mu_i$ and $w_{i0} = -\frac{1}{2}\mu_i^t\Sigma^{-1}\mu_i + \ln P(\omega_i)$.
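The reduction in Problem 5 can be checked numerically; the sketch below (an illustration with made-up means, covariance, and priors, not data from the problem) evaluates both the quadratic and the linear form and confirms they differ only by the $i$-independent term $-\frac{1}{2}x^t\Sigma^{-1}x$:

    import numpy as np

    rng = np.random.default_rng(2)
    d, c = 3, 4
    Sigma = np.eye(d) + 0.3                 # made-up shared covariance (SPD)
    Sigma_inv = np.linalg.inv(Sigma)
    mus = rng.normal(size=(c, d))           # made-up class means mu_i
    priors = np.full(c, 1.0 / c)
    x = rng.normal(size=d)

    g_quad = np.array([-0.5 * (x - m) @ Sigma_inv @ (x - m) + np.log(p)
                       for m, p in zip(mus, priors)])

    W = mus @ Sigma_inv                     # rows are w_i^t = (Sigma^{-1} mu_i)^t
    w0 = -0.5 * np.einsum('id,dk,ik->i', mus, Sigma_inv, mus) + np.log(priors)
    g_lin = W @ x + w0

    # The two forms differ by the i-independent quadratic term, so argmax matches.
    print(np.allclose(g_quad - g_lin, -0.5 * x @ Sigma_inv @ x))
    print(np.argmax(g_quad) == np.argmax(g_lin))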

b) The decision surface for a linear machine is defined by $g_i(x) = g_j(x)$, that is,

$$(\Sigma^{-1}\mu_i)^t x - \frac{1}{2}\mu_i^t\Sigma^{-1}\mu_i + \ln P(\omega_i) = (\Sigma^{-1}\mu_j)^t x - \frac{1}{2}\mu_j^t\Sigma^{-1}\mu_j + \ln P(\omega_j),$$

i.e.

$$\big[\Sigma^{-1}(\mu_i-\mu_j)\big]^t x - \frac{1}{2}\big(\mu_i^t\Sigma^{-1}\mu_i - \mu_j^t\Sigma^{-1}\mu_j\big) + \ln\frac{P(\omega_i)}{P(\omega_j)} = 0.$$

Since $\mu_i^t\Sigma^{-1}\mu_i - \mu_j^t\Sigma^{-1}\mu_j = (\mu_i-\mu_j)^t\Sigma^{-1}(\mu_i+\mu_j)$, this can be written as

$$\big[\Sigma^{-1}(\mu_i-\mu_j)\big]^t\Big(x - \frac{1}{2}(\mu_i+\mu_j) + \frac{\ln\big(P(\omega_i)/P(\omega_j)\big)}{(\mu_i-\mu_j)^t\Sigma^{-1}(\mu_i-\mu_j)}(\mu_i-\mu_j)\Big) = 0,$$

i.e. $w^t(x - x_0) = 0$ with

$$w = \Sigma^{-1}(\mu_i-\mu_j), \qquad x_0 = \frac{1}{2}(\mu_i+\mu_j) - \frac{\ln\big(P(\omega_i)/P(\omega_j)\big)}{(\mu_i-\mu_j)^t\Sigma^{-1}(\mu_i-\mu_j)}(\mu_i-\mu_j).$$

Problem 43

a) $P_{ij}$ represents the probability that the $i$-th component of $x$ equals 1 when the state of nature is $\omega_j$: $P_{ij} = \Pr[x_i = 1 \,|\, \omega_j]$.

b) Proof. According to Section 2.4.1, minimum-error-rate classification can be achieved by use of the discriminant functions

$$g_j(x) = \ln p(x|\omega_j) + \ln P(\omega_j).$$

Since $x$ is binary-valued with independent components, $p(x|\omega_j) = \prod_{i=1}^{d} P_{ij}^{x_i}(1-P_{ij})^{1-x_i}$, so

$$g_j(x) = \sum_{i=1}^{d}\ln\Big[P_{ij}^{x_i}(1-P_{ij})^{1-x_i}\Big] + \ln P(\omega_j) = \sum_{i=1}^{d} x_i\ln\frac{P_{ij}}{1-P_{ij}} + \sum_{i=1}^{d}\ln(1-P_{ij}) + \ln P(\omega_j),$$

which is linear in the components $x_i$.
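Since the discriminant of Problem 43 b) is linear in the binary components, it is cheap to verify the two forms agree; the sketch below (an added illustration with made-up $P_{ij}$ and priors) computes $g_j(x)$ both from the factored likelihood and from the linear expression:

    import numpy as np

    rng = np.random.default_rng(3)
    d, c = 6, 3
    P = rng.uniform(0.1, 0.9, size=(d, c))   # made-up P[i, j] = Pr[x_i = 1 | w_j]
    priors = np.array([0.5, 0.3, 0.2])       # made-up priors P(w_j)
    x = rng.integers(0, 2, size=d)           # a binary feature vector

    # g_j(x) = sum_i [x_i ln P_ij + (1 - x_i) ln(1 - P_ij)] + ln P(w_j)
    g_direct = (x[:, None] * np.log(P) + (1 - x[:, None]) * np.log(1 - P)).sum(0) \
               + np.log(priors)

    # Linear form: sum_i x_i ln(P_ij / (1 - P_ij)) + sum_i ln(1 - P_ij) + ln P(w_j)
    g_linear = x @ np.log(P / (1 - P)) + np.log(1 - P).sum(0) + np.log(priors)

    print(np.allclose(g_direct, g_linear), np.argmax(g_linear))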
