Multivariate Least Weighted Squares (MLWS)


Stochastic Modelling in Economics and Finance 2
Supervisor: Prof. RNDr. Jan Ámos Víšek, CSc.
Petr Jonáš
12th March 2012

Contents

1. Introduction
2. Algorithm
3. Proof of consistency (80%)
4. Appendix (60%)
5. $\sqrt{n}$-consistency

Definition 1

Let us define the multivariate least weighted squares (MLWS) estimator of the matrix of regression coefficients $B$ as

$$\hat{B}_{n,w} = \arg\min_{(B,\Sigma) \in D} \sum_{s=1}^{n} w\left(\frac{s-1}{n}\right) d^2_{(s)}(B,\Sigma), \qquad (1)$$

where $d_{(1)}(B,\Sigma) \le d_{(2)}(B,\Sigma) \le \dots \le d_{(n)}(B,\Sigma)$ is the ordered sequence of the residual Mahalanobis distances.
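To make the objective in (1) concrete, the following minimal sketch evaluates it for a given pair $(B, \Sigma)$. It assumes the model is stored row-wise as `Y ≈ X @ B`, and the function names and the linear weight function `linear_weight` are illustrative choices, not part of the talk.

```python
import numpy as np

def mahalanobis_sq(X, Y, B, Sigma):
    """Squared residual Mahalanobis distances d_i^2(B, Sigma), one per row."""
    R = Y - X @ B                                  # residuals, shape (n, q)
    return np.einsum("ij,ij->i", R @ np.linalg.inv(Sigma), R)

def mlws_objective(X, Y, B, Sigma, w):
    """Sum over s of w((s-1)/n) * d^2_(s)(B, Sigma), following (1)."""
    d2_sorted = np.sort(mahalanobis_sq(X, Y, B, Sigma))
    n = d2_sorted.size
    # np.arange(n) / n equals (s-1)/n for s = 1, ..., n
    return float(np.sum(w(np.arange(n) / n) * d2_sorted))

def linear_weight(alpha):
    # Assumed example weight function: w(0) = 1, nonincreasing on [0, 1].
    return 1.0 - alpha

# Small synthetic illustration (all values are made up).
rng = np.random.default_rng(0)
n, p, q = 50, 3, 2
X = rng.normal(size=(n, p))
B_true = rng.normal(size=(p, q))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))
obj_true = mlws_objective(X, Y, B_true, np.eye(q), linear_weight)
obj_zero = mlws_objective(X, Y, np.zeros((p, q)), np.eye(q), linear_weight)
```

On such data the objective at the generating coefficients should be far smaller than at $B = 0$, because the smallest distances receive the largest weights.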

For any $m \in \mathbb{N}$ denote by $\Pi_m$ the set of all permutations of $\{1, \dots, m\}$, and for any $\pi \in \Pi_n$ let $\pi_j$ stand for the $j$-th coordinate of the vector $\pi$. For any $(B,\Sigma) \in D$ put

$$\pi(B,\Sigma,i) = j \in \{1,\dots,n\} \iff d_i^2(B,\Sigma) = d^2_{(j)}(B,\Sigma) \qquad (2)$$

and $\pi(B,\Sigma) = (\pi(B,\Sigma,1), \dots, \pi(B,\Sigma,n))$. Now we can express equation (1) in the equivalent form

$$\hat{B}_{n,w} = \arg\min_{(B,\Sigma) \in D} \sum_{i=1}^{n} w\left(\frac{\pi(B,\Sigma,i)-1}{n}\right) d_i^2(B,\Sigma). \qquad (3)$$

For an arbitrary $\pi \in \Pi_n$ denote the function we want to minimize by

$$MF(B,\Sigma,\pi) = \sum_{i=1}^{n} w\left(\frac{\pi_i - 1}{n}\right) d_i^2(B,\Sigma). \qquad (4)$$

To prove the existence of a solution of the minimization problem (1) we can follow the steps of the proof for LWS presented in Víšek (2008) and Víšek (2009).

Remark

For any $\omega_0 \in \Omega$ the MLWS estimator is equal to the MWLS estimator with properly arranged weights.

The last remark shows how to compute the MLWS estimator: we can minimize $MF(\hat{B}^{MWLS}_{n,\pi}, \hat{\Sigma}^{MWLS}_{n,\pi}, \pi)$ over the set of all permutations $\Pi_n$. The asymptotic complexity of this algorithm is $n!$, so exhaustive search is impossible beyond small $n$. It is better to modify the iterative algorithm for LWS presented in Mašíček (2004).

1. $\hat{B} = 0$, $\hat{\Sigma} = I$, $\widehat{MF} = +\infty$
2. choose initial values $B_1$, $\Sigma_1$; $j := 1$
3. compute Mahalanobis distances $d_i(B_j, \Sigma_j)$
4. assign weights to observations (smaller MD = higher weight)
5. $(B_{j+1}, \Sigma_{j+1})$ := MWLS with the weights computed in step 4
6. if $B_j \ne B_{j+1}$: $j := j+1$ and go to step 3
7. if $MF(B_j, \Sigma_j, w_j) < \widehat{MF}$ then $\hat{B} := B_j$, $\hat{\Sigma} := \Sigma_j$, $\widehat{MF} := MF(B_j, \Sigma_j, w_j)$
8. repeat from step 2 until the terminal condition is met
9. return $\hat{B}$, $\hat{\Sigma}$, $\widehat{MF}$
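The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation from the talk: $\Sigma$ is held at the identity (so distances are Euclidean residual norms and step 5 reduces to weighted least squares), the starting values come from least squares on a random half-sample, and all function names are hypothetical.

```python
import numpy as np

def linear_weight(alpha):
    # Assumed example weight function: w(0) = 1, nonincreasing.
    return 1.0 - alpha

def rank_weights(d2, w):
    """Step 4: the s-th smallest distance receives weight w((s-1)/n)."""
    ranks = np.argsort(np.argsort(d2))             # 0-based ranks
    return w(ranks / d2.size)

def weighted_ls(X, Y, weights):
    """Step 5 with Sigma fixed: solve (X'WX) B = X'WY."""
    Xw = X * weights[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ Y)

def mf_value(X, Y, B, w):
    d2 = np.sum((Y - X @ B) ** 2, axis=1)
    return float(np.sum(rank_weights(d2, w) * d2))

def mlws_fit(X, Y, w=linear_weight, n_starts=20, max_iter=100, tol=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_B, best_mf = None, np.inf                 # step 1
    for _ in range(n_starts):                      # step 8: fixed number of restarts
        subset = rng.choice(n, size=n // 2, replace=False)
        B = np.linalg.lstsq(X[subset], Y[subset], rcond=None)[0]   # step 2
        for _ in range(max_iter):
            d2 = np.sum((Y - X @ B) ** 2, axis=1)  # step 3
            B_new = weighted_ls(X, Y, rank_weights(d2, w))         # steps 4-5
            done = np.max(np.abs(B_new - B)) < tol                 # step 6
            B = B_new
            if done:
                break
        mf = mf_value(X, Y, B, w)
        if mf < best_mf:                           # step 7
            best_B, best_mf = B, mf
    return best_B, best_mf                         # step 9
```

With $\Sigma$ fixed and a nonincreasing weight function, each inner iteration cannot increase MF, which is why the loop can stop as soon as $B$ no longer changes. Updating $\Sigma$ as well would need an extra normalisation, which this sketch leaves out.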

The previous algorithm is still too slow. Possible improvements:
- choice of initial values
- heuristic for discarding unpromising starts

Choice of initial values

Randomly choose $m = 1000$ subsets
- $H_i$ of size $h$ ($n/2$, $n/4$) and compute $\hat{B} = B^{LS}(H_i)$ and $\hat{\Sigma} = \Sigma^{LS}(H_i)$, or
- $H_i$ of size $h = p + q$ and compute the coefficients of the hyperplane through $H_i$. If $H_i$ does not define a unique hyperplane, extend $H_i$ by adding random observations until it does. Then $\hat{B}$ = coefficients of this hyperplane and $\hat{\Sigma} = I$.
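Both ways of generating starting values can be sketched as below. The function names are hypothetical, and the second strategy is one possible reading of the slide: the "hyperplane through $H_i$" is taken to be the least squares fit on the subset, and uniqueness is checked through the rank of the selected rows of $X$.

```python
import numpy as np

def ls_start(X, Y, subset):
    """Least squares fit and residual covariance on the rows in `subset`."""
    B = np.linalg.lstsq(X[subset], Y[subset], rcond=None)[0]
    R = Y[subset] - X[subset] @ B
    return B, R.T @ R / len(subset)

def elemental_start(X, Y, rng):
    """Start from p + q random rows, extended until they determine B uniquely."""
    n, p = X.shape
    q = Y.shape[1]
    order = rng.permutation(n)
    h = p + q
    # Add further random observations while the selected rows of X are rank deficient.
    while np.linalg.matrix_rank(X[order[:h]]) < p and h < n:
        h += 1
    B = np.linalg.lstsq(X[order[:h]], Y[order[:h]], rcond=None)[0]
    return B, np.eye(q)                            # Sigma-hat = I for this strategy

def initial_values(X, Y, m=1000, h=None, seed=0):
    """m starting pairs (B, Sigma) from random subsets of size h (default n // 2)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    h = n // 2 if h is None else h
    return [ls_start(X, Y, rng.choice(n, size=h, replace=False)) for _ in range(m)]
```

Small subsets are more likely to be free of outliers, while half-samples give more stable fits. The sketch leaves that trade-off to the caller through `h`.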

Heuristic for discarding unpromising starts

- perform 2 iteration steps with all $m$ starts
- choose the $a = 10$ starts with the lowest MFs
- perform 30 iteration steps with these $a$ starts
- choose the one with the lowest MF
- iterate until convergence (or terminal condition)
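A possible shape of this screening scheme is sketched below. As a simplifying assumption $\Sigma$ is again held at the identity, so one iteration step is a rank-based reweighting followed by weighted least squares. The names `screened_fit`, `keep`, `short` and `long` are illustrative only.

```python
import numpy as np

def step(X, Y, B, w):
    """One iteration: rank-based weights from current residuals, then weighted LS."""
    d2 = np.sum((Y - X @ B) ** 2, axis=1)
    weights = w(np.argsort(np.argsort(d2)) / d2.size)
    Xw = X * weights[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ Y)

def mf(X, Y, B, w):
    """Objective value with Sigma = I: weights applied to sorted squared residual norms."""
    d2 = np.sort(np.sum((Y - X @ B) ** 2, axis=1))
    return float(np.sum(w(np.arange(d2.size) / d2.size) * d2))

def iterate(X, Y, B, w, steps):
    for _ in range(steps):
        B = step(X, Y, B, w)
    return B

def screened_fit(X, Y, starts, w, keep=10, short=2, long=30, max_iter=500, tol=1e-10):
    # 2 steps from every start, then keep the `keep` candidates with the lowest MF
    cands = [iterate(X, Y, B, w, short) for B in starts]
    cands = sorted(cands, key=lambda B: mf(X, Y, B, w))[:keep]
    # 30 further steps on the survivors, then keep the single best one
    cands = [iterate(X, Y, B, w, long) for B in cands]
    B = min(cands, key=lambda B: mf(X, Y, B, w))
    # iterate the winner until convergence or until max_iter steps are used up
    for _ in range(max_iter):
        B_new = step(X, Y, B, w)
        if np.max(np.abs(B_new - B)) < tol:
            return B_new
        B = B_new
    return B
```

The design spends only two cheap steps on most starts and reserves the long iterations for the few candidates that already look promising.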

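We have to enlarge our notation. Recall that

$$\hat{B}_{n,w} = \arg\min_{(B,\Sigma) \in D} \sum_{i=1}^{n} w\left(\frac{\pi(B,\Sigma,i)-1}{n}\right) d_i^2(B,\Sigma).$$

Let us denote

$$F_{B,\Sigma}(r) = P\left(\|Y_1 - B'X_1\|_{\Sigma} < r\right), \qquad (5)$$

where $\|\cdot\|_{\Sigma}$ is the Mahalanobis norm, so that $F_{B,\Sigma}$ is the distribution function of $d_1(B,\Sigma)$, and

$$F^{(n)}_{B,\Sigma}(r) = \frac{1}{n} \sum_{i=1}^{n} I\left(d_i(B,\Sigma) < r\right), \qquad (6)$$

and realize that $d_i(B,\Sigma)$ and $d_i^2(B,\Sigma)$ are at the same position in the respective ordered sequences.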
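A quick numerical check of this observation: with the strict inequality in (6), the empirical distribution function evaluated at $d_i$ equals $(\pi(B,\Sigma,i)-1)/n$ whenever the distances are distinct, so the rank-based weights can be rewritten through $F^{(n)}_{B,\Sigma}$. The distances below are random stand-ins and the linear weight function is an assumed example.

```python
import numpy as np

rng = np.random.default_rng(1)
d = rng.exponential(size=8)        # stand-in distances d_i, almost surely distinct
n = d.size

# Empirical d.f. with strict inequality, evaluated at each d_i, as in (6)
edf_at_d = np.array([np.mean(d < di) for di in d])

# pi(B, Sigma, i): 1-based position of d_i in the ordered sequence, as in (2)
pi = np.argsort(np.argsort(d)) + 1

def w(alpha):
    # Assumed example weight function
    return 1.0 - alpha

form_1 = np.sum(w(np.arange(n) / n) * np.sort(d) ** 2)   # ordered form (1)
form_3 = np.sum(w((pi - 1) / n) * d ** 2)                # permutation form (3)
form_edf = np.sum(w(edf_at_d) * d ** 2)                  # weights through the e.d.f.
```

All three sums agree, which is the identity used to replace ranks by the empirical distribution function in the objective.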

Therefore we can write

$$\hat{B}_{n,w} = \arg\min_{(B,\Sigma) \in D} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) d_i^2(B,\Sigma). \qquad (7)$$

After some computation we can express the system of normal equations as

$$NE_{Y,X,n}(B) = \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) X_i (Y_i - B'X_i)' = 0. \qquad (8)$$

In the one-dimensional case, the main idea of proving consistency of LWS is to approximate the empirical distribution function of the absolute values of residuals by a continuous distribution function. We try to prove consistency in a similar way. We shall need the following assumptions.

Condition C1

The weight function $w : [0,1] \to [0,1]$ with $w(0) = 1$ is absolutely continuous and nonincreasing, with derivative $w'(\alpha)$ bounded from below by $-L$ ($L > 0$).
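As an illustration of a weight function that satisfies Condition C1, the sketch below uses a trapezoidal shape: constant 1 up to a point, then a linear descent to 0. The parameter names `h` and `g` and their values are assumptions made for this example, and for this shape the bound is $L = 1/(g-h)$.

```python
import numpy as np

def trapezoid_weight(alpha, h=0.5, g=0.8):
    """Equal to 1 on [0, h], linear descent to 0 on [h, g], and 0 on [g, 1]."""
    alpha = np.asarray(alpha, dtype=float)
    return np.clip((g - alpha) / (g - h), 0.0, 1.0)

# Numerical look at the properties required by C1 on a fine grid
grid = np.linspace(0.0, 1.0, 1001)
values = trapezoid_weight(grid)
slopes = np.diff(values) / np.diff(grid)   # finite-difference derivative
L = 1.0 / (0.8 - 0.5)                      # lower bound -L for the derivative
```

The function is piecewise linear, hence absolutely continuous, and its slope never drops below $-L$.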

Condition C2

The sequence $\{(X_i', U_i')'\}_{i=1}^{\infty}$ is a sequence of i.i.d. $(p+q)$-dimensional random vectors distributed according to the distribution function $F_{X,U}(x,u) = F_X(x) F_U(u)$. $F_U(u)$ is absolutely continuous with density $f_U(u)$ bounded by $U_e$. Finally, there is a $q > 1$ so that $E\|X_1\|^{4q} < \infty$. The symbol $\|X\| = (\mathrm{tr}(XX'))^{1/2}$ denotes the Frobenius norm.

Now we want to prove that all solutions of the normal equations are bounded. Denote

$$F_{B'X}(r) = P\left(B'X_iX_i'B < r\right) \qquad (9)$$

and the corresponding empirical distribution function

$$F^{(n)}_{B'X}(r) = \frac{1}{n} \sum_{i=1}^{n} I\left(B'X_iX_i'B < r\right). \qquad (10)$$

To prove the boundedness of the solutions, the following lemmas will be useful.

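Lemma A1 (see Víšek (2009))

Under Condition C2 the distribution functions $F_{B,\Sigma}(r)$ and $F_{B'X}(r)$ are, uniformly in $r \in \mathbb{R}$, uniformly continuous in $B$, i.e. for any $\delta > 0$ there is $\xi \in (0,1)$ so that for any $B^{(1)}$ and $B^{(2)}$ such that $\|B^{(1)} - B^{(2)}\| < \xi$

$$\sup_{r \in \mathbb{R}} \left| F_{B^{(1)},\Sigma}(r) - F_{B^{(2)},\Sigma}(r) \right| \le \delta,$$

$$\sup_{r \in \mathbb{R}} \left| F_{(B^{(1)})'X}(r) - F_{(B^{(2)})'X}(r) \right| \le \delta.$$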

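Lemma A2 (see Víšek (2009))

Let Condition C2 hold. For any $\varepsilon > 0$ there is a constant $K_\varepsilon$ and $n_\varepsilon \in \mathbb{N}$ so that for all $n > n_\varepsilon$

$$P\left(\left\{\omega \in \Omega : \sup_{u \in \mathbb{R}^+} \sup_{B \in \mathbb{R}^{p \times q}} \sqrt{n} \left| F^{(n)}_{B,\Sigma}(u) - F_{B,\Sigma}(u) \right| < K_\varepsilon \right\}\right) > 1 - \varepsilon.$$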

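Lemma A3 (see Víšek (2009))

Let Condition C2 hold. For any $\varepsilon > 0$ there is a constant $K_\varepsilon$ and $n_\varepsilon \in \mathbb{N}$ so that for all $n > n_\varepsilon$

$$P\left(\left\{\omega \in \Omega : \sup_{u \in \mathbb{R}^+} \sup_{B \in \mathbb{R}^{p \times q}} \sqrt{n} \left| F^{(n)}_{B'X}(u) - F_{B'X}(u) \right| < K_\varepsilon \right\}\right) > 1 - \varepsilon.$$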

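For any $\lambda \in \mathbb{R}^+$ and any $a \in \mathbb{R}$ put

$$\gamma_{\lambda,a} = \sup_{\|B\| = \lambda} F_{B'X}(a). \qquad (11)$$

The surface of the ball $\{B \in \mathbb{R}^{p \times q};\ \|B\| = \lambda\}$ is compact, so there is $B_{\gamma,a} \in \{B \in \mathbb{R}^{p \times q};\ \|B\| = \lambda\}$ such that

$$\gamma_{\lambda,a} = F_{B_{\gamma,a}'X}(a). \qquad (12)$$

For $B^* = B \cdot \|B\|^{-1}$ we have

$$F_{B^{*\prime}X}(r) = P\left(\|B^{*\prime}X_1\|^2 < r\right) = P\left(\frac{\|B'X_1\|^2}{\|B\|^2} < r\right) = P\left(\|B'X_1\|^2 < r\|B\|^2\right) = F_{B'X}\left(r\|B\|^2\right).$$

It means that we can, without loss of generality, consider $\gamma_{1,t}$.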

Lemma 1

Under Conditions C1 and C2, there is $a > 0$ and $b \in (0,1)$ so that

$$a \cdot (b - \gamma_{1,a})\, w(b) > 0. \qquad (13)$$

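Lemma 2

Let Conditions C1 and C2 be fulfilled. Then for any $\varepsilon > 0$ there is $\theta > 0$, $\Delta > 0$ and $n_{\varepsilon,\Delta} \in \mathbb{N}$ such that for any $n > n_{\varepsilon,\Delta}$

$$P\left(\left\{\omega \in \Omega : \inf_{\|B\| \ge \theta} -\frac{1}{n} B' NE_{Y,X,n}(B) > \Delta \right\}\right) > 1 - \varepsilon.$$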

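Proof

$$-\frac{1}{n} B' NE_{Y,X,n}(B) = -\frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_i (Y_i - B'X_i)'$$

$$= \frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iX_i'B - \frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iY_i'.$$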

First of all, we shall pay attention to the quadratic part

$$\frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iX_i'B \qquad (14)$$

and we will find a positive definite quadratic form which, uniformly for $B$ outside the ball of diameter 2 and with probability at least $1 - \varepsilon$, is a lower bound for (14). For $B$ with sufficiently large norm ($> \theta$) this quadratic form is then larger than the part linear in $B$.

Fix $a > 0$ and $b \in (0,1)$, the existence of which was shown in Lemma 1, and $\|\Sigma\| = 1$. Further, for any $B$ denote by $I_b(B)$ the set of indices for which $F^{(n)}_{B,\Sigma}(d_i(B,\Sigma)) < b$. We can easily verify that the empirical distribution function exceeds $b$ not later than at its $([nb]+1)$-th jump. It means that

$$\#I_b(B) \ge [nb] \qquad (15)$$

and

$$i \in I_b(B) \ \text{implies} \ w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) \ge w(b). \qquad (16)$$

Let us denote by $I_a(B)$ the set of indices for which $B'X_iX_i'B < a$ and estimate $\#I_a(B)$.

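Fix $\varepsilon > 0$, $\delta > 0$ and put

$$\kappa = \frac{a(b - \gamma_{1,a})\, w(b)}{2}. \qquad (17)$$

Then according to Lemma 1, $\kappa > 0$. Employing Lemma A3, find $n_1 \in \mathbb{N}$ so that for all $n > n_1$ we have

$$P\left(\left\{\omega \in \Omega : \sup_{B \in \mathbb{R}^{p \times q}} \sup_{u \in \mathbb{R}} \left| F^{(n)}_{B'X}(u) - F_{B'X}(u) \right| \le \frac{\kappa}{a\, w(b)} \right\}\right) > 1 - \frac{\varepsilon}{3} \qquad (18)$$

and denote the corresponding set by $B_n^{(1)}$. Note that

$$F^{(n)}_{B'X}(a) = \frac{\#I_a(B)}{n}. \qquad (19)$$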

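Equation (18) implies for any $n > n_1$ and $\omega \in B_n^{(1)}$

$$\#I_a(B) = n F^{(n)}_{B'X}(a) \le n\left(F_{B'X}(a) + \frac{\kappa}{a\, w(b)}\right) \le n\left(\gamma_{1,a} + \frac{\kappa}{a\, w(b)}\right). \qquad (20)$$

Equation (20) holds only for $\{B \in \mathbb{R}^{p \times q};\ \|B\| = 1\}$. Consider $\omega \in B_n^{(1)}$, $n > n_1$, and put

$$C_n(B) = \left\{ i : F^{(n)}_{B,\Sigma}(d_i(B,\Sigma)) < b \ \text{and} \ B'X_iX_i'B > a \right\}.$$

Then (15) and (20) imply that the number of indices in $C_n(B)$ is at least

$$\#C_n(B) \ge \#I_b(B) - \#I_a(B) = nb - n\left(\gamma_{1,a} + \frac{\kappa}{a\, w(b)}\right) = n\left(b - \gamma_{1,a} - \frac{\kappa}{a\, w(b)}\right). \qquad (21)$$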

Now, employing the law of large numbers, let us find $n_2 \in \mathbb{N}$ and $B_n^{(2)}$ with $P(B_n^{(2)}) > 1 - \frac{\varepsilon}{3}$ such that for any $n > n_2$ and $\omega \in B_n^{(2)}$

$$\frac{1}{\#C_n(B)} \sum_{i \in C_n(B)} B'X_iX_i'B \ge a.$$

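Now we have for any $n > \max(n_1, n_2)$, any $\omega \in B_n^{(1)} \cap B_n^{(2)}$ and any $B$ with $\|B\| = 1$

$$\frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iX_i'B \ge \frac{1}{n} \sum_{i \in C_n(B)} w(b)\, B'X_iX_i'B > a(b - \gamma_{1,a})\, w(b) - \kappa = \kappa.$$

Consider now any $B \in \mathbb{R}^{p \times q}$ with $\|B\| = \theta \ge 1$ and put $B^* = \theta^{-1} B$. Then

$$B'X_iX_i'B = \theta^2 B^{*\prime}X_iX_i'B^*. \qquad (22)$$

We have proved that for any $n > \max(n_1, n_2)$, any $\omega \in B_n^{(1)} \cap B_n^{(2)}$ and any $B^* \in \mathbb{R}^{p \times q}$ with $\|B^*\| = 1$

$$\#I_a(B^*) \le n\left(\gamma_{1,a} + \frac{\kappa}{a\, w(b)}\right). \qquad (23)$$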
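The arithmetic behind the lower bound $\kappa$ for $\|B\| = 1$ can be spelled out as follows. It only uses that every index in $C_n(B)$ satisfies $B'X_iX_i'B > a$, the count (21), and the definition $\kappa = a(b-\gamma_{1,a})w(b)/2$:

```latex
\begin{aligned}
\frac{1}{n}\sum_{i \in C_n(B)} w(b)\, B'X_iX_i'B
  &\ge w(b)\, a\, \frac{\#C_n(B)}{n} \\
  &\ge a\, w(b)\left(b - \gamma_{1,a} - \frac{\kappa}{a\, w(b)}\right) \\
  &= a(b - \gamma_{1,a})\, w(b) - \kappa \\
  &= 2\kappa - \kappa = \kappa .
\end{aligned}
```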

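Recall that

$$\#I_b(B) \ge [nb] \qquad (24)$$

and for any $i \in I_b(B)$

$$w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) \ge w(b). \qquad (25)$$

Now from (22), (23), (24) and (25)

$$\frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iX_i'B = \theta^2\, \frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B^{*\prime}X_iX_i'B^* \ge \theta^2 \left(a(b - \gamma_{1,a})\, w(b) - \kappa\right) = \theta^2 \kappa. \qquad (26)$$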

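So we have proved that for any $n > \max(n_1, n_2)$, any $\omega \in B_n^{(1)} \cap B_n^{(2)}$ and any $B$ with $\|B\| \ge 1$

$$\frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iX_i'B \ge \theta^2 \kappa = \|B\|^2 \kappa. \qquad (27)$$

Now we shall consider the linear term

$$\frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iU_i'.$$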

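Denote $\tau = E\{\|U_1\| \cdot \|X_1\|\}$. Then find $n_3 \in \mathbb{N}$ so that for any $n > n_3$ there is $B_n^{(3)}$ with $P(B_n^{(3)}) > 1 - \varepsilon/3$ and for any $\omega \in B_n^{(3)}$

$$\left\| \frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iU_i' \right\| \le \frac{1}{n} \sum_{i=1}^{n} \left\| B'X_iU_i' \right\| \le 2\tau \|B\|. \qquad (28)$$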

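Consider $n > \max(n_1, n_2, n_3)$ and $\omega \in B_n = B_n^{(1)} \cap B_n^{(2)} \cap B_n^{(3)}$. It follows that $P(B_n) > 1 - \varepsilon$, and (27) and (28) imply that for any $B$ with $\|B\| \ge 1$ and for $\kappa$ as defined above

$$-\frac{1}{n} B' NE_{Y,X,n}(B) > \|B\|^2 \kappa - 2\tau \|B\|.$$

Then for any $\Delta > 0$ there is a $\theta > 0$ such that for any $B \in \mathbb{R}^{p \times q}$ with $\|B\| \ge \theta$, with probability at least $1 - \varepsilon$,

$$-\frac{1}{n} B' NE_{Y,X,n}(B) > \Delta.$$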
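One explicit choice of $\theta$ can be read off the quadratic lower bound. The following short derivation is a sketch that writes $t = \|B\|$ and treats the bound as a quadratic inequality in $t$; the symbol $t_+$ is introduced here only for illustration.

```latex
\begin{aligned}
\kappa t^2 - 2\tau t > \Delta
  \quad&\Longleftrightarrow\quad \kappa t^2 - 2\tau t - \Delta > 0, \\
t_+ &= \frac{\tau + \sqrt{\tau^2 + \kappa\Delta}}{\kappa}
  \quad\text{(the larger root of } \kappa t^2 - 2\tau t - \Delta = 0\text{)}, \\
t > t_+ \quad&\Longrightarrow\quad \kappa t^2 - 2\tau t > \Delta .
\end{aligned}
```

Since the quadratic bound was derived for $\|B\| \ge 1$, any $\theta > \max\{1, t_+\}$ is sufficient.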

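Lemma 3

Let Conditions C1 and C2 be fulfilled. Then for any $\varepsilon > 0$, $\delta \in (0,1)$ and $\rho > 0$ there is $n_{\varepsilon,\delta,\rho} \in \mathbb{N}$ such that for any $n > n_{\varepsilon,\delta,\rho}$ we have

$$P\left(\left\{\omega \in \Omega : \sup_{\|B\| \le \rho} \left\| \frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_i (Y_i - B'X_i)' - B'\, E\left[ w\left(F_{B,\Sigma}(d_1(B,\Sigma))\right) X_1 (Y_1 - B'X_1)' \right] \right\| < \delta \right\}\right) > 1 - \varepsilon.$$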

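Proof

Let us fix $\varepsilon, \delta \in (0,1)$, $\rho > 0$ and $\|\Sigma\| > 1$, and denote $E\|X_1\|^2 = \kappa^{(1)}$. Recalling that we have assumed $B^0 = 0$, we shall consider for $B \in \mathbb{R}^{p \times q}$, $\|B\| \le \rho$,

$$-\frac{1}{n} B' NE_{Y,X,n}(B) = -\frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_i (Y_i - B'X_i)'$$

$$= \frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iX_i'B \qquad (29)$$

$$\quad - \frac{1}{n} \sum_{i=1}^{n} w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) B'X_iY_i'.$$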

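Let us again start with (29) and put $\tau^{(1)} = \delta / (16 \kappa^{(1)} \rho^2 L)$. Due to Lemma A2 we can find $n_1 \in \mathbb{N}$ so that for any $n > n_1$ there is a set $B_n^{(1)}$ such that $P(B_n^{(1)}) > 1 - \varepsilon/8$ and for any $\omega \in B_n^{(1)}$

$$\sup_{B \in \mathbb{R}^{p \times q}} \sup_{r \in \mathbb{R}} \left| F^{(n)}_{B,\Sigma}(r) - F_{B,\Sigma}(r) \right| \le \tau^{(1)}. \qquad (30)$$

Employing the law of large numbers, find $n_2 > n_1$ so that for any $n > n_2$ there is a set $B_n^{(2)}$ such that $P(B_n^{(2)}) > 1 - \varepsilon/8$ and for any $\omega \in B_n^{(2)}$

$$\frac{1}{n} \sum_{i=1}^{n} \|X_i\|^2 < 2\kappa^{(1)}. \qquad (31)$$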

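Then for any $n > n_2$ and any $\omega \in B_n^{(1)} \cap B_n^{(2)}$

$$\frac{1}{n} \sup_{\|B\| \le \rho} \sum_{i=1}^{n} \left| w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) - w\left(F_{B,\Sigma}(d_i(B,\Sigma))\right) \right| \|X_iX_i'\| \le \frac{L\tau^{(1)}}{n} \sum_{i=1}^{n} \|X_iX_i'\| \le L\tau^{(1)} \cdot 2\kappa^{(1)} = \frac{\delta}{8\rho^2}.$$

Hence for any $n > n_2$ and any $\omega \in B_n^{(1)} \cap B_n^{(2)}$

$$\frac{1}{n} \sup_{\|B\| \le \rho} \left\| \sum_{i=1}^{n} \left( w\left(F^{(n)}_{B,\Sigma}(d_i(B,\Sigma))\right) - w\left(F_{B,\Sigma}(d_i(B,\Sigma))\right) \right) B'X_iX_i'B \right\| \le \frac{\delta}{8}. \qquad (32)$$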
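The constants in this step can be checked directly. The sketch below assumes two standard facts: Condition C1 gives the Lipschitz bound $|w(x) - w(y)| \le L|x - y|$, and the Frobenius norm satisfies $\|B'X_iX_i'B\| \le \|B\|^2 \|X_i\|^2$ with $\|X_iX_i'\| = \|X_i\|^2$.

```latex
\begin{aligned}
L\,\tau^{(1)} \cdot 2\kappa^{(1)}
  &= L \cdot \frac{\delta}{16\,\kappa^{(1)} \rho^2 L} \cdot 2\kappa^{(1)}
   = \frac{\delta}{8\rho^2}, \\
\sup_{\|B\| \le \rho} \|B\|^2 \cdot \frac{\delta}{8\rho^2}
  &= \rho^2 \cdot \frac{\delta}{8\rho^2} = \frac{\delta}{8}.
\end{aligned}
```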

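Employing Lemma A1, find for $\Delta = \dfrac{\delta}{16 L \kappa^{(1)} \rho^2}$ such $\tau^{(2)} > 0$ that for

$$T(\rho, \tau^{(2)}) = \left\{ \|B^{(1)}\| \le \rho,\ \|B^{(2)}\| \le \rho,\ \|B^{(1)} - B^{(2)}\| < \tau^{(2)} \right\}$$

we have

$$\sup_{(B^{(1)},B^{(2)}) \in T(\rho,\tau^{(2)})} \sup_{r \in \mathbb{R}} \left| F_{B^{(1)},\Sigma}(r) - F_{B^{(2)},\Sigma}(r) \right| < \Delta. \qquad (33)$$

Then for any $n > n_2$ and any $\omega \in B_n^{(1)} \cap B_n^{(2)}$

$$\frac{1}{n} \sup_{(B^{(1)},B^{(2)}) \in T(\rho,\tau^{(2)})} \left\| \sum_{i=1}^{n} \left( w\left(F_{B^{(1)},\Sigma}(r_i(B^{(2)}))\right) - w\left(F_{B^{(2)},\Sigma}(r_i(B^{(2)}))\right) \right) (B^{(1)})'X_iX_i'B^{(1)} \right\| \le L \Delta \rho^2\, \frac{1}{n} \sum_{i=1}^{n} \|X_i\|^2 \le \frac{\delta}{8}. \qquad (34)$$

The rest of the proof requires further notation and the use of Hölder's inequality, and runs along similar lines as the proof in Víšek (2008).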

References

Mašíček, L. (2004). Diagnostika a senzitivita robustních modelů. PhD thesis, MFF UK, Prague.
Víšek, J. Á. (2008). Consistency of the least weighted squares. Kybernetika (submitted).
Víšek, J. Á. (2009). Consistency of the least weighted squares under heteroscedasticity. Kybernetika (submitted).

Thank You For Your Attention!
