The Missing-Index Problem in Process Control
1 The Missing-Index Problem in Process Control
Joseph G. Voelkel, CQAS, KGCOE, RIT
October 2009
2 Topics
1. Examples: Introduction to the Problem
2. Models, Inference, Likelihoods, EM Algorithm
3. Examples of the Method
4. Summary, Future Work
Paper at
3 Example 1: 8-spindle machine
CP = Cyclic Permutation case
Sample J = 8 consecutive parts
Sampled in order of production, but the first index value is not known
Index j = 1, ..., J? ⟹ J possibilities.
4 Example 2: 4-cavity machine
GP = General Permutation case
Sample the J = 4 parts from the shot
No order to the parts
Index j = 1, ..., J? ⟹ J! possibilities.
5 Example 3: 64-cavity mold
SP = Sampled Permutation case
Sample J = 4 parts at random from the N = 64 parts
Index j = 1, ..., J? ⟹ N!/(N − J)! possibilities
Will not be discussed in this talk.
6 Models: Index-Known Case
The general model we consider, if index j were known, is
$$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{j(i)} + \delta_i, \qquad i = 1, 2, \ldots, I; \; j = 1, 2, \ldots, J,$$
$$\sum_i \alpha_i = \sum_j \beta_j = 0; \qquad \varepsilon_{j(i)} \sim N(0, \sigma^2), \; \delta_i \sim N(0, \sigma_\delta^2), \text{ independent r.v.'s}$$
($\alpha_i$: for non-stochastic events, e.g. mean drifts/shifts)
1. Injection molding: $\beta_j$ = cavities, $\varepsilon_{j(i)}$ = within-shot variation, $\delta_i$ = across-shot variation
2. Multiple-spindle machine: $\beta_j$ = spindles. Spindle parts sampled as a group ⟹ $\varepsilon_{j(i)}$, $\delta_i$ analogous to the above. Spindle parts sampled one at a time ⟹ the model excludes $\delta_i$.
Our interest here lies with $(\beta, \sigma)$.
7 Models: Index Unknown, CP Case
If $J = 3$, the index order is either $(1, 2, 3)$, $(3, 1, 2)$, or $(2, 3, 1)$
The ordered set of these 3 vectors is CP(3), indexed by $k$
The model is the same as before except
$$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{j(i)} + \delta_i \;\Longrightarrow\; Y_{ij} = \mu + \alpha_i + \beta_{j_k(i)} + \varepsilon_{j_k(i)} + \delta_i$$
Here, $(1_k, 2_k, \ldots, J_k)$ is the (unknown) $k$th member of CP($J$)
E.g., $k = 2 \Longrightarrow (1_k, 2_k, 3_k) = (3, 1, 2)$, so $Y_{i1} = \mu + \alpha_i + \beta_3 + \varepsilon_{3(i)} + \delta_i$
Reasonable assumption: $k$ is discrete uniform on $1, 2, \ldots, J$ for each $i$.
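To make the CP($J$) indexing concrete, here is a minimal Python sketch (my own, not from the talk) that enumerates the $J$ cyclic permutations in the ordering used above:

```python
def cyclic_permutations(J):
    """Return the J cyclic rotations of (1, ..., J), indexed by k = 1, ..., J."""
    base = list(range(1, J + 1))
    return [tuple(base[J - k + 1:] + base[:J - k + 1]) for k in range(1, J + 1)]

print(cyclic_permutations(3))  # [(1, 2, 3), (3, 1, 2), (2, 3, 1)]
```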
8 Example: CP Case
[Tables: indices in the (unknown) correct order (columns Y1, Y2, Y3) vs. the actual Y data after cyclic permutation]
9 CP Case: Inference
$$Y_{ij} = \mu + \alpha_i + \beta_{j_k(i)} + \varepsilon_{j_k(i)} + \delta_i$$
Objective: inference on $\beta = (\beta_1, \beta_2, \ldots, \beta_J)$ and $\sigma$
(Inference on $\{\alpha_i\}$ and $\sigma_\delta$ (or $\sigma_\delta^2 + \sigma^2/J$) is made from the row means $\bar{Y}_{i\cdot}$)
Note: information on $\beta$ is contained in contrasts within each time $i$, e.g. in $Y_{ij} - \bar{Y}_{i\cdot}$ for $j = 1, 2, \ldots, J$
Restriction on $\beta$: we will use $\beta_J = 0$, not $\sum_j \beta_j = 0$
Still identifiable (at best) only up to cyclic permutation.
10 CP Case: Inference
Will not use the $J$ (correlated) contrasts $Y_{ij} - \bar{Y}_{i\cdot}$ for inference
Instead, will use the $J - 1$ Helmert contrasts:
$$Z_{i1} = c_1 (Y_{i2} - Y_{i1})$$
$$Z_{i2} = c_2 \left( Y_{i3} - (Y_{i1} + Y_{i2})/2 \right)$$
$$\cdots$$
$$Z_{i,J-1} = c_{J-1} \left( Y_{iJ} - \sum_{j=1}^{J-1} Y_{ij} / (J - 1) \right)$$
Here, $c_j = 1/\sqrt{1 + (1/j)}$, defined so that $\mathrm{var}(Z_{ij}) = \sigma^2$ when the indices are correctly aligned
Note: $Y$ is $I \times J$, $Z$ is $I \times (J - 1)$.
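As a concrete illustration, here is a short sketch (mine, assuming numpy; not code from the talk) of the Helmert-contrast transformation from the $I \times J$ matrix $Y$ to the $I \times (J-1)$ matrix $Z$:

```python
import numpy as np

def helmert_contrasts(Y):
    """Map I x J data Y to the I x (J-1) scaled Helmert contrasts:
    Z[:, j-1] = c_j * (Y[:, j] - mean of the first j columns)."""
    I, J = Y.shape
    Z = np.empty((I, J - 1))
    for j in range(1, J):                    # contrast index j = 1, ..., J-1
        c_j = 1.0 / np.sqrt(1.0 + 1.0 / j)   # scaling so var(Z_ij) = sigma^2
        Z[:, j - 1] = c_j * (Y[:, j] - Y[:, :j].mean(axis=1))
    return Z
```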
11 CP Case: Inference
The $J - 1$ Helmert contrasts: $Z_{i1} = c_1(Y_{i2} - Y_{i1})$, $Z_{i2} = c_2(Y_{i3} - (Y_{i1} + Y_{i2})/2)$
[Plots: actual Y data (CP'd), columns Y1, Y2, Y3, and the corresponding Z data (without the $c_j$ scaling), columns Z1, Z2]
12 CP Case: Likelihood
$$Y_{ij} = \mu + \alpha_i + \beta_{j_k(i)} + \varepsilon_{j_k(i)} + \delta_i$$
Will estimate $(\beta, \sigma)$ using likelihood methods (focus: estimation)
Let $f(z; \eta)$ = density function of $N(\eta, \sigma^2)$. For $J = 3$:
$Z_{i1} = c_1(Y_{i2} - Y_{i1})$, $Z_{i2} = c_2(Y_{i3} - (Y_{i1} + Y_{i2})/2)$
$$E[Z_{i1}] = c_1 \left( \beta_{2_k(i)} - \beta_{1_k(i)} \right), \qquad E[Z_{i2}] = c_2 \left( \beta_{3_k(i)} - \left( \beta_{1_k(i)} + \beta_{2_k(i)} \right)/2 \right)$$
$k$ is discrete uniform on $1, 2, 3$, so the $i$th likelihood contribution is $1/3$ of
$$f(z_{i1}; c_1(\beta_2 - \beta_1)) \, f(z_{i2}; c_2(\beta_3 - (\beta_1 + \beta_2)/2))$$
$$+\, f(z_{i1}; c_1(\beta_1 - \beta_3)) \, f(z_{i2}; c_2(\beta_2 - (\beta_3 + \beta_1)/2))$$
$$+\, f(z_{i1}; c_1(\beta_3 - \beta_2)) \, f(z_{i2}; c_2(\beta_1 - (\beta_2 + \beta_3)/2))$$
Recall $\beta_3 = 0$, but it is kept here for symmetry
Note: a sum of $J$ terms; each term is a product of $J - 1$ terms.
13 CP Case: Likelihood
Note: a sum of $J$ terms, each a product of $J - 1$ terms
General $J$ case: the contribution at $i$ to the likelihood is
$$\frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f\!\left( z_{ij};\; c_j \left[ \beta_{(j+1)_k(i)} - \sum_{l=1}^{j} \beta_{l_k(i)} / j \right] \right).$$
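This contribution can be evaluated directly. A sketch (mine, assuming numpy/scipy and the conventions above: z_i is one row of Z, beta has $\beta_J = 0$, and perms lists the 1-based cyclic permutations):

```python
import numpy as np
from scipy.stats import norm

def cp_loglik_row(z_i, beta, sigma, perms):
    """Log of the i-th CP-case contribution:
    log[(1/J) * sum_k prod_j f(z_ij; eta_jk)], f = N(eta, sigma^2) density."""
    J = len(beta)
    c = 1.0 / np.sqrt(1.0 + 1.0 / np.arange(1, J))        # c_1, ..., c_{J-1}
    total = 0.0
    for p in perms:                                        # k = 1, ..., J
        b = np.asarray(beta)[np.array(p) - 1]              # permuted betas
        eta = c * (b[1:] - np.cumsum(b[:-1]) / np.arange(1, J))
        total += np.prod(norm.pdf(z_i, loc=eta, scale=sigma))
    return np.log(total / J)
```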
14 GP Case: Likelihood
The GP case is analogous to the CP case
For $J = 3$, we now have $J! = 6$ permutations: $(1,2,3)$, $(1,3,2)$, $(2,1,3)$, $(2,3,1)$, $(3,1,2)$, and $(3,2,1)$
The contribution to the likelihood at time $i$ is $1/6$ times
$$f(z_{i1}; c_1(\beta_2 - \beta_1)) \, f(z_{i2}; c_2(\beta_3 - (\beta_1 + \beta_2)/2))$$
$$+\, f(z_{i1}; c_1(\beta_3 - \beta_1)) \, f(z_{i2}; c_2(\beta_2 - (\beta_1 + \beta_3)/2))$$
$$+\, f(z_{i1}; c_1(\beta_1 - \beta_2)) \, f(z_{i2}; c_2(\beta_3 - (\beta_2 + \beta_1)/2))$$
$$+\, f(z_{i1}; c_1(\beta_3 - \beta_2)) \, f(z_{i2}; c_2(\beta_1 - (\beta_2 + \beta_3)/2))$$
$$+\, f(z_{i1}; c_1(\beta_1 - \beta_3)) \, f(z_{i2}; c_2(\beta_2 - (\beta_3 + \beta_1)/2))$$
$$+\, f(z_{i1}; c_1(\beta_2 - \beta_3)) \, f(z_{i2}; c_2(\beta_1 - (\beta_3 + \beta_2)/2))$$
General $J$: a sum of $J!$ terms, each a product of $J - 1$ terms
Next: back to the CP case.
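For the GP case the only change is the permutation set: the sum runs over all $J!$ orderings instead of the $J$ cyclic ones. A one-line sketch using Python's standard library:

```python
from itertools import permutations

# GP case: same likelihood form, but the permutation set is all J! orderings.
def general_permutations(J):
    return list(permutations(range(1, J + 1)))

print(len(general_permutations(4)))  # 24 possibilities for the 4-cavity example
```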
15 Finding MLEs
The log-likelihood is very complex, even in the simpler CP case:
$$\sum_{i=1}^{I} \ln \left[ \frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f\!\left( z_{ij};\; c_j \left[ \beta_{(j+1)_k(i)} - \sum_{l=1}^{j} \beta_{l_k(i)} / j \right] \right) \right]$$
Direct maximization usually fails for all but the simplest CP cases
There are some similarities to estimating the parameters of a normal mixture
So, consider use of the EM algorithm
However, our problem is both more and less complex than the normal-mixture problem:
More complex: each likelihood term is a sum of products, not just a sum
Less complex: the mixture probabilities are known.
16 EM Algorithm
For compactness, define ($E[Z_{ij}]$ for each $i$)
$$\eta_{jk} = \eta_{jk}(\beta) = c_j \left[ \beta_{(j+1)_k} - \sum_{l=1}^{j} \beta_{l_k} / j \right]$$
Then the $i$th contribution to the likelihood is $\frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f(z_{ij}; \eta_{jk})$
Idea behind the EM algorithm: consider how the likelihood would look if all the CPs were known ("no missing data")
Define $k^*(i)$ to be the actual permutation index at time $i$, and collect all of these into $k^* = (k^*(1), k^*(2), \ldots, k^*(I))$
Then the incomplete data is $Z = (Z_1, Z_2, \ldots, Z_I)$, where $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{i,J-1})$ is the observed (transformed) data at time $i$, and the complete data is $(Z, k^*)$.
17 EM Algorithm
The $i$th contribution to the incomplete-data log likelihood is the complex
$$\ln \left[ \frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f(z_{ij}; \eta_{jk}) \right]$$
However, the $i$th contribution to the complete-data log likelihood is then simply
$$\sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k^*(i)}(\theta) \right)$$
Simple index-known case: easy to maximize with respect to $(\beta, \sigma)$
For example, if the data indices were rearranged so that $k^*(i) = 1$ for each $i$, then the MLE of $\beta_2 - \beta_1$ would simply be $\bar{Y}_2 - \bar{Y}_1$
Idea of the EM algorithm: estimate the correct indices so that the complete-data log likelihood can be used.
18 Cycle of the EM Algorithm
Each cycle of the EM algorithm requires that we
1. Use the current estimates $\hat{\theta}_c$, at iteration $c$, of $\theta = (\beta, \sigma)$
2. Find the conditional expectation of the complete-data (i.e., $(Z, k^*)$) log likelihood, conditional on the incomplete data $Z$ (using $\hat{\theta}_c$ to obtain the conditional expectation)
3. Maximize the result in (2) with respect to $\theta$, to get $\hat{\theta}_{c+1}$
4. Continue until convergence
Questions: initial estimates; the expectation step (interesting!); the maximization step (also interesting!); convergence criteria.
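Schematically, one run of the algorithm looks like the following sketch (my own; `e_step` and `m_step` stand for the two stages detailed on the later slides, and the stopping test is the responsibility-stability rule described near the end):

```python
import numpy as np

def em(Z, theta0, e_step, m_step, eps=1e-3, max_iter=500):
    """Alternate E- and M-steps until the responsibilities stabilize."""
    theta, gamma_old = theta0, None
    for c in range(max_iter):
        gamma = e_step(Z, theta)       # responsibilities gamma_c(i, k)
        theta = m_step(Z, gamma)       # weighted-LS update of (beta, sigma)
        if gamma_old is not None and np.abs(gamma - gamma_old).max() < eps:
            break
        gamma_old = gamma
    return theta, gamma
```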
19 EM Algorithm: Expectation Step
For the EM algorithm, it is useful to write the complete-data log-likelihood
$$\ell(\theta) = \sum_{i=1}^{I} \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k^*(i)}(\theta) \right)$$
as
$$\ell(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} \delta(k(i), k^*(i)) \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\theta) \right)$$
where $\delta(k, k^*) = 1$ if $k = k^*$ and is 0 otherwise
This makes the conditional expectation easier to obtain: $k^*(i)$ is now part of a linear term.
20 EM Algorithm: Expectation Step
The expectation step requires finding $\ell_{EM}(\theta) = E[\ell(\theta) \mid Z]$ (the conditional expectation evaluated at $\hat{\theta}_c$)
Here, we need to find
$$\ell_{EM}(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} E[\delta(k(i), k^*(i)) \mid Z_i] \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\theta) \right)$$
21 EM Algorithm: Expectation Step
Need to find
$$\ell_{EM}(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} E[\delta(k(i), k^*(i)) \mid Z_i] \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\theta) \right)$$
We can solve this. We find that
$$E[\delta(k(i), k^*(i)) \mid Z_i] = P(\delta(k(i), k^*(i)) = 1 \mid Z_i) = \frac{\prod_{j=1}^{J-1} f(z_{ij}; \eta_{j k(i)}(\theta))}{\sum_{k(i)=1}^{J} \prod_{j=1}^{J-1} f(z_{ij}; \eta_{j k(i)}(\theta))} = \gamma(i, k(i)),$$
where $\gamma(i, k(i)) = P(i\text{th CP has index } k)$ is called the responsibility of permutation $k$ for observation $i$
Evaluating at $\hat{\theta}_c$, we obtain estimates $\hat{\gamma}_c(i, k(i))$ of the $\gamma(i, k(i))$.
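In code, the E-step is a row-wise normalization of the per-permutation density products. A sketch (mine, assuming numpy/scipy and the $\eta$ parameterization above):

```python
import numpy as np
from scipy.stats import norm

def responsibilities(Z, beta, sigma, perms):
    """E-step sketch: gamma[i, k] is proportional (within row i) to
    prod_j f(z_ij; eta_jk); each row is normalized to sum to 1."""
    J = len(beta)
    c = 1.0 / np.sqrt(1.0 + 1.0 / np.arange(1, J))
    gamma = np.empty((Z.shape[0], len(perms)))
    for k, p in enumerate(perms):
        b = np.asarray(beta)[np.array(p) - 1]
        eta = c * (b[1:] - np.cumsum(b[:-1]) / np.arange(1, J))
        gamma[:, k] = norm.pdf(Z, loc=eta, scale=sigma).prod(axis=1)
    return gamma / gamma.sum(axis=1, keepdims=True)
```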
22 EM Algorithm: Expectation Step
Responsibility: $\hat{\gamma}(i, k(i)) = \hat{P}(i\text{th CP has index } k)$
Assume $\hat{\theta}_c = (\hat{\beta}_c, \hat{\sigma}_c) = ((4, 1, 0), 1)$:
[index k=1] $f(z_{i1}; c_1(\beta_2 - \beta_1)) f(z_{i2}; \ldots)$
+ [index k=2] $f(z_{i1}; c_1(\beta_1 - \beta_3)) f(z_{i2}; \ldots)$
+ [index k=3] $f(z_{i1}; c_1(\beta_3 - \beta_2)) f(z_{i2}; \ldots)$
[Plot: $Z_i$ data (without the $c_j$ scaling). Just consider $Z_{i1}$ for two example rows: is it closest to $\beta_2 - \beta_1 = -3$ (k=1), $\beta_1 - \beta_3 = 4$ (k=2), or $\beta_3 - \beta_2 = -1$ (k=3)?]
23 EM Algorithm: Maximization Step
Next, we need to maximize
$$\ell_{EM}(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} \hat{\gamma}(i, k(i)) \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\theta) \right)$$
with respect to $\theta = (\beta, \sigma)$
A close look at the sum: the $\beta$ portion is mathematically equivalent to a weighted sum of squares in a linear-regression framework, so the associated matrix methods can be directly applied
However, instead of the usual $I$ terms, we now have $IJ(J-1)$ terms.
24 EM Algorithm: Maximization Step
For the matrix machinery, define X (predictor) and W (weight) matrices and a U (response) vector. For $J = 3$, here are the portions for a given $i$:

(k(i), j) | X portion               | diag(W) portion  | U portion
(1, 1)    | $(-c_1, c_1, 0)$        | $\gamma(i, 1)$   | $Z_{i1}$
(1, 2)    | $(-c_2/2, -c_2/2, c_2)$ | $\gamma(i, 1)$   | $Z_{i2}$
(2, 1)    | $(c_1, 0, -c_1)$        | $\gamma(i, 2)$   | $Z_{i1}$
(2, 2)    | $(-c_2/2, c_2, -c_2/2)$ | $\gamma(i, 2)$   | $Z_{i2}$
(3, 1)    | $(0, -c_1, c_1)$        | $\gamma(i, 3)$   | $Z_{i1}$
(3, 2)    | $(c_2, -c_2/2, -c_2/2)$ | $\gamma(i, 3)$   | $Z_{i2}$

The solution is ($X$ not of full column rank) $\hat{\beta} = (X'WX)^{-} X'WU$.
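A sketch of the $J = 3$ maximization step built from the table above (my own construction; the $\sigma$ update shown, a weighted ML variance, is an assumption, since the slide gives only $\hat\beta$):

```python
import numpy as np

def m_step_J3(Z, gamma):
    """Weighted-LS M-step for J = 3. Z is I x 2, gamma is I x 3."""
    I = Z.shape[0]
    c1, c2 = 1 / np.sqrt(2), 1 / np.sqrt(1.5)
    Xblock = np.array([[-c1,    c1,    0.0 ],   # (k=1, j=1): E[Z_i1] = c1*(b2 - b1)
                       [-c2/2, -c2/2,  c2  ],   # (k=1, j=2)
                       [ c1,    0.0,  -c1  ],   # (k=2, j=1): E[Z_i1] = c1*(b1 - b3)
                       [-c2/2,  c2,   -c2/2],   # (k=2, j=2)
                       [ 0.0,  -c1,    c1  ],   # (k=3, j=1): E[Z_i1] = c1*(b3 - b2)
                       [ c2,   -c2/2, -c2/2]])  # (k=3, j=2)
    X = np.tile(Xblock, (I, 1))                  # one 6-row block per time i
    w = np.repeat(gamma, 2, axis=1).reshape(-1)  # weight gamma(i, k) per (k, j) row
    U = np.tile(Z, (1, 3)).reshape(-1)           # responses Z_i1, Z_i2 repeated
    XtW = X.T * w                                # X'W with W = diag(w)
    beta = np.linalg.pinv(XtW @ X) @ (XtW @ U)   # g-inverse: X not full column rank
    resid = U - X @ beta
    sigma = np.sqrt((w * resid**2).sum() / w.sum())
    return beta, sigma
```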
25 EM Algorithm: J = 2 Case
Some algebra leads to
$$\hat{\beta}_1 = \frac{1}{I c_1} \sum_{i=1}^{I} Z_{i1} (1 - 2\gamma(i, 1)) = \frac{1}{I} \sum_{i=1}^{I} (Y_{i2} - Y_{i1})(1 - 2\gamma(i, 1))$$
An extreme case: say all permutations just happen to be $(1, 2)$ and the separation via $\beta$ is much larger than the noise via $\sigma$
Then the estimated values $\gamma(i, 1) \approx 1$, so $\hat{\beta}_1 \approx \sum_{i=1}^{I} (Y_{i1} - Y_{i2})/I$
If the permutations have clean separation, each $\gamma(i, 1)$ weight is either $\approx 0$ or $\approx 1$, an indicator of which permutation took place
If separation is poor, the $\gamma(i, k(i))$ tend to be closer to 0.5; in the extreme case (all weights = 0.5) we get $\hat{\beta}_1 = 0$: zero estimated separation.
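For reference, the $J = 2$ closed form is one line of numpy (a sketch; Y is assumed $I \times 2$ and gamma1 holds the $\gamma(i, 1)$):

```python
import numpy as np

def beta1_hat_J2(Y, gamma1):
    """beta_1 estimate: mean of (Y_i2 - Y_i1) * (1 - 2 * gamma(i, 1))."""
    return np.mean((Y[:, 1] - Y[:, 0]) * (1.0 - 2.0 * gamma1))
```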
26 EM Algorithm: Implementation
Expectation step; Maximization step
Remaining questions: initial estimates; convergence criteria; initial permutation of rows.
27 EM Algorithm: Implementation
Initial permutation of rows? We can get initial estimates if we have a reasonable idea of how to permute the rows
If there is some signal in the data, this is possible
[Tables: data Y (columns Y1, Y2, Y3) and the permuted data Y]
Best signal = the row with maximum variance? Align the other rows to this one, as in the sketch below.
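A sketch of this alignment heuristic (mine; the exact matching rule, here the cyclic shift minimizing squared distance to the reference row, is an assumption):

```python
import numpy as np

def initial_alignment(Y):
    """Take the max-variance row as the reference, then give every row the
    cyclic shift that best matches it."""
    ref = Y[np.argmax(Y.var(axis=1))]
    J = Y.shape[1]
    aligned = np.empty_like(Y)
    for i, row in enumerate(Y):
        shifts = [np.roll(row, s) for s in range(J)]
        aligned[i] = min(shifts, key=lambda r: ((r - ref) ** 2).sum())
    return aligned
```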
28 EM Algorithm: Implementation
Advantages of the initial permutation of rows?
1. Initial estimates of the parameters. For $\beta$, the estimate is $(\bar{Y}_1 - \bar{Y}_J, \bar{Y}_2 - \bar{Y}_J, \ldots, \bar{Y}_{J-1} - \bar{Y}_J, 0)$
2. Best-guess row alignments are in place. With a reasonably strong signal, the most likely permutation in CP($J$) to be correct for each $i$ is $k = 1$ (no permutation). So $\{\hat{\gamma}_c(i, k(i)); i = 1, 2, \ldots, I\} \approx 1$ for $k(i) = 1$ and $\approx 0$ for the other $k(i)$, across iterations $c = 1, 2, \ldots$
3. If the signal is very weak, we will observe that the $\{\hat{\gamma}_c(i, 1); i = 1, 2, \ldots, I\}$ tend to decrease with $c$: the algorithm indicates that the initial row-alignments could be other ones.
29 EM Algorithm: Implementation
Convergence criteria?
1. Stability of the $\{\hat{\gamma}_c(i, k(i))\}$ across iterations $c$ determines the stability of the estimates
2. So, a reasonable stopping rule is the first $c$ such that $\max_{i,k} \left| \hat{\gamma}_c(i, k) - \hat{\gamma}_{c-1}(i, k) \right| < \epsilon$, where the cutoff $\epsilon$ is small; see the sketch below.
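In code the stopping rule is a single comparison (sketch; the cutoff value used here is my choice):

```python
import numpy as np

def converged(gamma_c, gamma_prev, eps=1e-3):
    """Stop at the first c where max over (i, k) of |change in gamma| < eps."""
    return np.abs(gamma_c - gamma_prev).max() < eps
```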
30 Example 1: 8-spindle machine, CP case
24 rows of data (columns Y1, ..., Y8)
Boxplots, after mean-centering each row, and then row-aligning...
31 [Figure: distribution of Y (row-mean centered) at each index]
32 [Figure: distribution after row-alignment permutations, reference row 4 (columns Y1–Y8); Y vs index (permuted)]
33 [Figure: β̂_c vs iteration c]
34 [Figure: σ̂_c vs iteration c]
35–40 [Figures: responsibilities γ̂_c(·, k) vs permutation index k at successive iterations c]
41 [Figure: β̂ and actual Y means (both centered) vs index]
42 [Figures: β̂ and actual Y means (both centered), and β̂ and actual Y means (centered and aligned), vs index]
43 [Figures: Y density estimates, indices known (row adjusted), and normal density estimates, indices unknown]
44 Example 2: 4-cavity machine, GP case
24 rows of data, again (columns Y1, ..., Y4)
Results...
45 [Figures: distribution of Y at each index (row-mean centered), and distribution after row-alignment permutations]
46 [Figure: β̂_c vs iteration c]
47 [Figure: σ̂_c vs iteration c]
48–54 [Figures: responsibilities γ̂_c(·, k) vs permutation index k at iterations c = 2, 3, 5, 10, 20, 50, 112]
55 [Figure: β̂ and actual Y means (centered and aligned) vs index]
56 Example 4: Random Data, CP case
I = 100 rows, J = 8, N(0, 1) data
Results...
57 [Figure: I = 100, J = 8, N(0, 1) data: distribution of Y (row-mean centered) at each index]
58 [Figure: distribution after row-alignment permutations]
59 [Figure: β̂_c vs iteration c]
60 [Figure: σ̂_c vs iteration c]
61–63 [Figures: responsibilities γ̂_c(·, k) vs permutation index k at successive iterations c]
64 [Figure: β̂ and actual Y means (both centered) vs index]
65 Example 4a: Random Data, CP case
I = 100 rows, J = 8, N(0, 1) data
Shrinkage of the initial location ($\beta_j$) estimates? Expansion of the initial scale estimate?
Results...
66 [Figure: β̂_c vs iteration c, after shrinking the initial β estimates by 2 and enlarging the initial σ estimate by 2]
67 [Figures: new solution vs old solution (!): β̂ and actual Y means (both centered) vs index]
68 [Figures: β̂_c vs iteration c, using the same β̂_{c=1} values and using β̂_{c=1} drawn from N(0, 0.01)]
69 Summary (and future work)
Both the CP and GP methods appear to work well in cases with good signals
Under $H_0$ (no real index separation), results might be optimistic
Future work:
1. Approximate the standard errors of the estimates
2. Implement likelihood-ratio tests
3. Investigate asymptotics
4. Investigate the $H_0$ case in detail
5. Consider shrinkage of the final estimates
6. Improve the (linear) convergence rate, e.g. via Aitken's acceleration technique.
70 Questions?