The Missing-Index Problem in Process Control


The Missing-Index Problem in Process Control
Joseph G. Voelkel
CQAS, KGCOE, RIT
October 2009

Topics
1. Examples: Introduction to the Problem
2. Models, Inference, Likelihoods, EM Algorithm
3. Examples of the Method
4. Summary, Future Work

Paper at http://people.rit.edu/~jgvcqa/

Example 1: 8-spindle machine (CP = Cyclic Permutation case)
- Sample J = 8 consecutive parts
- Sampled in order of production, but the first index value is not known
- Index j = 1, ..., J?

    ? 1:  1 2 3 4 5 6 7 8
    ? 2:  2 3 4 5 6 7 8 1
    ? 3:  ...

  J possibilities.

Example 2: 4-cavity machine (GP = General Permutation case)
- Sample the J = 4 parts from the shot
- No order to the parts
- Index j = 1, ..., J?

    ? 1:  1 2 3 4
    ? 2:  3 2 4 1
    ? 3:  ...

  J! possibilities.

Example 3: 64-cavity mold (SP = Sampled Permutation case)
- Sample J = 4 parts at random from the N = 64 parts
- Index j = 1, ..., J?

    ? 1:  1  3 10 55
    ? 2:  6 22 10  4
    ? 3:  ...

  N!/(N-J)! possibilities
- Will not be discussed in this talk.

Models: Index-Known Case
General model we consider, if index j were known, is
  $Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{j(i)} + \alpha_i$,   i = 1, 2, ..., I;  j = 1, 2, ..., J
  $\sum_i \tau_i = \sum_j \beta_j = 0$;  $\varepsilon_{j(i)} \sim N(0, \sigma^2)$,  $\alpha_i \sim N(0, \sigma_\alpha^2)$,  independent r.v.'s
  ($\tau_i$: for non-stochastic events, e.g. mean drifts/shifts)
1. Injection molding: $\beta_j$ = cavities, $\varepsilon_{j(i)}$ = within-shot variation, $\alpha_i$ = across-shot variation
2. Multiple-spindle machine: $\beta_j$ = spindles. Spindle parts as a group ⇒ $\varepsilon_{j(i)}$, $\alpha_i$ analogous to above. Spindle parts one at a time ⇒ model excludes $\alpha_i$.
Our interest here lies with $(\beta, \sigma)$.

Models: Index Unknown, CP Case
- If J = 3, the index order is either (1, 2, 3), (3, 1, 2), or (2, 3, 1)
- Ordered set of these 3 vectors ≡ CP(3), indexed by k
- Model same as before except
  $Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{j(i)} + \alpha_i \;\Rightarrow\; Y_{ij} = \mu + \tau_i + \beta_{j_k(i)} + \varepsilon_{j_k(i)} + \alpha_i$
- Here, $(1_k, 2_k, \ldots, J_k)$ is the (unknown) k-th member of CP(J)
- E.g., k = 2 ⇒ $(1_k, 2_k, 3_k) = (3, 1, 2)$, so $Y_{i1} = \mu + \tau_i + \beta_3 + \varepsilon_{3(i)} + \alpha_i$
- Reasonable: k is discrete uniform on 1, 2, ..., J for each i.
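To make the indexing concrete, here is a small Python sketch (my own code, not the author's) that lists CP(J) in the ordering used above, so that k = 1 is the identity and k = 2 is (3, 1, 2) for J = 3:

```python
def cp(J):
    """Cyclic permutations CP(J): the k-th member is (1, ..., J) rotated right k-1 times."""
    base = list(range(1, J + 1))
    return [tuple(base[J - k:] + base[:J - k]) for k in range(J)]

# cp(3) -> [(1, 2, 3), (3, 1, 2), (2, 3, 1)], matching the ordering above
```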

Example: CP Case

Indices in (unknown) correct order:
  Y1  Y2  Y3
   3   6  10
   5   8   7
   8  12  12
   6   6   4
   9   5  11
   7   8   9

Actual Y data (CP'd):
  Y1  Y2  Y3
   6  10   3
   5   8   7
  12   8  12
   6   4   6
   9   5  11
   7   8   9

CP Case: Inference
  $Y_{ij} = \mu + \tau_i + \beta_{j_k(i)} + \varepsilon_{j_k(i)} + \alpha_i$
- Objective: inference on $\beta = (\beta_1, \beta_2, \ldots, \beta_J)$ and $\sigma$
  (Inference on $\{\tau_i\}$ and $\sigma_\alpha$ (or $\sigma_\alpha^2 + \sigma^2/J$) made from $\bar{Y}_{i\cdot}$)
- Note: information on $\beta$ is contained in contrasts within each time i, e.g. in $Y_{ij} - \bar{Y}_{i\cdot}$ for j = 1, 2, ..., J
- Restriction on $\beta$: we will use $\beta_J = 0$, not $\sum_j \beta_j = 0$
- Still identifiable (at best) only up to a cyclic permutation.

CP Case: Inference
- Will not use the J correlated contrasts $Y_{ij} - \bar{Y}_{i\cdot}$ for inference
- Instead, will use the J-1 Helmert contrasts:
  $Z_{i1} = c_1 (Y_{i2} - Y_{i1})$
  $Z_{i2} = c_2 \left( Y_{i3} - (Y_{i1} + Y_{i2})/2 \right)$
  ...
  $Z_{i,J-1} = c_{J-1} \left( Y_{iJ} - \sum_{j=1}^{J-1} Y_{ij} / (J-1) \right)$
- Here, $c_j = 1/\sqrt{1 + (1/j)}$, defined so that $\mathrm{var}(Z_{ij}) = \sigma^2$ when the indices are correctly aligned
- Note: Y is I × J, Z is I × (J-1).

CP Case: Inference
The J-1 Helmert contrasts:
  $Z_{i1} = c_1 (Y_{i2} - Y_{i1})$
  $Z_{i2} = c_2 \left( Y_{i3} - (Y_{i1} + Y_{i2})/2 \right)$

Actual Y data (CP'd):      Z data (w/o the c_j):
  Y1  Y2  Y3                 Z1    Z2
   6  10   3                4.0  -5.0
   5   8   7                3.0   0.5
  12   8  12               -4.0   2.0
   6   4   6               -2.0   1.0
   9   5  11               -4.0   4.0
   7   8   9                1.0   1.5
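A minimal numpy sketch of this transformation (the function name is mine, not from the paper); applied to the CP'd Y data above, it reproduces the Z table once the c_j scaling is divided out:

```python
import numpy as np

def helmert_contrasts(Y):
    """Map an I x J data matrix Y to the I x (J-1) matrix Z of Helmert
    contrasts, Z[:, j-1] = c_j * (Y[:, j] - mean(Y[:, :j]))."""
    Y = np.asarray(Y, dtype=float)
    I, J = Y.shape
    Z = np.empty((I, J - 1))
    for j in range(1, J):                       # j = 1, ..., J-1
        c_j = 1.0 / np.sqrt(1.0 + 1.0 / j)      # chosen so var(Z_ij) = sigma^2
        Z[:, j - 1] = c_j * (Y[:, j] - Y[:, :j].mean(axis=1))
    return Z

Y = np.array([[6, 10, 3], [5, 8, 7], [12, 8, 12],
              [6, 4, 6], [9, 5, 11], [7, 8, 9]])
# Dividing each column by c_j recovers the table above: first row (4.0, -5.0), etc.
print(helmert_contrasts(Y))
```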

CP Case: Likelihood
  $Y_{ij} = \mu + \tau_i + \beta_{j_k(i)} + \varepsilon_{j_k(i)} + \alpha_i$
- Will estimate $(\beta, \sigma)$ using likelihood methods (focus: estimation)
- Let $f(z; \eta)$ denote the density function of $N(\eta, \sigma^2)$. For J = 3:
  $Z_{i1} = c_1 (Y_{i2} - Y_{i1})$,  $Z_{i2} = c_2 (Y_{i3} - (Y_{i1} + Y_{i2})/2)$
  $E[Z_{i1}] = c_1 (\beta_{2_k(i)} - \beta_{1_k(i)})$
  $E[Z_{i2}] = c_2 \left( \beta_{3_k(i)} - (\beta_{1_k(i)} + \beta_{2_k(i)})/2 \right)$
- k is discrete uniform on 1, 2, 3, so the i-th likelihood contribution is 1/3 of
  $f(z_{i1}; c_1(\beta_2 - \beta_1)) \, f(z_{i2}; c_2(\beta_3 - (\beta_1 + \beta_2)/2))$
  $+\; f(z_{i1}; c_1(\beta_1 - \beta_3)) \, f(z_{i2}; c_2(\beta_2 - (\beta_3 + \beta_1)/2))$
  $+\; f(z_{i1}; c_1(\beta_3 - \beta_2)) \, f(z_{i2}; c_2(\beta_1 - (\beta_2 + \beta_3)/2))$
- Recall $\beta_3 = 0$, but it is kept in here for symmetry
- Note: sum of J terms; each term is a product of J-1 terms.

CP Case: Likelihood
- Note: sum of J terms; each is a product of J-1 terms
- General J case. The contribution at i to the likelihood is
  $\frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f\!\left( z_{ij};\; c_j \left[ \beta_{(j+1)_k(i)} - \sum_{l=1}^{j} \beta_{l_k(i)} / j \right] \right)$
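As an illustration, a hedged Python sketch of this per-row contribution, reusing cp() and the Helmert construction from the earlier sketches and assuming f is the $N(\eta, \sigma^2)$ density; all function names are mine:

```python
import numpy as np
from scipy.stats import norm

def eta_row(beta, perm):
    """Means of (Z_i1, ..., Z_i,J-1) when the row's true index order is perm."""
    beta = np.asarray(beta, dtype=float)
    b = beta[np.array(perm) - 1]                 # beta in the permuted order
    J = len(b)
    c = 1.0 / np.sqrt(1.0 + 1.0 / np.arange(1, J))
    return c * np.array([b[j] - b[:j].mean() for j in range(1, J)])

def cp_row_likelihood(z_i, beta, sigma):
    """(1/J) * sum over k in CP(J) of prod_j f(z_ij; eta_jk)."""
    terms = [np.prod(norm.pdf(z_i, loc=eta_row(beta, p), scale=sigma))
             for p in cp(len(beta))]
    return np.mean(terms)
```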

GP Case: Likelihood
- The GP case is analogous to the CP case
- For J = 3, now have J! = 6 permutations: (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), and (3, 2, 1)
- Contribution to the likelihood at time i is 1/6 times
  $f(z_{i1}; c_1(\beta_2 - \beta_1)) \, f(z_{i2}; c_2(\beta_3 - (\beta_1 + \beta_2)/2))$
  $+\; f(z_{i1}; c_1(\beta_3 - \beta_1)) \, f(z_{i2}; c_2(\beta_2 - (\beta_1 + \beta_3)/2))$
  $+\; f(z_{i1}; c_1(\beta_1 - \beta_2)) \, f(z_{i2}; c_2(\beta_3 - (\beta_2 + \beta_1)/2))$
  $+\; f(z_{i1}; c_1(\beta_3 - \beta_2)) \, f(z_{i2}; c_2(\beta_1 - (\beta_2 + \beta_3)/2))$
  $+\; f(z_{i1}; c_1(\beta_1 - \beta_3)) \, f(z_{i2}; c_2(\beta_2 - (\beta_3 + \beta_1)/2))$
  $+\; f(z_{i1}; c_1(\beta_2 - \beta_3)) \, f(z_{i2}; c_2(\beta_1 - (\beta_3 + \beta_2)/2))$
- General J: sum of J! terms; each is a product of J-1 terms
- Next: back to the CP case.
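For the GP case the only change to the likelihood sketch above is the set of permutations averaged over; a small helper (again my own code, not the author's):

```python
from itertools import permutations

def gp(J):
    """All J! index orderings for the general-permutation (GP) case."""
    return list(permutations(range(1, J + 1)))

# The GP per-row contribution is the same sum as in cp_row_likelihood,
# but averaged over gp(J) instead of cp(J).
```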

Finding MLEs
- The log-likelihood is very complex, even in the simpler CP case:
  $\sum_{i=1}^{I} \ln \left\{ \frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f\!\left( z_{ij};\; c_j \left[ \beta_{(j+1)_k(i)} - \sum_{l=1}^{j} \beta_{l_k(i)} / j \right] \right) \right\}$
- Direct maximization usually fails for all but the simplest CP cases
- Some similarities to estimation of parameters in the normal-mixture case
- So, consider use of the EM algorithm
- However, our problem is both more and less complex than the normal-mixture problem:
  - More complex: the likelihood term is a sum of products, not just a sum
  - Less complex: the mixture probabilities are known.

EM Algorithm
- For compactness, define ($E[Z_{ij}]$ for each i, under permutation k)
  $\eta_{jk} = \eta_{jk}(\beta) = c_j \left( \beta_{(j+1)_k} - \sum_{l=1}^{j} \beta_{l_k} / j \right)$
- Then the i-th contribution to the likelihood is $\frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f(z_{ij}; \eta_{jk})$
- Idea behind the EM algorithm: consider how the likelihood would look if all the CPs were known ("no missing data")
- Define $k^*(i)$ to be the actual permutation index at time i, and define all of these to be $k^* = (k^*(1), k^*(2), \ldots, k^*(I))$
- Then the incomplete data is $Z = (Z_1, Z_2, \ldots, Z_I)$, where $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{i,J-1})$ is the observed (transformed) data at time i, and the complete data is $(Z, k^*)$.

EM Algorithm
- The i-th contribution to the incomplete-data log likelihood is the complex
  $\ln \left\{ \frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f(z_{ij}; \eta_{jk}) \right\}$
- However, the i-th contribution to the complete-data log likelihood is then simply
  $\sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k^*(i)}(\beta) \right)$
- Simple index-known case: easy to maximize with respect to $(\beta, \sigma)$
- For example, if the data indices were rearranged so that $k^*(i) = 1$ for each i, then the MLE of $\beta_2 - \beta_1$ would simply be $\bar{Y}_{\cdot 2} - \bar{Y}_{\cdot 1}$
- Idea of the EM algorithm: estimate the correct indices so the complete-data log likelihood can be used.

Cycle of the EM Algorithm
Each cycle of the EM algorithm requires that we
1. Use the current estimates $\hat{\theta}_c$, at iteration c, of $\theta = (\beta, \sigma)$
2. Find the conditional expectation of the complete-data (i.e., $(Z, k^*)$) log likelihood, conditional on the incomplete data Z (using $\hat{\theta}_c$ to obtain the conditional expectation)
3. Maximize the result in (2) with respect to $\theta$, to get $\hat{\theta}_{c+1}$
4. Continue until convergence
Questions:
- Initial estimates
- Expectation step (interesting!)
- Maximization step (also interesting!)
- Convergence criteria.

EM Algorithm: Expectation step
- For the EM algorithm, it is useful to write the complete-data log-likelihood
  $\ell(\theta) = \sum_{i=1}^{I} \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k^*(i)}(\beta) \right)$
  as
  $\ell(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} \delta(k(i), k^*(i)) \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\beta) \right)$
  where $\delta(k, k^*) = 1$ if $k = k^*$ and is 0 otherwise
- This makes the conditional expectation easier to obtain: $k^*(i)$ is now part of a linear term.

EM Algorithm: Expectation step
- The expectation step requires finding $\ell_{em}(\theta) = E[\ell(\theta) \mid Z]$ (the conditional expectation evaluated at $\hat{\theta}_c$)
- Here, we need to find
  $\ell_{em}(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} E[\delta(k(i), k^*(i)) \mid Z_i] \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\beta) \right)$

EM Algorithm: Expectation step
- Need to find
  $\ell_{em}(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} E[\delta(k(i), k^*(i)) \mid Z_i] \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\beta) \right)$
- We can solve this. Find that
  $E[\delta(k(i), k^*(i)) \mid Z_i] = P(\delta(k(i), k^*(i)) = 1 \mid Z_i) = \frac{\prod_{j=1}^{J-1} f(z_{ij}; \eta_{j k(i)}(\beta))}{\sum_{k(i)=1}^{J} \prod_{j=1}^{J-1} f(z_{ij}; \eta_{j k(i)}(\beta))} \equiv \gamma(i, k(i))$
- $\gamma(i, k(i)) = P(\text{the } i\text{-th CP has index } k)$ is called the responsibility of permutation k to observation i
- Evaluating at $\hat{\theta}_c$, we obtain estimates $\hat{\gamma}_c(i, k(i))$ of the $\gamma(i, k(i))$.
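A sketch of this E-step in Python, reusing eta_row() from the likelihood sketch (names are mine); gamma[i, k] below plays the role of the estimated responsibility $\hat{\gamma}_c(i, k)$:

```python
import numpy as np
from scipy.stats import norm

def responsibilities(Z, beta, sigma, perms):
    """E-step: gamma[i, k] = P(row i's true index order is perms[k] | Z_i),
    evaluated at the current (beta, sigma)."""
    gamma = np.empty((Z.shape[0], len(perms)))
    for k, p in enumerate(perms):
        gamma[:, k] = norm.pdf(Z, loc=eta_row(beta, p), scale=sigma).prod(axis=1)
    gamma /= gamma.sum(axis=1, keepdims=True)    # normalize over permutations
    return gamma
```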

EM Algorithm: Expectation step
- Responsibility $\hat{\gamma}(i, k(i)) = \hat{P}(\text{the } i\text{-th CP has index } k)$
- Assume $\hat{\theta}_c = (\hat{\beta}_c, \hat{\sigma}_c) = ((4, 1, 0), 1)$
  [index k=1]  $f(z_{i1}; c_1(\beta_2 - \beta_1)) \, f(z_{i2}; \ldots)$
  + [index k=2]  $f(z_{i1}; c_1(\beta_1 - \beta_3)) \, f(z_{i2}; \ldots)$
  + [index k=3]  $f(z_{i1}; c_1(\beta_3 - \beta_2)) \, f(z_{i2}; \ldots)$
- $Z_i$ data (w/o the c_j...). Just consider $Z_{i1}$ for i = 1 and i = 3:
    $Z_{11} = 4.0$:   $\beta_2 - \beta_1$ (= -3)?   $\beta_1 - \beta_3$ (= 4)?   $\beta_3 - \beta_2$ (= -1)?
    $Z_{31} = -4.0$:  $\beta_2 - \beta_1$ (= -3)?   $\beta_1 - \beta_3$ (= 4)?   $\beta_3 - \beta_2$ (= -1)?
  (So $Z_{11} = 4.0$ points most strongly to index k = 2, while $Z_{31} = -4.0$ points to k = 1.)

EM Algorithm: Maximization step
- Next, we need to maximize
  $\ell_{em}(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} \hat{\gamma}(i, k(i)) \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\beta) \right)$
  with respect to $\theta = (\beta, \sigma)$
- Close look at the sum: this portion is mathematically equivalent to a weighted sum of squares in a linear-regression framework, so the associated matrix methods can be directly applied
- However, instead of the usual I terms, we now have $I J (J-1)$ terms.

EM Algorithm: Maximization step
For the matrix machinery, define X (predictor) and W (weight) matrices and a U (response) vector. For J = 3, here are the portions for a given i.

  (k(i), j)   X portion                  diag(W) portion   U portion
  (1, 1)      -c_1     c_1      0        γ(i, 1)           Z_i1
  (1, 2)      -c_2/2  -c_2/2    c_2      γ(i, 1)           Z_i2
  (2, 1)       c_1     0       -c_1      γ(i, 2)           Z_i1
  (2, 2)      -c_2/2   c_2     -c_2/2    γ(i, 2)           Z_i2
  (3, 1)       0      -c_1      c_1      γ(i, 3)           Z_i1
  (3, 2)       c_2    -c_2/2   -c_2/2    γ(i, 3)           Z_i2

Solution is (X not of full column rank)
  $\hat{\beta} = (X'WX)^{-} X'WU$.
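A sketch of this M-step for general J, with numpy's pinv standing in for the generalized inverse $(X'WX)^{-}$; the σ update shown is my assumption (the weighted residual standard deviation), since the slide only displays the β solution:

```python
import numpy as np

def m_step(Z, gamma, perms):
    """M-step: weighted least squares over the stacked (i, k, j) rows,
    beta_hat = (X'WX)^- X'WU, with the beta_J = 0 restriction imposed."""
    I, Jm1 = Z.shape
    J, K = Jm1 + 1, len(perms)
    c = 1.0 / np.sqrt(1.0 + 1.0 / np.arange(1, J))
    # Design rows X[(k, j), :] with X @ beta = eta_jk; identical for every i.
    X = np.zeros((K * Jm1, J))
    for k, p in enumerate(perms):
        p = np.asarray(p) - 1
        for j in range(1, J):
            X[k * Jm1 + j - 1, p[j]] += c[j - 1]
            X[k * Jm1 + j - 1, p[:j]] -= c[j - 1] / j
    Xs = np.tile(X, (I, 1))                        # stack the (k, j) block for each i
    W = np.repeat(gamma, Jm1, axis=1).ravel()      # weight gamma[i, k] for each (i, k, j)
    U = np.tile(Z, (1, K)).ravel()                 # response z_ij for each (i, k, j)
    beta = np.linalg.pinv(Xs.T @ (W[:, None] * Xs)) @ (Xs.T @ (W * U))
    beta -= beta[-1]                               # impose the beta_J = 0 restriction
    # Weighted residual variance (total weight I*(J-1)); this sigma update is
    # my assumption, implied by the weighted-SS form but not written on the slide.
    sigma = np.sqrt(np.sum(W * (U - Xs @ beta) ** 2) / (I * Jm1))
    return beta, sigma
```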

EM Algorithm: J = 2 case
Some algebra leads to
  $\hat{\beta}_1 = \frac{1}{I c_1} \sum_{i=1}^{I} Z_{i1} (1 - 2\gamma(i, 1)) = \frac{1}{I} \sum_{i=1}^{I} (Y_{i2} - Y_{i1})(1 - 2\gamma(i, 1))$
An extreme case:
- Say all permutations just happen to be (1, 2) and the separation via $\beta$ is much larger than the noise via $\sigma$
- Then the estimated values $\gamma(i, 1) \approx 1$, so $\hat{\beta}_1 \approx \sum_{i=1}^{I} (Y_{i1} - Y_{i2}) / I$
- If the permutations have clean separation, each $\gamma(i, 1)$ weight is either ≈ 0 or ≈ 1, an indicator of which permutation took place
- If poor separation, the $\gamma(i, k(i))$ tend to be closer to 0.5; in the extreme case (all weights = 0.5) we get $\hat{\beta}_1 = 0$: zero estimated separation.

EM Algorithm: Implementation
- Expectation step
- Maximization step
Questions:
- Initial estimates
- Convergence criteria
- Initial permutation of rows ⇐

EM Algorithm: Implementation
Initial permutation of rows?
- Can get initial estimates if we have a reasonable idea of how to permute the rows
- If there is some signal in the data, this is possible
- Best signal = row with max variance? Align the other rows to this one.

Data Y:            Permuted data Y:
  Y1  Y2  Y3         Y1  Y2  Y3
   6  10   3          6  10   3  ⇐ reference
   5   8   7          8   7   5
  12   8  12         12  12   8
   6   4   6          6   6   4
   9   5  11         11   9   5
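One way to code this row alignment (the least-squares matching criterion is my assumption; the slide only says to align the other rows to the max-variance row); on the 3-column example above it reproduces the permuted table shown:

```python
import numpy as np

def initial_alignment(Y):
    """Cyclically shift each row of Y to best match the row with the largest
    variance (matching by minimum squared distance is an assumption here)."""
    Y = np.asarray(Y, dtype=float)
    ref = Y[np.argmax(Y.var(axis=1))]            # reference = max-variance row
    aligned = np.empty_like(Y)
    for i, row in enumerate(Y):
        shifts = [np.roll(row, s) for s in range(Y.shape[1])]
        aligned[i] = min(shifts, key=lambda r: np.sum((r - ref) ** 2))
    return aligned
```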

EM Algorithm: Implementation
Advantages of the initial permutation of rows?
1. Initial estimates of the parameters. For $\beta$, the estimate is
   $(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot J},\; \bar{Y}_{\cdot 2} - \bar{Y}_{\cdot J},\; \ldots,\; \bar{Y}_{\cdot (J-1)} - \bar{Y}_{\cdot J},\; 0)$
2. Best-guess row alignments are in place. If there is a reasonably strong signal, the most likely permutation in CP(J) to be correct for each i is k = 1 (no permutation). So $\{\hat{\gamma}_c(i, k(i)); i = 1, 2, \ldots, I\} \approx 1$ for k(i) = 1 and ≈ 0 for other k(i), across iterations c = 1, 2, ...
3. If the signal is very weak, we will observe that the $\{\hat{\gamma}_c(i, 1); i = 1, 2, \ldots, I\}$ tend to decrease with c; the algorithm indicates that the initial row-alignments could be other ones.

EM Algorithm: Implementation
Convergence criteria?
1. Stability of the $\{\hat{\gamma}_c(i, k(i))\}$ across iterations c determines the stability of the estimates
2. So, a reasonable stopping rule is the first c such that
   $\max_{i, k} \left| \hat{\gamma}_c(i, k) - \hat{\gamma}_{c-1}(i, k) \right| < \text{cutoff}$
   where the cutoff is, say, 0.0001.
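Putting the pieces together, a sketch of the full EM cycle with this stopping rule, built from the earlier sketches (responsibilities, m_step, cp or gp); this is my own outline, not the author's implementation:

```python
import numpy as np

def em_missing_index(Z, beta0, sigma0, perms, tol=1e-4, max_iter=200):
    """Iterate E- and M-steps until the responsibilities stabilize:
    stop at the first c with max_{i,k} |gamma_c - gamma_{c-1}| < tol."""
    beta, sigma = np.asarray(beta0, dtype=float), float(sigma0)
    gamma_old = None
    for c in range(max_iter):
        gamma = responsibilities(Z, beta, sigma, perms)   # E-step
        beta, sigma = m_step(Z, gamma, perms)             # M-step
        if gamma_old is not None and np.max(np.abs(gamma - gamma_old)) < tol:
            break
        gamma_old = gamma
    return beta, sigma, gamma

# Usage: Z = helmert_contrasts(Y); perms = cp(J) for the CP case or gp(J) for GP.
```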

Example 1: 8-spindle machine, CP case
24 rows of data:
  Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8
  8.66  8.45  8.79  8.69  8.82  8.70  8.53  8.65
  8.51  8.56  8.73  8.48  8.70  8.44  8.81  8.60
  8.50  8.69  8.76  8.51  8.30  8.79  8.48  8.74
  8.44  8.81  8.81  8.50  8.53  8.94  8.44  8.82
  8.50  8.66  8.66  8.88  8.40  8.74  8.78  8.44
  8.58  8.63  8.80  8.45  8.77  8.71  8.63  8.53
  ...
Boxplots, after mean-centering each row, and then row-aligning...

[Figure: Distribution of Y (Row-Mean Centered) at each Index, indices 1-8]

Ref Row 4 (Y1-Y8): 8.44, 8.81, 8.81, 8.50, 8.53, 8.94, 8.44, 8.82

[Figure: Distribution after Row-Alignment Permutations, indices (permuted) 1-8]

[Figure: β̂_c vs Iteration c]

[Figure: σ̂_c vs Iteration c]

[Figures: responsibilities γ̂_c(·, k) over permutation index k = 1-8, at iterations c = 2, 3, 4, 5, 10, 18]

[Figure: β̂ and Actual Y means (Both Centered), indices 1-8]

[Figures: β̂ and Actual Y means (Both Centered); β̂ and Actual Y means (Centered and Aligned), indices 1-8]

[Figures: Y density estimates, Indices Known (Row Adjusted); Y density normal estimates, Indices Unknown]

Example 2: 4-cavity machine, GP case
24 rows of data, again:
  Y1    Y2    Y3    Y4
  8.66  8.65  8.79  8.45
  8.56  8.70  8.73  8.48
  8.48  8.79  8.74  8.30
  8.44  8.82  8.53  8.94
  8.88  8.50  8.66  8.66
  8.63  8.80  8.53  8.58
  ...
Results...

[Figures: Distribution of Y at each Index (Row-Mean Centered); Distribution after Row-Alignment Permutations, indices 1-4]

[Figure: β̂_c vs Iteration c]

[Figure: σ̂_c vs Iteration c]

[Figures: responsibilities γ̂_c(·, k) over permutation index k = 1-24, at iterations c = 2, 3, 5, 10, 20, 50]

[Figure: γ̂_c(·, k) at iteration c = 112; the four labeled permutations are (1 2 3 4), (1 4 3 2), (3 2 1 4), (3 4 1 2)]

[Figure: β̂ and Actual Y means (Centered and Aligned), indices 1-4]

Example 4: Random Data, CP case
- I = 100 rows, J = 8, N(0, 1) data
- Results...

[Figure: I = 100, J = 8, N(0, 1) data: Distribution of Y (Row-Mean Centered) at each Index]

[Figure: Distribution after Row-Alignment Permutations]

[Figure: β̂_c vs Iteration c]

[Figure: σ̂_c vs Iteration c]

[Figures: responsibilities γ̂_c(·, k) over permutation index k = 1-8, at iterations c = 2, 5, 35]

[Figure: β̂ and Actual Y means (Both Centered), indices 1-8]

Example 4a: Random Data, CP case
- I = 100 rows, J = 8, N(0, 1) data
- Shrinkage of the initial location ($\beta_j$) estimates? Expansion of the initial scale estimate?
- Results...

[Figure: β̂_c vs Iteration c, with the initial β estimates shrunk by 2 and the initial σ estimate enlarged by 2]

[Figures: β̂ and Actual Y means (Both Centered), indices 1-8; left panel: new solution, right panel: old solution (!)]

[Figures: β̂_c vs Iteration c under two initializations: using the same β̂_{c=1} values, and using β̂_{c=1} drawn from N(0, 0.01)]

Summary (and future work)
- Both the CP and GP methods appear to work well in cases with good signals
- Under H_0, results might be optimistic
Future work:
1. Approximate the standard errors of the estimates
2. Implement likelihood-ratio tests
3. Investigate asymptotics
4. Investigate the H_0 case in detail
5. Consider shrinkage of the final estimates
6. Improve the (linear) convergence rate, e.g. via Aitken's acceleration technique.

Questions?