Bayesian Inference Technique for Data mining for Yield Enhancement in Semiconductor Manufacturing Data


Bayesian Inference Technique for Data mining for Yield Enhancement in Semiconductor Manufacturing Data
Presenter: M. Khakifirooz
Co-authors: C.-F. Chien, Y.-J. Chen
National Tsing Hua University
ISMI 2015, 16th-18th Oct., KAIST, Daejeon, Korea

Outline
- The Purpose of Bayesian Inference
- Data Analysis Approach: Bayesian Variable Selection (BVS), Data Clearance, Yield Classification
- Final Decision Table
- Data Structure provided by Data Model
- Conclusive Research Framework
- Conclusion & Path Forward

The Purpose of Bayesian Inference
- Naïve Bayesian Classifier
- Learning Curve
- Bayesian Networks
- Bayesian Inference
- Gaussian Bayesian Classifier

The Purpose of Bayesian Inference
Human Experience + System Analysis.
Yield learning curve of semiconductor manufacturing: in addition to data analytics, cumulative engineering training and experience significantly enhance yield improvement (Effron 1996; Tobin et al. 1999).

Data Structure provided by Data Model
- $i = 1, \dots, M$: index of process stages; $N$: sample size
- $1 \le k_i \le N$: number of specified tools at each stage
- $n_{ij},\ j = 1, \dots, k_i$: frequency of each specified tool
- $1 \le P_{n_{ij}} \le n_{ij}$: number of existing chambers for each tool
- $p_l,\ l = 1, \dots, P_{n_{ij}}$: frequency of each existing chamber
- $N = \sum_{j=1}^{k_i} \sum_{l=1}^{P_{n_{ij}}} p_l$ and $N \cdot M = \sum_{i=1}^{M} \sum_{j=1}^{k_i} \sum_{l=1}^{P_{n_{ij}}} p_l$

Response variable: %Yield (continuous).
Explanatory variables: stages (tools-chambers, nominal); stages (process time, continuous).

Nominal variables:
Obs. | var1 | var2
n1   | a1   | a2
n2   | a1   | b2
n3   | b1   | NA

Dummy variables:
Obs. | var1-a1 | var1-b1 | var2-a2 | var2-b2
n1   | 1       | 0       | 1       | 0
n2   | 1       | 0       | 0       | 1
n3   | 0       | 1       | 0       | 0
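As a minimal sketch of the dummy-variable construction described above, assuming a pandas DataFrame with hypothetical column names (illustrative only, not the authors' preprocessing code):

```python
import pandas as pd

# Hypothetical wafer-level data: each row is one observation (lot/wafer),
# each nominal column records which tool.chamber was used at that stage.
df = pd.DataFrame({
    "yield_pct": [53.0, 58.2, 55.7],
    "stage1":    ["Tool1.Ch1", "Tool1.Ch2", "Tool2.Ch1"],
    "stage2":    ["Tool2.Ch2", "Tool1.Ch1", "Tool2.Ch2"],
})

# One-hot (dummy) encoding of the nominal stage/tool/chamber columns.
dummies = pd.get_dummies(df[["stage1", "stage2"]], prefix_sep="-")
X = pd.concat([df[["yield_pct"]], dummies], axis=1)
print(X)
```

With hundreds of stages and several tool-chamber combinations per stage, this one-hot step is exactly where the high dimensionality discussed later comes from.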

Data Structure provided by Data Model

Tools per stage:
Yield  | stage 1 | stage 2
obs. 1 | Tool 1  | Tool 2
obs. 2 | Tool 1  | Tool 1
obs. 3 | Tool 2  | Tool 2

Chambers per stage:
Yield  | stage 1   | stage 2
obs. 1 | Chamber 1 | Chamber 2
obs. 2 | Chamber 2 | Chamber 1
obs. 3 | Chamber 1 | Chamber 2

Process dates per stage:
Yield  | stage 1  | stage 2
obs. 1 | Date 1.1 | Date 1.2
obs. 2 | Date 2.1 | Date 2.2
obs. 3 | Date 3.1 | Date 3.2

Combined tool-chamber per stage:
Yield  | stage 1          | stage 2
obs. 1 | Tool 1.Chamber 1 | Tool 2.Chamber 2
obs. 2 | Tool 1.Chamber 2 | Tool 1.Chamber 1
obs. 3 | Tool 2.Chamber 1 | Tool 2.Chamber 2

Dummy encoding of the combined variables:
Yield  | s1.T1.Ch1 | s1.T1.Ch2 | s1.T2.Ch1 | s2.T2.Ch2 | s2.T1.Ch1
obs. 1 | 1         | 0         | 0         | 1         | 0
obs. 2 | 0         | 1         | 0         | 0         | 1
obs. 3 | 0         | 0         | 1         | 1         | 0

Dummy encoding with process dates as values:
Yield  | s1.T1.Ch1 | s1.T1.Ch2 | s1.T2.Ch1 | s2.T2.Ch2 | s2.T1.Ch1
obs. 1 | Date 1.1  | 0         | 0         | Date 1.2  | 0
obs. 2 | 0         | Date 2.1  | 0         | 0         | Date 2.2
obs. 3 | 0         | 0         | Date 3.1  | Date 3.2  | 0

Data Structure provided by Data Model

Obs. | var1-a1 | var1-b1 | var1-c1
n1   | 1       | 0       | 0
n2   | 0       | 0       | 1
n3   | 0       | 1       | 0

Pr(i-th variable selected) = 1/3 for each of var1-a1, var1-b1, var1-c1.

Multinomial selection probability over (var1-a1, var1-b1, var1-c1), based on engineer experience: (1/3, 1/3, 1/3).

The dummy levels correspond to the vertices (1,0,0), (0,1,0) and (0,0,1) of a simplex labelled var1-a1, var1-b1, var1-c1. To randomly pick a point in this space, we need a continuous distribution.

Distribution over the multinomial (posterior distribution): Dirichlet distribution.
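The Dirichlet plays this role because it is the conjugate, continuous distribution over the multinomial selection probabilities. A toy sketch follows; the prior pseudo-counts and visit counts are assumed for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Engineer-supplied prior belief about how often each dummy level of
# var1 (a1, b1, c1) should be selected: uniform (1/3, 1/3, 1/3) here.
# Dirichlet concentration parameters behave like pseudo-counts.
alpha_prior = np.array([1.0, 1.0, 1.0])

# Suppose the sampler later "visits" the three levels 7, 1 and 2 times.
visit_counts = np.array([7, 1, 2])

# The posterior over the multinomial selection probabilities is again Dirichlet.
posterior_draws = rng.dirichlet(alpha_prior + visit_counts, size=5000)
print("posterior mean selection probabilities:", posterior_draws.mean(axis=0))
```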

Data Analysis Approach

Critical phenomena:
i. High dimensionality caused by transforming categorical variables into dummies
ii. Multicollinearity caused by the nature of the dummy variables
iii. A complicated posterior distribution that makes direct variable selection hard

Remedy: approximate inference with sampling. Use random sampling (MCMC techniques: Gibbs sampler, Metropolis-Hastings, ...) to approximate the posterior distribution and select the significant explanatory variables.
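For readers unfamiliar with the MCMC samplers named above, here is a minimal random-walk Metropolis-Hastings sketch for a generic one-dimensional target density; it is a textbook toy, not the variable-selection sampler used in this work:

```python
import numpy as np

def log_target(x):
    # Unnormalized log-density of a toy target (standard normal here).
    return -0.5 * x ** 2

def metropolis_hastings(n_iter=10_000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        proposal = x + step * rng.normal()      # symmetric random-walk proposal
        log_accept = log_target(proposal) - log_target(x)
        if np.log(rng.uniform()) < log_accept:  # accept with prob min(1, ratio)
            x = proposal
        samples[t] = x
    return samples

draws = metropolis_hastings()
print(draws.mean(), draws.std())                # ~0 and ~1 for the toy target
```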

Data Analysis Approach: Gibbs Sampler

Suppose $(x_1, x_2) \sim \Pr(x_1, x_2)$. Beginning with initial values $x_1^{(0)}, x_2^{(0)}$, sample at iteration $t$ as follows:

$x_1^{(t)} \sim \Pr(x_1 \mid x_2^{(t-1)})$
$x_2^{(t)} \sim \Pr(x_2 \mid x_1^{(t)})$

Iterate the above steps until the sample values have the same distribution as if they were sampled from the true joint posterior distribution. Based on the frequency of visits, select the most probable variables.
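A minimal sketch of the two-variable Gibbs scheme above, using a bivariate normal whose full conditionals are available in closed form (a textbook illustration, not the posterior over the yield data):

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_iter=5000, seed=0):
    """Gibbs sampling from a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0                    # initial values x1^(0), x2^(0)
    cond_sd = np.sqrt(1.0 - rho ** 2)    # conditional standard deviation
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        # x1^(t) ~ Pr(x1 | x2^(t-1)) and x2^(t) ~ Pr(x2 | x1^(t))
        x1 = rng.normal(rho * x2, cond_sd)
        x2 = rng.normal(rho * x1, cond_sd)
        samples[t] = (x1, x2)
    return samples

draws = gibbs_bivariate_normal()
print(np.corrcoef(draws.T)[0, 1])        # approaches rho once the chain mixes
```

In the BVS setting the same scheme is applied with the full conditionals of the model's own posterior, and variables are ranked by how often the chain visits them.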

Data Analysis Approach: Data Clearance

When X is categorical (dummy variables) and Y is a quantitative variable:
- parametric or non-parametric?
- dependent or independent?
- unbalanced classes?

Yield value                            | Representative var.
Bad yield: < 53.12                     | 1
Middle yield: between 53.12 and 57.51  | ignore
Good yield: > 57.51                    | 0
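A small sketch of the bad/middle/good labelling rule in the table above, using the reported cut-points 53.12 and 57.51 (the column name is assumed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"yield_pct": [51.0, 54.9, 58.3, 52.7, 60.1]})

# Bad yield (< 53.12) -> 1, good yield (> 57.51) -> 0,
# middle yield (between the cut-points) -> NaN and dropped ("ignore").
conditions = [df["yield_pct"] < 53.12, df["yield_pct"] > 57.51]
df["label"] = np.select(conditions, [1, 0], default=np.nan)
cleared = df.dropna(subset=["label"])
print(cleared)
```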

Data Analysis Approach: Data Clearance

Contingency table between two variables:

Variable II | Variable I: Level a | Variable I: Level b
Level c     | f_ca                | f_cb
Level d     | f_da                | f_db

If both var. I and var. II are explanatory:
- test the interchangeability of measures
- measure the degree of homogeneity

If var. I is explanatory and var. II is the response:
- measure the reliability of the instrument (test/scale)
- measure the objectivity, or lack of bias

Measurement of agreement, W. S. Robinson (1957); Cohen's Kappa (K):
K < 0: no agreement
0 <= K < 0.2: slight agreement
0.2 <= K < 0.4: fair agreement
0.4 <= K < 0.6: moderate agreement
0.6 <= K < 0.8: substantial agreement
0.8 <= K <= 1: almost perfect agreement
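Cohen's kappa for a pair of binary dummy variables can be computed, for example, with scikit-learn; the data below are made up and serve only to illustrate the agreement bands listed above:

```python
from sklearn.metrics import cohen_kappa_score

# Two dummy (0/1) columns, e.g. "stage1-Tool1.Ch1" vs "stage2-Tool2.Ch2".
var_i  = [1, 0, 1, 1, 0, 0, 1, 0]
var_ii = [1, 0, 1, 0, 0, 0, 1, 1]

kappa = cohen_kappa_score(var_i, var_ii)
print(f"kappa = {kappa:.2f}")  # interpret with the agreement bands listed above
```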

Research Framework (I): A Bayesian Framework for Semiconductor Manufacturing Data

Problem Definition / Data Preparation:
- Data integration
- Dummy variable construction for the integrated variables (1,460 variables)

Data Mining & Key Factor Screening:
- Cohen's Kappa statistics for each pair of input variables: if a pair shows agreement, wrap the associated variables; if no agreement, keep them separate
- Assign cutting points & bad/middle/good wafers

Class distribution of the kappa test for each pair of input variables:

Almost perfect agreement | Substantial agreement | Moderate agreement | Fair agreement | Slight agreement | No agreement
3                        | 109                   | 1,764              | 24,539         | 280,081          | 758,574

Research Framework (II)

Data Mining & Key Factor Screening:
- Cohen's Kappa statistics for each pair of X & Y (agreement vs. no agreement, threshold K = 0.2)
- Data clearance
- BVS via Gibbs sampler

Model comparison (number of resamples: 20, number of iterations: 2):

Model       | RMSE Min | RMSE Median | RMSE Max | Adj. R-sq Min | Adj. R-sq Median | Adj. R-sq Max
Gibbs + GLM | 1.842    | 2.653       | 2.841    | 0.046         | 0.371            | 0.711
GBM + GLM   | 2.534    | 3.051       | 3.332    | 0.000         | 0.053            | 0.337
RF + GLM    | 2.268    | 2.838       | 3.660    | 0.016         | 0.293            | 0.507
GLM         | 7.951    | 34.60       | 139.8    | 0.000         | 0.029            | 0.214

Model Construction, Evaluation & Interpretation:
- GLM construction with a Gaussian distribution & repeated random sub-sampling validation (see the sketch below)
- A comparison to the wrapped variables
- Define abnormal devices & times
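A hedged sketch of the repeated random sub-sampling (Monte Carlo cross-validation) evaluation of a Gaussian GLM referred to in the table above; the data, column dimensions and the preceding variable-selection step are placeholders, so the numbers it produces will not match the reported ones:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def evaluate_glm(X, y, n_resamples=20, seed=0):
    """Repeated random sub-sampling validation of a Gaussian GLM."""
    rmses, adj_r2s = [], []
    for i in range(n_resamples):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed + i)
        model = sm.GLM(y_tr, sm.add_constant(X_tr),
                       family=sm.families.Gaussian()).fit()
        pred = model.predict(sm.add_constant(X_te))
        rmses.append(np.sqrt(mean_squared_error(y_te, pred)))
        # Adjusted R^2 on the training fit, penalised by the number of predictors.
        n, p = X_tr.shape
        r2 = 1.0 - np.sum(model.resid_response ** 2) / np.sum((y_tr - y_tr.mean()) ** 2)
        adj_r2s.append(1.0 - (1.0 - r2) * (n - 1) / (n - p - 1))
    return np.median(rmses), np.median(adj_r2s)

# Toy usage with random placeholder data (replace with the selected dummies).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10)).astype(float)
y = 55 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 1, 200)
print(evaluate_glm(X, y))
```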

Decision Graph: high yield vs. middle yield vs. low yield (figure).

Decision Table

Factor                                  | Bad                                      | Good
Stage10 - Tool2 - Chamber3              | before 8/29/2014 2:32                    | after 8/29/2014 12:50
Stage12 - Tool2 - Chamber1              | between 8/30/2014 3:26 & 8/30/2014 3:43  | before 8/29/2014 10:55
Stage12 - Tool2 - Chamber4              | after 8/29/2014 7:36 till 8/30/2014 3:44 | before 8/29/2014 7:36
Stage13 - Tool5 - Chamber2              | -                                        | generally affected the high yield
Stage17 - Tool2 - Chamber2              | after 8/30/2014 12:21                    | before 8/30/2014 10:37
Stage23 - Tool3 - Chamber2              | -                                        | generally affected the high yield
Stage44 - Tool7 - Chamber2 and Chamber3 | at 9/3/2014                              | at 9/1/2014
Stage49 - Tool1 - Chamber4              | at 9/3/2014                              | at 9/2/2014
Stage57 - Tool1 - Chamber3              | -                                        | generally affected the high yield

Conclusion & Path Forward

Based on the empirical results, we validate that the proposed approach is practically viable; that is, adding domain knowledge and engineering experience to the system can improve the results.

Domain knowledge might be used to restrict the conjunctions in rules to tools, chambers and steps that occur within a reasonable time frame.

The data are not sampled from a stationary population; hence, over time the results may change significantly, or some empirical findings may be rejected on the basis of engineering domain knowledge, which does not mean that the result is incorrect.

The result may be a proxy for one or more events occurring elsewhere or in other periods of time; hence, a simulation study is an essential tool for evaluating the accuracy of the proposed method.
