Bayesian Inference Technique for Data mining for Yield Enhancement in Semiconductor Manufacturing Data


Bayesian Inference Technique for Data mining for Yield Enhancement in Semiconductor Manufacturing Data
Presenter: M. Khakifirooz
Co-authors: C.-F. Chien, Y.-J. Chen
National Tsing Hua University
ISMI 2015, 16th-18th Oct., KAIST, Daejeon, Korea

Outline
- The Purpose of Bayesian Inference
- Data Analysis Approach: Bayesian Variable Selection (BVS), Data Clearance, Yield Classification
- Final Decision Table
- Data Structure provided by Data Model
- Conclusive Research Framework
- Conclusion & Path Forward

The Purpose of Bayesian Inference
- Naïve Bayesian Classifier
- Learning Curve
- Bayesian Networks
- Bayesian Inference
- Gaussian Bayesian Classifier

The Purpose of Bayesian Inference
Human Experience + System Analysis.
Yield learning curve of semiconductor manufacturing: in addition to data analytics, cumulative engineering training and experience significantly enhance yield improvement (Effron 1996; Tobin et al. 1999).

Data Structure provided by Data Model
- $i = 1, \dots, M$: index of process stages; $N$: sample size
- $1 \le k_i \le N$: number of specified tools at each stage
- $n_{ij},\ j = 1, \dots, k_i$: frequency of each specified tool
- $1 \le P_{n_{ij}} \le n_{ij}$: number of existing chambers for each tool
- $p_l,\ l = 1, \dots, P_{n_{ij}}$: frequency of each existing chamber
- $N = \sum_{j=1}^{k_i} \sum_{l=1}^{P_{n_{ij}}} p_l$ and $N \cdot M = \sum_{i=1}^{M} \sum_{j=1}^{k_i} \sum_{l=1}^{P_{n_{ij}}} p_l$

Response variable: %Yield (continuous).
Explanatory variables: stages (tools-chambers, nominal); stages (process time, continuous).

Nominal variables:
Obs. | var1 | var2
n1   | a1   | a2
n2   | a1   | b2
n3   | b1   | NA

Dummy variables:
Obs. | var1-a1 | var1-b1 | var2-a2 | var2-b2
n1   | 1       | 0       | 1       | 0
n2   | 1       | 0       | 0       | 1
n3   | 0       | 1       | 0       | 0
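As a minimal sketch of the dummy-variable construction described above, assuming a pandas DataFrame with hypothetical column names (illustrative only, not the authors' preprocessing code):

```python
import pandas as pd

# Hypothetical wafer-level data: each row is one observation (lot/wafer),
# each nominal column records which tool.chamber was used at that stage.
df = pd.DataFrame({
    "yield_pct": [53.0, 58.2, 55.7],
    "stage1":    ["Tool1.Ch1", "Tool1.Ch2", "Tool2.Ch1"],
    "stage2":    ["Tool2.Ch2", "Tool1.Ch1", "Tool2.Ch2"],
})

# One-hot (dummy) encoding of the nominal stage/tool/chamber columns.
dummies = pd.get_dummies(df[["stage1", "stage2"]], prefix_sep="-")
X = pd.concat([df[["yield_pct"]], dummies], axis=1)
print(X)
```

With hundreds of stages and several tool-chamber combinations per stage, this one-hot step is exactly where the high dimensionality discussed later comes from.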

Data Structure provided by Data Model

Tools per stage:
Yield  | stage 1 | stage 2
obs. 1 | Tool 1  | Tool 2
obs. 2 | Tool 1  | Tool 1
obs. 3 | Tool 2  | Tool 2

Chambers per stage:
Yield  | stage 1   | stage 2
obs. 1 | Chamber 1 | Chamber 2
obs. 2 | Chamber 2 | Chamber 1
obs. 3 | Chamber 1 | Chamber 2

Process dates per stage:
Yield  | stage 1  | stage 2
obs. 1 | Date 1.1 | Date 1.2
obs. 2 | Date 2.1 | Date 2.2
obs. 3 | Date 3.1 | Date 3.2

Combined tool-chamber per stage:
Yield  | stage 1          | stage 2
obs. 1 | Tool 1.Chamber 1 | Tool 2.Chamber 2
obs. 2 | Tool 1.Chamber 2 | Tool 1.Chamber 1
obs. 3 | Tool 2.Chamber 1 | Tool 2.Chamber 2

Dummy encoding of the combined variables:
Yield  | s1.T1.Ch1 | s1.T1.Ch2 | s1.T2.Ch1 | s2.T2.Ch2 | s2.T1.Ch1
obs. 1 | 1         | 0         | 0         | 1         | 0
obs. 2 | 0         | 1         | 0         | 0         | 1
obs. 3 | 0         | 0         | 1         | 1         | 0

Dummy encoding with process dates as values:
Yield  | s1.T1.Ch1 | s1.T1.Ch2 | s1.T2.Ch1 | s2.T2.Ch2 | s2.T1.Ch1
obs. 1 | Date 1.1  | 0         | 0         | Date 1.2  | 0
obs. 2 | 0         | Date 2.1  | 0         | 0         | Date 2.2
obs. 3 | 0         | 0         | Date 3.1  | Date 3.2  | 0

Data Structure provided by Data Model

Obs. | var1-a1 | var1-b1 | var1-c1
n1   | 1       | 0       | 0
n2   | 0       | 0       | 1
n3   | 0       | 1       | 0

Pr(i-th variable selected) = 1/3 for each of var1-a1, var1-b1, var1-c1.

Multinomial selection probability over (var1-a1, var1-b1, var1-c1), based on engineer experience: (1/3, 1/3, 1/3).

The dummy levels correspond to the vertices (1,0,0), (0,1,0) and (0,0,1) of a simplex labelled var1-a1, var1-b1, var1-c1. To randomly pick a point in this space, we need a continuous distribution.

Distribution over the multinomial (posterior distribution): Dirichlet distribution.
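The Dirichlet plays this role because it is the conjugate, continuous distribution over the multinomial selection probabilities. A toy sketch follows; the prior pseudo-counts and visit counts are assumed for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Engineer-supplied prior belief about how often each dummy level of
# var1 (a1, b1, c1) should be selected: uniform (1/3, 1/3, 1/3) here.
# Dirichlet concentration parameters behave like pseudo-counts.
alpha_prior = np.array([1.0, 1.0, 1.0])

# Suppose the sampler later "visits" the three levels 7, 1 and 2 times.
visit_counts = np.array([7, 1, 2])

# The posterior over the multinomial selection probabilities is again Dirichlet.
posterior_draws = rng.dirichlet(alpha_prior + visit_counts, size=5000)
print("posterior mean selection probabilities:", posterior_draws.mean(axis=0))
```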

Data Analysis Approach

Critical phenomena:
i. High dimensionality caused by transforming categorical variables into dummies
ii. Multicollinearity caused by the nature of the dummy variables
iii. A complicated posterior distribution that makes direct variable selection hard

Remedy: approximate inference with sampling. Use random sampling (MCMC techniques: Gibbs sampler, Metropolis-Hastings, ...) to approximate the posterior distribution and select the significant explanatory variables.
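For readers unfamiliar with the MCMC samplers named above, here is a minimal random-walk Metropolis-Hastings sketch for a generic one-dimensional target density; it is a textbook toy, not the variable-selection sampler used in this work:

```python
import numpy as np

def log_target(x):
    # Unnormalized log-density of a toy target (standard normal here).
    return -0.5 * x ** 2

def metropolis_hastings(n_iter=10_000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        proposal = x + step * rng.normal()      # symmetric random-walk proposal
        log_accept = log_target(proposal) - log_target(x)
        if np.log(rng.uniform()) < log_accept:  # accept with prob min(1, ratio)
            x = proposal
        samples[t] = x
    return samples

draws = metropolis_hastings()
print(draws.mean(), draws.std())                # ~0 and ~1 for the toy target
```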

Data Analysis Approach: Gibbs Sampler

Suppose $(x_1, x_2) \sim \Pr(x_1, x_2)$. Beginning with initial values $x_1^{(0)}, x_2^{(0)}$, sample at iteration $t$ as follows:

$x_1^{(t)} \sim \Pr(x_1 \mid x_2^{(t-1)})$
$x_2^{(t)} \sim \Pr(x_2 \mid x_1^{(t)})$

Iterate the above steps until the sample values have the same distribution as if they were sampled from the true joint posterior distribution. Based on the frequency of visits, select the most probable variables.
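A minimal sketch of the two-variable Gibbs scheme above, using a bivariate normal whose full conditionals are available in closed form (a textbook illustration, not the posterior over the yield data):

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_iter=5000, seed=0):
    """Gibbs sampling from a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0                    # initial values x1^(0), x2^(0)
    cond_sd = np.sqrt(1.0 - rho ** 2)    # conditional standard deviation
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        # x1^(t) ~ Pr(x1 | x2^(t-1)) and x2^(t) ~ Pr(x2 | x1^(t))
        x1 = rng.normal(rho * x2, cond_sd)
        x2 = rng.normal(rho * x1, cond_sd)
        samples[t] = (x1, x2)
    return samples

draws = gibbs_bivariate_normal()
print(np.corrcoef(draws.T)[0, 1])        # approaches rho once the chain mixes
```

In the BVS setting the same scheme is applied with the full conditionals of the model's own posterior, and variables are ranked by how often the chain visits them.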

Data Analysis Approach: Data Clearance

When X is categorical (dummy variables) and Y is a quantitative variable:
- parametric or non-parametric?
- dependent or independent?
- unbalanced classes?

Yield value                            | Representative var.
Bad yield: < 53.12                     | 1
Middle yield: between 53.12 and 57.51  | ignore
Good yield: > 57.51                    | 0
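A small sketch of the bad/middle/good labelling rule in the table above, using the reported cut-points 53.12 and 57.51 (the column name is assumed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"yield_pct": [51.0, 54.9, 58.3, 52.7, 60.1]})

# Bad yield (< 53.12) -> 1, good yield (> 57.51) -> 0,
# middle yield (between the cut-points) -> NaN and dropped ("ignore").
conditions = [df["yield_pct"] < 53.12, df["yield_pct"] > 57.51]
df["label"] = np.select(conditions, [1, 0], default=np.nan)
cleared = df.dropna(subset=["label"])
print(cleared)
```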

Data Analysis Approach: Data Clearance

Contingency table between two variables:

Variable II | Variable I: Level a | Variable I: Level b
Level c     | f_ca                | f_cb
Level d     | f_da                | f_db

If both var. I and var. II are explanatory:
- test the interchangeability of measures
- measure the degree of homogeneity

If var. I is explanatory and var. II is the response:
- measure the reliability of the instrument (test/scale)
- measure the objectivity, or lack of bias

Measurement of agreement, W. S. Robinson (1957); Cohen's Kappa (K):
K < 0: no agreement
0 <= K < 0.2: slight agreement
0.2 <= K < 0.4: fair agreement
0.4 <= K < 0.6: moderate agreement
0.6 <= K < 0.8: substantial agreement
0.8 <= K <= 1: almost perfect agreement
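Cohen's kappa for a pair of binary dummy variables can be computed, for example, with scikit-learn; the data below are made up and serve only to illustrate the agreement bands listed above:

```python
from sklearn.metrics import cohen_kappa_score

# Two dummy (0/1) columns, e.g. "stage1-Tool1.Ch1" vs "stage2-Tool2.Ch2".
var_i  = [1, 0, 1, 1, 0, 0, 1, 0]
var_ii = [1, 0, 1, 0, 0, 0, 1, 1]

kappa = cohen_kappa_score(var_i, var_ii)
print(f"kappa = {kappa:.2f}")  # interpret with the agreement bands listed above
```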

Research Framework (I): A Bayesian Framework for Semiconductor Manufacturing Data

Problem Definition / Data Preparation:
- Data integration
- Dummy variable construction for the integrated variables (1,460 variables)

Data Mining & Key Factor Screening:
- Cohen's Kappa statistics for each pair of input variables: if a pair shows agreement, wrap the associated variables; if no agreement, keep them separate
- Assign cutting points & bad/middle/good wafers

Class distribution of the kappa test for each pair of input variables:

Almost perfect agreement | Substantial agreement | Moderate agreement | Fair agreement | Slight agreement | No agreement
3                        | 109                   | 1,764              | 24,539         | 280,081          | 758,574

Research Framework (II)

Data Mining & Key Factor Screening:
- Cohen's Kappa statistics for each pair of X & Y (agreement vs. no agreement, threshold K = 0.2)
- Data clearance
- BVS via Gibbs sampler

Model comparison (number of resamples: 20, number of iterations: 2):

Model       | RMSE Min | RMSE Median | RMSE Max | Adj. R-sq Min | Adj. R-sq Median | Adj. R-sq Max
Gibbs + GLM | 1.842    | 2.653       | 2.841    | 0.046         | 0.371            | 0.711
GBM + GLM   | 2.534    | 3.051       | 3.332    | 0.000         | 0.053            | 0.337
RF + GLM    | 2.268    | 2.838       | 3.660    | 0.016         | 0.293            | 0.507
GLM         | 7.951    | 34.60       | 139.8    | 0.000         | 0.029            | 0.214

Model Construction, Evaluation & Interpretation:
- GLM construction with a Gaussian distribution & repeated random sub-sampling validation (see the sketch below)
- A comparison to the wrapped variables
- Define abnormal devices & times
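A hedged sketch of the repeated random sub-sampling (Monte Carlo cross-validation) evaluation of a Gaussian GLM referred to in the table above; the data, column dimensions and the preceding variable-selection step are placeholders, so the numbers it produces will not match the reported ones:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def evaluate_glm(X, y, n_resamples=20, seed=0):
    """Repeated random sub-sampling validation of a Gaussian GLM."""
    rmses, adj_r2s = [], []
    for i in range(n_resamples):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed + i)
        model = sm.GLM(y_tr, sm.add_constant(X_tr),
                       family=sm.families.Gaussian()).fit()
        pred = model.predict(sm.add_constant(X_te))
        rmses.append(np.sqrt(mean_squared_error(y_te, pred)))
        # Adjusted R^2 on the training fit, penalised by the number of predictors.
        n, p = X_tr.shape
        r2 = 1.0 - np.sum(model.resid_response ** 2) / np.sum((y_tr - y_tr.mean()) ** 2)
        adj_r2s.append(1.0 - (1.0 - r2) * (n - 1) / (n - p - 1))
    return np.median(rmses), np.median(adj_r2s)

# Toy usage with random placeholder data (replace with the selected dummies).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10)).astype(float)
y = 55 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 1, 200)
print(evaluate_glm(X, y))
```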

Decision Graph: high yield vs. middle yield vs. low yield (figure).

Decision Table

Factor                                  | Bad                                      | Good
Stage10 - Tool2 - Chamber3              | before 8/29/2014 2:32                    | after 8/29/2014 12:50
Stage12 - Tool2 - Chamber1              | between 8/30/2014 3:26 & 8/30/2014 3:43  | before 8/29/2014 10:55
Stage12 - Tool2 - Chamber4              | after 8/29/2014 7:36 till 8/30/2014 3:44 | before 8/29/2014 7:36
Stage13 - Tool5 - Chamber2              | -                                        | generally affected the high yield
Stage17 - Tool2 - Chamber2              | after 8/30/2014 12:21                    | before 8/30/2014 10:37
Stage23 - Tool3 - Chamber2              | -                                        | generally affected the high yield
Stage44 - Tool7 - Chamber2 and Chamber3 | at 9/3/2014                              | at 9/1/2014
Stage49 - Tool1 - Chamber4              | at 9/3/2014                              | at 9/2/2014
Stage57 - Tool1 - Chamber3              | -                                        | generally affected the high yield

Conclusion & Path Forward

Based on the empirical results, we validate that the proposed approach is practically viable; that is, adding domain knowledge and engineering experience to the system can improve the results.

Domain knowledge might be used to restrict the conjunctions in rules to tools, chambers and steps that occur within a reasonable time frame.

The data are not sampled from a stationary population; hence, over time the results may change significantly, or some empirical findings may be rejected on the basis of engineering domain knowledge, which does not mean that the result is incorrect.

The result may be a proxy for one or more events occurring elsewhere or in other periods of time; hence, a simulation study is an essential tool for evaluating the accuracy of the proposed method.
