The Missing-Index Problem in Process Control
1 The Missing-Index Problem in Process Control
Joseph G. Voelkel, CQAS, KGCOE, RIT
October 2009
2 Topics
1. Examples: Introduction to the Problem
2. Models, Inference, Likelihoods, EM Algorithm
3. Examples of the Method
4. Summary, Future Work
Paper at
3 Example 1: 8-spindle machine
CP = Cyclic Permutation case
Sample J = 8 consecutive parts
Sampled in order of production, but the first index value is not known
Index j = 1, ..., J? ⟹ J possibilities.
4 Example 2: 4-cavity machine
GP = General Permutation case
Sample the J = 4 parts from the shot
No order to the parts
Index j = 1, ..., J? ⟹ J! possibilities.
5 Example 3: 64-cavity mold
SP = Sampled Permutation case
Sample J = 4 parts at random from the N = 64 parts
Index j = 1, ..., J? ⟹ N!/(N − J)! possibilities
Will not be discussed in this talk.
6 Models: Index-Known Case
The general model we consider, if index j were known, is
$$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{j(i)} + \delta_i, \qquad i = 1, 2, \ldots, I; \; j = 1, 2, \ldots, J,$$
$$\sum_i \alpha_i = \sum_j \beta_j = 0; \qquad \varepsilon_{j(i)} \sim N(0, \sigma^2), \; \delta_i \sim N(0, \sigma_\delta^2), \text{ independent r.v.'s}$$
($\alpha_i$: for non-stochastic events, e.g. mean drifts/shifts)
1. Injection molding: $\beta_j$ = cavities, $\varepsilon_{j(i)}$ = within-shot variation, $\delta_i$ = across-shot variation
2. Multiple-spindle machine: $\beta_j$ = spindles. Spindle parts sampled as a group ⟹ $\varepsilon_{j(i)}$, $\delta_i$ analogous to the above. Spindle parts sampled one at a time ⟹ the model excludes $\delta_i$.
Our interest here lies with $(\beta, \sigma)$.
7 Models: Index Unknown, CP Case
If $J = 3$, the index order is either $(1, 2, 3)$, $(3, 1, 2)$, or $(2, 3, 1)$
The ordered set of these 3 vectors is CP(3), indexed by $k$
The model is the same as before except
$$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{j(i)} + \delta_i \;\Longrightarrow\; Y_{ij} = \mu + \alpha_i + \beta_{j_k(i)} + \varepsilon_{j_k(i)} + \delta_i$$
Here, $(1_k, 2_k, \ldots, J_k)$ is the (unknown) $k$th member of CP($J$)
E.g., $k = 2 \Longrightarrow (1_k, 2_k, 3_k) = (3, 1, 2)$, so $Y_{i1} = \mu + \alpha_i + \beta_3 + \varepsilon_{3(i)} + \delta_i$
Reasonable assumption: $k$ is discrete uniform on $1, 2, \ldots, J$ for each $i$.
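To make the CP($J$) indexing concrete, here is a minimal Python sketch (my own, not from the talk) that enumerates the $J$ cyclic permutations in the ordering used above:

```python
def cyclic_permutations(J):
    """Return the J cyclic rotations of (1, ..., J), indexed by k = 1, ..., J."""
    base = list(range(1, J + 1))
    return [tuple(base[J - k + 1:] + base[:J - k + 1]) for k in range(1, J + 1)]

print(cyclic_permutations(3))  # [(1, 2, 3), (3, 1, 2), (2, 3, 1)]
```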
8 Example: CP Case
[Tables: indices in the (unknown) correct order (columns Y1, Y2, Y3) vs. the actual Y data after cyclic permutation]
9 CP Case: Inference
$$Y_{ij} = \mu + \alpha_i + \beta_{j_k(i)} + \varepsilon_{j_k(i)} + \delta_i$$
Objective: inference on $\beta = (\beta_1, \beta_2, \ldots, \beta_J)$ and $\sigma$
(Inference on $\{\alpha_i\}$ and $\sigma_\delta$ (or $\sigma_\delta^2 + \sigma^2/J$) is made from the row means $\bar{Y}_{i\cdot}$)
Note: information on $\beta$ is contained in contrasts within each time $i$, e.g. in $Y_{ij} - \bar{Y}_{i\cdot}$ for $j = 1, 2, \ldots, J$
Restriction on $\beta$: we will use $\beta_J = 0$, not $\sum_j \beta_j = 0$
Still identifiable (at best) only up to cyclic permutation.
10 CP Case: Inference
Will not use the $J$ (correlated) contrasts $Y_{ij} - \bar{Y}_{i\cdot}$ for inference
Instead, will use the $J - 1$ Helmert contrasts:
$$Z_{i1} = c_1 (Y_{i2} - Y_{i1})$$
$$Z_{i2} = c_2 \left( Y_{i3} - (Y_{i1} + Y_{i2})/2 \right)$$
$$\cdots$$
$$Z_{i,J-1} = c_{J-1} \left( Y_{iJ} - \sum_{j=1}^{J-1} Y_{ij} / (J - 1) \right)$$
Here, $c_j = 1/\sqrt{1 + (1/j)}$, defined so that $\mathrm{var}(Z_{ij}) = \sigma^2$ when the indices are correctly aligned
Note: $Y$ is $I \times J$, $Z$ is $I \times (J - 1)$.
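As a concrete illustration, here is a short sketch (mine, assuming numpy; not code from the talk) of the Helmert-contrast transformation from the $I \times J$ matrix $Y$ to the $I \times (J-1)$ matrix $Z$:

```python
import numpy as np

def helmert_contrasts(Y):
    """Map I x J data Y to the I x (J-1) scaled Helmert contrasts:
    Z[:, j-1] = c_j * (Y[:, j] - mean of the first j columns)."""
    I, J = Y.shape
    Z = np.empty((I, J - 1))
    for j in range(1, J):                    # contrast index j = 1, ..., J-1
        c_j = 1.0 / np.sqrt(1.0 + 1.0 / j)   # scaling so var(Z_ij) = sigma^2
        Z[:, j - 1] = c_j * (Y[:, j] - Y[:, :j].mean(axis=1))
    return Z
```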
11 CP Case: Inference
The $J - 1$ Helmert contrasts: $Z_{i1} = c_1(Y_{i2} - Y_{i1})$, $Z_{i2} = c_2(Y_{i3} - (Y_{i1} + Y_{i2})/2)$
[Plots: actual Y data (CP'd), columns Y1, Y2, Y3, and the corresponding Z data (without the $c_j$ scaling), columns Z1, Z2]
12 CP Case: Likelihood
$$Y_{ij} = \mu + \alpha_i + \beta_{j_k(i)} + \varepsilon_{j_k(i)} + \delta_i$$
Will estimate $(\beta, \sigma)$ using likelihood methods (focus: estimation)
Let $f(z; \eta)$ = density function of $N(\eta, \sigma^2)$. For $J = 3$:
$Z_{i1} = c_1(Y_{i2} - Y_{i1})$, $Z_{i2} = c_2(Y_{i3} - (Y_{i1} + Y_{i2})/2)$
$$E[Z_{i1}] = c_1 \left( \beta_{2_k(i)} - \beta_{1_k(i)} \right), \qquad E[Z_{i2}] = c_2 \left( \beta_{3_k(i)} - \left( \beta_{1_k(i)} + \beta_{2_k(i)} \right)/2 \right)$$
$k$ is discrete uniform on $1, 2, 3$, so the $i$th likelihood contribution is $1/3$ of
$$f(z_{i1}; c_1(\beta_2 - \beta_1)) \, f(z_{i2}; c_2(\beta_3 - (\beta_1 + \beta_2)/2))$$
$$+\, f(z_{i1}; c_1(\beta_1 - \beta_3)) \, f(z_{i2}; c_2(\beta_2 - (\beta_3 + \beta_1)/2))$$
$$+\, f(z_{i1}; c_1(\beta_3 - \beta_2)) \, f(z_{i2}; c_2(\beta_1 - (\beta_2 + \beta_3)/2))$$
Recall $\beta_3 = 0$, but it is kept here for symmetry
Note: a sum of $J$ terms; each term is a product of $J - 1$ terms.
13 CP Case: Likelihood
Note: a sum of $J$ terms, each a product of $J - 1$ terms
General $J$ case: the contribution at $i$ to the likelihood is
$$\frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f\!\left( z_{ij};\; c_j \left[ \beta_{(j+1)_k(i)} - \sum_{l=1}^{j} \beta_{l_k(i)} / j \right] \right).$$
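This contribution can be evaluated directly. A sketch (mine, assuming numpy/scipy and the conventions above: z_i is one row of Z, beta has $\beta_J = 0$, and perms lists the 1-based cyclic permutations):

```python
import numpy as np
from scipy.stats import norm

def cp_loglik_row(z_i, beta, sigma, perms):
    """Log of the i-th CP-case contribution:
    log[(1/J) * sum_k prod_j f(z_ij; eta_jk)], f = N(eta, sigma^2) density."""
    J = len(beta)
    c = 1.0 / np.sqrt(1.0 + 1.0 / np.arange(1, J))        # c_1, ..., c_{J-1}
    total = 0.0
    for p in perms:                                        # k = 1, ..., J
        b = np.asarray(beta)[np.array(p) - 1]              # permuted betas
        eta = c * (b[1:] - np.cumsum(b[:-1]) / np.arange(1, J))
        total += np.prod(norm.pdf(z_i, loc=eta, scale=sigma))
    return np.log(total / J)
```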
14 GP Case: Likelihood
The GP case is analogous to the CP case
For $J = 3$, we now have $J! = 6$ permutations: $(1,2,3)$, $(1,3,2)$, $(2,1,3)$, $(2,3,1)$, $(3,1,2)$, and $(3,2,1)$
The contribution to the likelihood at time $i$ is $1/6$ times
$$f(z_{i1}; c_1(\beta_2 - \beta_1)) \, f(z_{i2}; c_2(\beta_3 - (\beta_1 + \beta_2)/2))$$
$$+\, f(z_{i1}; c_1(\beta_3 - \beta_1)) \, f(z_{i2}; c_2(\beta_2 - (\beta_1 + \beta_3)/2))$$
$$+\, f(z_{i1}; c_1(\beta_1 - \beta_2)) \, f(z_{i2}; c_2(\beta_3 - (\beta_2 + \beta_1)/2))$$
$$+\, f(z_{i1}; c_1(\beta_3 - \beta_2)) \, f(z_{i2}; c_2(\beta_1 - (\beta_2 + \beta_3)/2))$$
$$+\, f(z_{i1}; c_1(\beta_1 - \beta_3)) \, f(z_{i2}; c_2(\beta_2 - (\beta_3 + \beta_1)/2))$$
$$+\, f(z_{i1}; c_1(\beta_2 - \beta_3)) \, f(z_{i2}; c_2(\beta_1 - (\beta_3 + \beta_2)/2))$$
General $J$: a sum of $J!$ terms, each a product of $J - 1$ terms
Next: back to the CP case.
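For the GP case the only change is the permutation set: the sum runs over all $J!$ orderings instead of the $J$ cyclic ones. A one-line sketch using Python's standard library:

```python
from itertools import permutations

# GP case: same likelihood form, but the permutation set is all J! orderings.
def general_permutations(J):
    return list(permutations(range(1, J + 1)))

print(len(general_permutations(4)))  # 24 possibilities for the 4-cavity example
```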
15 Finding MLEs
The log-likelihood is very complex, even in the simpler CP case:
$$\sum_{i=1}^{I} \ln \left[ \frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f\!\left( z_{ij};\; c_j \left[ \beta_{(j+1)_k(i)} - \sum_{l=1}^{j} \beta_{l_k(i)} / j \right] \right) \right]$$
Direct maximization usually fails for all but the simplest CP cases
There are some similarities to estimating the parameters of a normal mixture
So, consider use of the EM algorithm
However, our problem is both more and less complex than the normal-mixture problem:
More complex: each likelihood term is a sum of products, not just a sum
Less complex: the mixture probabilities are known.
16 EM Algorithm
For compactness, define ($E[Z_{ij}]$ for each $i$)
$$\eta_{jk} = \eta_{jk}(\beta) = c_j \left[ \beta_{(j+1)_k} - \sum_{l=1}^{j} \beta_{l_k} / j \right]$$
Then the $i$th contribution to the likelihood is $\frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f(z_{ij}; \eta_{jk})$
Idea behind the EM algorithm: consider how the likelihood would look if all the CPs were known ("no missing data")
Define $k^*(i)$ to be the actual permutation index at time $i$, and collect all of these into $k^* = (k^*(1), k^*(2), \ldots, k^*(I))$
Then the incomplete data is $Z = (Z_1, Z_2, \ldots, Z_I)$, where $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{i,J-1})$ is the observed (transformed) data at time $i$, and the complete data is $(Z, k^*)$.
17 EM Algorithm
The $i$th contribution to the incomplete-data log likelihood is the complex
$$\ln \left[ \frac{1}{J} \sum_{k=1}^{J} \prod_{j=1}^{J-1} f(z_{ij}; \eta_{jk}) \right]$$
However, the $i$th contribution to the complete-data log likelihood is then simply
$$\sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k^*(i)}(\theta) \right)$$
Simple index-known case: easy to maximize with respect to $(\beta, \sigma)$
For example, if the data indices were rearranged so that $k^*(i) = 1$ for each $i$, then the MLE of $\beta_2 - \beta_1$ would simply be $\bar{Y}_2 - \bar{Y}_1$
Idea of the EM algorithm: estimate the correct indices so that the complete-data log likelihood can be used.
18 Cycle of the EM Algorithm
Each cycle of the EM algorithm requires that we
1. Use the current estimates $\hat{\theta}_c$, at iteration $c$, of $\theta = (\beta, \sigma)$
2. Find the conditional expectation of the complete-data (i.e., $(Z, k^*)$) log likelihood, conditional on the incomplete data $Z$ (using $\hat{\theta}_c$ to obtain the conditional expectation)
3. Maximize the result in (2) with respect to $\theta$, to get $\hat{\theta}_{c+1}$
4. Continue until convergence
Questions: initial estimates; the expectation step (interesting!); the maximization step (also interesting!); convergence criteria.
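Schematically, one run of the algorithm looks like the following sketch (my own; `e_step` and `m_step` stand for the two stages detailed on the later slides, and the stopping test is the responsibility-stability rule described near the end):

```python
import numpy as np

def em(Z, theta0, e_step, m_step, eps=1e-3, max_iter=500):
    """Alternate E- and M-steps until the responsibilities stabilize."""
    theta, gamma_old = theta0, None
    for c in range(max_iter):
        gamma = e_step(Z, theta)       # responsibilities gamma_c(i, k)
        theta = m_step(Z, gamma)       # weighted-LS update of (beta, sigma)
        if gamma_old is not None and np.abs(gamma - gamma_old).max() < eps:
            break
        gamma_old = gamma
    return theta, gamma
```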
19 EM Algorithm: Expectation Step
For the EM algorithm, it is useful to write the complete-data log-likelihood
$$\ell(\theta) = \sum_{i=1}^{I} \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k^*(i)}(\theta) \right)$$
as
$$\ell(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} \delta(k(i), k^*(i)) \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\theta) \right)$$
where $\delta(k, k^*) = 1$ if $k = k^*$ and is 0 otherwise
This makes the conditional expectation easier to obtain: $k^*(i)$ is now part of a linear term.
20 EM Algorithm: Expectation Step
The expectation step requires finding $\ell_{EM}(\theta) = E[\ell(\theta) \mid Z]$ (the conditional expectation evaluated at $\hat{\theta}_c$)
Here, we need to find
$$\ell_{EM}(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} E[\delta(k(i), k^*(i)) \mid Z_i] \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\theta) \right)$$
21 EM Algorithm: Expectation Step
Need to find
$$\ell_{EM}(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} E[\delta(k(i), k^*(i)) \mid Z_i] \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\theta) \right)$$
We can solve this. We find that
$$E[\delta(k(i), k^*(i)) \mid Z_i] = P(\delta(k(i), k^*(i)) = 1 \mid Z_i) = \frac{\prod_{j=1}^{J-1} f(z_{ij}; \eta_{j k(i)}(\theta))}{\sum_{k(i)=1}^{J} \prod_{j=1}^{J-1} f(z_{ij}; \eta_{j k(i)}(\theta))} = \gamma(i, k(i)),$$
where $\gamma(i, k(i)) = P(i\text{th CP has index } k)$ is called the responsibility of permutation $k$ for observation $i$
Evaluating at $\hat{\theta}_c$, we obtain estimates $\hat{\gamma}_c(i, k(i))$ of the $\gamma(i, k(i))$.
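In code, the E-step is a row-wise normalization of the per-permutation density products. A sketch (mine, assuming numpy/scipy and the $\eta$ parameterization above):

```python
import numpy as np
from scipy.stats import norm

def responsibilities(Z, beta, sigma, perms):
    """E-step sketch: gamma[i, k] is proportional (within row i) to
    prod_j f(z_ij; eta_jk); each row is normalized to sum to 1."""
    J = len(beta)
    c = 1.0 / np.sqrt(1.0 + 1.0 / np.arange(1, J))
    gamma = np.empty((Z.shape[0], len(perms)))
    for k, p in enumerate(perms):
        b = np.asarray(beta)[np.array(p) - 1]
        eta = c * (b[1:] - np.cumsum(b[:-1]) / np.arange(1, J))
        gamma[:, k] = norm.pdf(Z, loc=eta, scale=sigma).prod(axis=1)
    return gamma / gamma.sum(axis=1, keepdims=True)
```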
22 EM Algorithm: Expectation Step
Responsibility: $\hat{\gamma}(i, k(i)) = \hat{P}(i\text{th CP has index } k)$
Assume $\hat{\theta}_c = (\hat{\beta}_c, \hat{\sigma}_c) = ((4, 1, 0), 1)$:
[index k=1] $f(z_{i1}; c_1(\beta_2 - \beta_1)) f(z_{i2}; \ldots)$
+ [index k=2] $f(z_{i1}; c_1(\beta_1 - \beta_3)) f(z_{i2}; \ldots)$
+ [index k=3] $f(z_{i1}; c_1(\beta_3 - \beta_2)) f(z_{i2}; \ldots)$
[Plot: $Z_i$ data (without the $c_j$ scaling). Just consider $Z_{i1}$ for two example rows: is it closest to $\beta_2 - \beta_1 = -3$ (k=1), $\beta_1 - \beta_3 = 4$ (k=2), or $\beta_3 - \beta_2 = -1$ (k=3)?]
23 EM Algorithm: Maximization Step
Next, we need to maximize
$$\ell_{EM}(\theta) = \sum_{i=1}^{I} \sum_{k(i)=1}^{J} \hat{\gamma}(i, k(i)) \sum_{j=1}^{J-1} \ln f\!\left( z_{ij}; \eta_{j k(i)}(\theta) \right)$$
with respect to $\theta = (\beta, \sigma)$
A close look at the sum: the $\beta$ portion is mathematically equivalent to a weighted sum of squares in a linear-regression framework, so the associated matrix methods can be directly applied
However, instead of the usual $I$ terms, we now have $IJ(J-1)$ terms.
24 EM Algorithm: Maximization Step
For the matrix machinery, define X (predictor) and W (weight) matrices and a U (response) vector. For $J = 3$, here are the portions for a given $i$:

(k(i), j) | X portion               | diag(W) portion  | U portion
(1, 1)    | $(-c_1, c_1, 0)$        | $\gamma(i, 1)$   | $Z_{i1}$
(1, 2)    | $(-c_2/2, -c_2/2, c_2)$ | $\gamma(i, 1)$   | $Z_{i2}$
(2, 1)    | $(c_1, 0, -c_1)$        | $\gamma(i, 2)$   | $Z_{i1}$
(2, 2)    | $(-c_2/2, c_2, -c_2/2)$ | $\gamma(i, 2)$   | $Z_{i2}$
(3, 1)    | $(0, -c_1, c_1)$        | $\gamma(i, 3)$   | $Z_{i1}$
(3, 2)    | $(c_2, -c_2/2, -c_2/2)$ | $\gamma(i, 3)$   | $Z_{i2}$

The solution is ($X$ not of full column rank) $\hat{\beta} = (X'WX)^{-} X'WU$.
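A sketch of the $J = 3$ maximization step built from the table above (my own construction; the $\sigma$ update shown, a weighted ML variance, is an assumption, since the slide gives only $\hat\beta$):

```python
import numpy as np

def m_step_J3(Z, gamma):
    """Weighted-LS M-step for J = 3. Z is I x 2, gamma is I x 3."""
    I = Z.shape[0]
    c1, c2 = 1 / np.sqrt(2), 1 / np.sqrt(1.5)
    Xblock = np.array([[-c1,    c1,    0.0 ],   # (k=1, j=1): E[Z_i1] = c1*(b2 - b1)
                       [-c2/2, -c2/2,  c2  ],   # (k=1, j=2)
                       [ c1,    0.0,  -c1  ],   # (k=2, j=1): E[Z_i1] = c1*(b1 - b3)
                       [-c2/2,  c2,   -c2/2],   # (k=2, j=2)
                       [ 0.0,  -c1,    c1  ],   # (k=3, j=1): E[Z_i1] = c1*(b3 - b2)
                       [ c2,   -c2/2, -c2/2]])  # (k=3, j=2)
    X = np.tile(Xblock, (I, 1))                  # one 6-row block per time i
    w = np.repeat(gamma, 2, axis=1).reshape(-1)  # weight gamma(i, k) per (k, j) row
    U = np.tile(Z, (1, 3)).reshape(-1)           # responses Z_i1, Z_i2 repeated
    XtW = X.T * w                                # X'W with W = diag(w)
    beta = np.linalg.pinv(XtW @ X) @ (XtW @ U)   # g-inverse: X not full column rank
    resid = U - X @ beta
    sigma = np.sqrt((w * resid**2).sum() / w.sum())
    return beta, sigma
```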
25 EM Algorithm: J = 2 Case
Some algebra leads to
$$\hat{\beta}_1 = \frac{1}{I c_1} \sum_{i=1}^{I} Z_{i1} (1 - 2\gamma(i, 1)) = \frac{1}{I} \sum_{i=1}^{I} (Y_{i2} - Y_{i1})(1 - 2\gamma(i, 1))$$
An extreme case: say all permutations just happen to be $(1, 2)$ and the separation via $\beta$ is much larger than the noise via $\sigma$
Then the estimated values $\gamma(i, 1) \approx 1$, so $\hat{\beta}_1 \approx \sum_{i=1}^{I} (Y_{i1} - Y_{i2})/I$
If the permutations have clean separation, each $\gamma(i, 1)$ weight is either $\approx 0$ or $\approx 1$, an indicator of which permutation took place
If separation is poor, the $\gamma(i, k(i))$ tend to be closer to 0.5; in the extreme case (all weights = 0.5) we get $\hat{\beta}_1 = 0$: zero estimated separation.
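For reference, the $J = 2$ closed form is one line of numpy (a sketch; Y is assumed $I \times 2$ and gamma1 holds the $\gamma(i, 1)$):

```python
import numpy as np

def beta1_hat_J2(Y, gamma1):
    """beta_1 estimate: mean of (Y_i2 - Y_i1) * (1 - 2 * gamma(i, 1))."""
    return np.mean((Y[:, 1] - Y[:, 0]) * (1.0 - 2.0 * gamma1))
```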
26 EM Algorithm: Implementation
Expectation step; Maximization step
Remaining questions: initial estimates; convergence criteria; initial permutation of rows.
27 EM Algorithm: Implementation
Initial permutation of rows? We can get initial estimates if we have a reasonable idea of how to permute the rows
If there is some signal in the data, this is possible
[Tables: data Y (columns Y1, Y2, Y3) and the permuted data Y]
Best signal = the row with maximum variance? Align the other rows to this one, as in the sketch below.
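A sketch of this alignment heuristic (mine; the exact matching rule, here the cyclic shift minimizing squared distance to the reference row, is an assumption):

```python
import numpy as np

def initial_alignment(Y):
    """Take the max-variance row as the reference, then give every row the
    cyclic shift that best matches it."""
    ref = Y[np.argmax(Y.var(axis=1))]
    J = Y.shape[1]
    aligned = np.empty_like(Y)
    for i, row in enumerate(Y):
        shifts = [np.roll(row, s) for s in range(J)]
        aligned[i] = min(shifts, key=lambda r: ((r - ref) ** 2).sum())
    return aligned
```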
28 EM Algorithm: Implementation
Advantages of the initial permutation of rows?
1. Initial estimates of the parameters. For $\beta$, the estimate is $(\bar{Y}_1 - \bar{Y}_J, \bar{Y}_2 - \bar{Y}_J, \ldots, \bar{Y}_{J-1} - \bar{Y}_J, 0)$
2. Best-guess row alignments are in place. With a reasonably strong signal, the most likely permutation in CP($J$) to be correct for each $i$ is $k = 1$ (no permutation). So $\{\hat{\gamma}_c(i, k(i)); i = 1, 2, \ldots, I\} \approx 1$ for $k(i) = 1$ and $\approx 0$ for the other $k(i)$, across iterations $c = 1, 2, \ldots$
3. If the signal is very weak, we will observe that the $\{\hat{\gamma}_c(i, 1); i = 1, 2, \ldots, I\}$ tend to decrease with $c$: the algorithm indicates that the initial row-alignments could be other ones.
29 EM Algorithm: Implementation
Convergence criteria?
1. Stability of the $\{\hat{\gamma}_c(i, k(i))\}$ across iterations $c$ determines the stability of the estimates
2. So, a reasonable stopping rule is the first $c$ such that $\max_{i,k} \left| \hat{\gamma}_c(i, k) - \hat{\gamma}_{c-1}(i, k) \right| < \epsilon$, where the cutoff $\epsilon$ is small; see the sketch below.
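In code the stopping rule is a single comparison (sketch; the cutoff value used here is my choice):

```python
import numpy as np

def converged(gamma_c, gamma_prev, eps=1e-3):
    """Stop at the first c where max over (i, k) of |change in gamma| < eps."""
    return np.abs(gamma_c - gamma_prev).max() < eps
```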
30 Example 1: 8-spindle machine, CP case
24 rows of data (columns Y1, ..., Y8)
Boxplots, after mean-centering each row, and then row-aligning...
31 [Figure: distribution of Y (row-mean centered) at each index]
32 [Figure: distribution after row-alignment permutations, reference row 4 (columns Y1–Y8); Y vs index (permuted)]
33 [Figure: β̂_c vs iteration c]
34 [Figure: σ̂_c vs iteration c]
35–40 [Figures: responsibilities γ̂_c(·, k) vs permutation index k at successive iterations c]
41 [Figure: β̂ and actual Y means (both centered) vs index]
42 [Figures: β̂ and actual Y means (both centered), and β̂ and actual Y means (centered and aligned), vs index]
43 [Figures: Y density estimates, indices known (row adjusted), and normal density estimates, indices unknown]
44 Example 2: 4-cavity machine, GP case
24 rows of data, again (columns Y1, ..., Y4)
Results...
45 [Figures: distribution of Y at each index (row-mean centered), and distribution after row-alignment permutations]
46 [Figure: β̂_c vs iteration c]
47 [Figure: σ̂_c vs iteration c]
48–54 [Figures: responsibilities γ̂_c(·, k) vs permutation index k at iterations c = 2, 3, 5, 10, 20, 50, 112]
55 [Figure: β̂ and actual Y means (centered and aligned) vs index]
56 Example 4: Random Data, CP case
I = 100 rows, J = 8, N(0, 1) data
Results...
57 [Figure: I = 100, J = 8, N(0, 1) data: distribution of Y (row-mean centered) at each index]
58 [Figure: distribution after row-alignment permutations]
59 [Figure: β̂_c vs iteration c]
60 [Figure: σ̂_c vs iteration c]
61–63 [Figures: responsibilities γ̂_c(·, k) vs permutation index k at successive iterations c]
64 [Figure: β̂ and actual Y means (both centered) vs index]
65 Example 4a: Random Data, CP case
I = 100 rows, J = 8, N(0, 1) data
Shrinkage of the initial location ($\beta_j$) estimates? Expansion of the initial scale estimate?
Results...
66 [Figure: β̂_c vs iteration c, after shrinking the initial β estimates by 2 and enlarging the initial σ estimate by 2]
67 [Figures: new solution vs old solution (!): β̂ and actual Y means (both centered) vs index]
68 [Figures: β̂_c vs iteration c, using the same β̂_{c=1} values and using β̂_{c=1} drawn from N(0, 0.01)]
69 Summary (and future work)
Both the CP and GP methods appear to work well in cases with good signals
Under $H_0$ (no real index separation), results might be optimistic
Future work:
1. Approximate the standard errors of the estimates
2. Implement likelihood-ratio tests
3. Investigate asymptotics
4. Investigate the $H_0$ case in detail
5. Consider shrinkage of the final estimates
6. Improve the (linear) convergence rate, e.g. via Aitken's acceleration technique.
70 Questions?