Constraint Reasoning and Kernel Clustering for Pattern Decomposition With Scaling

Ronan LeBras, Theodoros Damoulas, Ashish Sabharwal, Carla P. Gomes (Computer Science); John M. Gregoire, Bruce van Dover (Materials Science / Physics)
Sept 15, 2011, CP'11

Motivation
Cornell Fuel Cell Institute mission: develop new materials for fuel cells.
An electrocatalyst must:
1) Be electronically conducting
2) Facilitate both reactions
Platinum is the best known metal to fulfill that role, but:
1) The reaction rate is still considered slow (causing energy loss)
2) Platinum is fairly costly, intolerant to fuel contaminants, and has a short lifetime
Goal: find an intermetallic compound that is a better catalyst than Pt.

Motivation: recipe for finding alternatives to platinum
1) In a vacuum chamber, place a silicon wafer.
2) Add three metals.
3) Mix until smooth, using three sputter guns.
4) Bake for 2 hours at 650 °C.
Result: a deliberately inhomogeneous composition on the Si wafer; atoms are intimately mixed.
[Figure: Ta-Rh-Pd composition triangle with an example point at 38% Ta, 45% Rh, 17% Pd. Source: Pyrotope, Sebastien Merkel]

Motivation: identifying crystal structure using X-ray diffraction at CHESS
The XRD pattern characterizes the underlying crystal fairly well.
Experiments are expensive: Bruce van Dover's research team has access to the facility one week every year.
[Figure: XRD pattern of the sample at 38% Ta, 45% Rh, 17% Pd on the Ta-Rh-Pd composition triangle. Source: Pyrotope, Sebastien Merkel]

Motivation [Figure: Ta-Rh-Pd composition triangle with phase regions α, β, γ, δ]

Motivation [Figure: the same triangle with the mixed regions α+β and γ+δ marked]

Motivation [Figure: XRD patterns P_i and P_j measured at two points of the Ta-Rh-Pd triangle, with regions α, β, γ, δ and α+β]

Motivation [Figure: XRD patterns P_i, P_j, and P_k measured at three points of the same triangle]

Motivation
INPUT: [Figure: XRD patterns sampled over the Al-Fe-Si composition triangle]
OUTPUT: m phase regions (k pure regions, m-k mixed regions) and the XRD pattern characterizing each pure phase [Figure: Al-Fe-Si triangle with pure and mixed phase regions]
Additional physical characteristics:
  Peaks shift by 15% within a region
  Phase connectivity
  Mixtures of 3 phases
  Small peaks might be discriminative
  Peak locations matter more than peak intensities

Motivation
Figure 1: Phase regions of Ta-Rh-Pd
Figure 2: Fluorescence activity of Ta-Rh-Pd

Outline
1) Motivation
2) Problem Definition: Abstraction, Hardness
3) CP Model
4) Kernel-based Clustering
5) Bridging CP and Machine Learning
6) Conclusion and Future Work

Problem Abstraction: Pattern Decomposition with Scaling
Input: sample points v_1, ..., v_5 with observed patterns P_1, ..., P_5; parameters M = K = 2, δ = 1.5
Output: basis patterns B_1 and B_2, with scaling factors s_ki (basis pattern k, sample i):
  v_1 (P_1): s_11 = 1.00, s_21 = 0.68
  v_2 (P_2): s_12 = 0.95, s_22 = 0.78
  v_3 (P_3): s_13 = 0.88, s_23 = 0.85
  v_4 (P_4): s_14 = 0.84, s_24 = 0.96
  v_5 (P_5): s_15 = 0, s_25 = 1.00
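To make the abstraction concrete, here is a minimal Python sketch of how a sample pattern can arise from scaled basis patterns. It assumes that a pattern is a list of peak locations, that a factor of 0 means the basis pattern is absent, and that a nonzero factor must lie in [1/δ, δ]. That range, the helper names `compose` and `is_valid_scaling`, and the peak locations in `B1` and `B2` are illustrative assumptions; only the scaling factors, δ = 1.5, and M = 2 come from the slide example.

```python
from typing import List, Sequence


def compose(bases: Sequence[Sequence[float]], scales: Sequence[float]) -> List[float]:
    """Return the union of the scaled basis patterns as sorted peak locations.

    A scale of 0 means the corresponding basis pattern is absent.
    """
    peaks = set()
    for basis, scale in zip(bases, scales):
        if scale > 0:
            # Scaling multiplies every peak location by the same factor.
            peaks.update(round(scale * p, 6) for p in basis)
    return sorted(peaks)


def is_valid_scaling(scales: Sequence[float], delta: float, max_bases: int) -> bool:
    """Check the ASSUMED constraints: at most max_bases basis patterns are
    present, and every nonzero factor lies in [1/delta, delta]."""
    used = [s for s in scales if s > 0]
    return len(used) <= max_bases and all(1 / delta <= s <= delta for s in used)


# Hypothetical basis patterns (peak locations chosen for illustration only).
B1 = [20.0, 35.0]
B2 = [25.0, 50.0]

P1 = compose([B1, B2], [1.00, 0.68])  # both basis patterns present
P5 = compose([B1, B2], [0.0, 1.00])   # only B2 present
```

The decomposition problem runs in the opposite direction: given only the observed patterns, recover the basis patterns and the factors.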

Problem Hardness
Assumptions: each B_k appears by itself in some v_i; no experimental noise. The problem can be solved* in polynomial time.
Assumption: no experimental noise. The problem becomes NP-hard (reduction from the Set Basis problem).
Assumption used in this work: experimental noise in the form of missing elements in P_i.
[Figure: the example with sample points v_1, ..., v_5, patterns P_1, ..., P_5, basis patterns B_1 and B_2, and M = K = 2, δ = 1.5]

Outline
1) Motivation
2) Problem Definition
3) CP Model: Model, Experimental Results
4) Kernel-based Clustering
5) Bridging CP and Machine Learning
6) Conclusion and Future Work

CP Model

CP Model (continued)
Advantage: captures physical properties and relies on peak location rather than peak height.
Drawback: does not scale to realistic instances; propagation is poor in the presence of experimental noise.

CP Model: Experimental Results
[Figure: Number of unknown phases vs. running times (Al-Li-Fe with P = 24). x-axis: number of unknown phases (K), 0 to 6; y-axis: running time (in s), 0 to 1400; series: N=10, N=15, N=28, N=218]
For realistic instances, K = 6 and N = 218...

Outline
1) Motivation
2) Problem Definition
3) CP Model
4) Kernel-based Clustering
5) Bridging CP and Machine Learning
6) Conclusion and Future Work

Kernel-based Clustering
Set of features: X
Similarity matrices: [X·X^T] [Figure: similarity matrices; red: similar, blue: dissimilar]
Better approach: take shifts into account.
Method: K-means on a Dynamic Time Warping kernel.
Goal: select groups of samples that belong to the same phase region to feed the CP model, in order to extract the underlying phases of these sub-problems.
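The idea of a similarity that takes shifts into account can be sketched with a textbook dynamic time warping (DTW) distance. The slides do not specify the DTW variant or the exact kernel, so the function names, the absolute-difference local cost, and the `exp(-d / gamma)` similarity below are assumptions for illustration; this particular similarity is also not guaranteed to be a positive semi-definite kernel.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier b
                cost[i][j - 1],      # b[j-1] also matches an earlier a
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


def dtw_similarity(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """ASSUMED similarity: exp(-DTW / gamma), equal to 1.0 for a perfect warp."""
    return math.exp(-dtw_distance(a, b) / gamma)


# Two toy intensity profiles with the same peak at shifted positions.
peak_at_2 = [0.0, 0.0, 1.0, 0.0, 0.0]
peak_at_1 = [0.0, 1.0, 0.0, 0.0, 0.0]
taller_peak = [0.0, 0.0, 2.0, 0.0, 0.0]

shifted = dtw_distance(peak_at_2, peak_at_1)
taller = dtw_distance(peak_at_2, taller_peak)
```

In this toy case the shifted peak warps onto the original at zero cost, whereas a point-by-point comparison would treat the two profiles as different.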

Bridging CP and ML
Goal: a robust, physically meaningful, scalable, automated solution method that combines:
  Underlying physics: a language for constraints enforcing local details (Constraint Programming model)
  Machine learning for a global data-driven view (similarity kernels & clustering)

Bridging Constraint Reasoning and Machine Learning: Overview of the Methodology
INPUT: [Figure: XRD patterns sampled over the Al-Fe-Si composition triangle]
Components shown in the pipeline diagram:
  Machine learning: kernel methods, Dynamic Time Warping
  Peak detection
  Fix errors in data
  Machine learning: partial clustering
  Full CP model guided by partial solutions
  CP model & solver on sub-problems
OUTPUT: [Figure: phase regions over the Al-Fe-Si triangle]

Experimental Validation
Al-Li-Fe instance with 6 phases:
  Ground truth (known)
  Previous work (NMF) violates many physical requirements
  Our CP + ML hybrid approach is much more robust

Conclusion
Hybrid approach to clustering under constraints:
  More robust than data-driven global ML approaches
  More scalable than a pure CP model locally enforcing constraints
An exciting application in close collaboration with physicists:
  Best inference out of expensive experiments
  Towards the design of better fuel cell technology

Future work
Ongoing work:
  Spatial clustering, to further enhance cluster quality
  Bayesian approach, to better exploit prior knowledge about local smoothness and available inorganic libraries
Active learning:
  Where to sample next, assuming we can interfere with the sampling process?
  When to stop sampling if sufficient information has been obtained?
Correlating catalytic properties across many thin films, in order to understand the underlying physical mechanism of catalysis and to find promising intermetallic compounds

The end. Thank you!

Extra slides

Experimental Sample
Example on the Al-Li-Fe diagram: [Figure]

Applications with similar structure
Fire detection: detecting/locating fires.
  Analogy: basis pattern = warmth sources; samples = temperature recordings; physical constraints = gradient of temperatures, material properties
Flight calls / bird conservation: identifying bird populations from sound recordings at night.
  Analogy: basis pattern = species; samples = recordings; physical constraints = spatial constraints, species and season specificities

Previous Work 1: Cluster Analysis (Long et al., 2007)
1) Feature vector x_i
2) Pearson correlation coefficients
3) Distance matrix
4) PCA 3-dimensional approximation
5) Hierarchical agglomerative clustering
Drawback: requires sampling of pure phases, detects phase regions (not phases), overlooks peak shifts, may violate physical constraints (phase continuity, etc.).
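The first steps of this pipeline can be sketched in a few lines of Python. The slide does not say how correlations are turned into distances, so the choice `1 - r` and the function names are assumptions; the PCA and hierarchical clustering steps are omitted.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length feature vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def distance_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """ASSUMED distance: d(u, v) = 1 - pearson(u, v)."""
    return [[1.0 - pearson(u, v) for v in samples] for u in samples]


# The same toy peak at two neighbouring positions.
peak_at_2 = [0.0, 0.0, 1.0, 0.0, 0.0]
peak_at_1 = [0.0, 1.0, 0.0, 0.0, 0.0]
D = distance_matrix([peak_at_2, peak_at_1])
```

For these two toy vectors the correlation is negative, which illustrates the drawback named on the slide: a shifted peak looks dissimilar to a correlation-based distance.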

Previous Work 2: NMF (Long et al., 2009)
1) Feature vector x_i
2) X = A·S + E: linear positive combination (A) of basis patterns (S)
3) min ||E||_F^2: minimizing the squared Frobenius norm of E
Drawback: overlooks peak shifts (linear combination only), may violate physical constraints (phase continuity, etc.).
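The objective on this slide can be evaluated directly. The sketch below only computes the squared Frobenius norm of E = X - A·S for given nonnegative factors; it does not perform the factorization itself, and the helper names and toy matrices are assumptions for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matmul(a: Matrix, s: Matrix) -> List[List[float]]:
    """Plain matrix product A·S (rows of S are basis patterns)."""
    inner, cols = len(s), len(s[0])
    return [[sum(row[k] * s[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def squared_frobenius_residual(x: Matrix, a: Matrix, s: Matrix) -> float:
    """Return ||X - A·S||_F^2, the quantity that NMF minimizes."""
    approx = matmul(a, s)
    return sum((x[i][j] - approx[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


# Toy basis patterns: one peak in the first channel, one in the last.
S = [[1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
A = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
X = matmul(A, S)  # samples that are exact mixtures of the basis patterns

exact = squared_frobenius_residual(X, A, S)

# A sample whose peak sits in the middle channel, i.e. a shifted peak.
X_shifted = [[0.0, 1.0, 0.0]]
best_effort = squared_frobenius_residual(X_shifted, [[0.0, 0.0]], S)
reuse_basis = squared_frobenius_residual(X_shifted, [[1.0, 0.0]], S)
```

With these toy basis patterns, no nonnegative weights can reproduce the shifted peak, so the residual stays at 1.0 or above; this is the sense in which a purely linear combination overlooks peak shifts.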