Finding Nature s Missing Ternary Oxide Compounds using Machine Learning and Density Functional Theory: supporting information

Similar documents
Finding Nature s Missing Ternary Oxide Compounds Using Machine Learning and Density Functional Theory

Introduction to Signal Detection and Classification. Phani Chavali

Data-mined similarity function between material compositions

Bayesian Decision Theory

Model Accuracy Measures

Thermal Stabilities of Delithiated Olivine MPO[subscript 4] (M=Fe,Mn) Cathodes investigated using First Principles Calculations

Machine Learning for the Materials Scientist

Performance Evaluation and Comparison

Novel mixed polyanions lithium-ion battery cathode materials predicted by high-throughput ab initio computations

Application of Machine Learning to Materials Discovery and Development

Detection theory 101 ELEC-E5410 Signal Processing for Communications

11B, 11E Temperature and heat are related but not identical.

Multivariate statistical methods and data mining in particle physics

A COMPUTATIONAL INVESTIGATION OF MIGRATION ENTHALPIES AND ELECTRONIC STRUCTURE IN SrFeO 3-δ

Supplementary information for: Requirements for Reversible Extra-Capacity in Li-Rich Layered Oxides for Li-Ion Batteries

Induction of Decision Trees

Miami Dade College CHM 1045 First Semester General Chemistry

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines

DATE: NAME: CLASS: BLM 1-9 ASSESSMENT. 2. A material safety data sheet must show the date on which it was prepared.

Supporting information

Performance evaluation of binary classifiers

Real and virtual screening for materials discovery through first principles calculations

Generalization to a zero-data task: an empirical study

2/18/2013. Spontaneity, Entropy & Free Energy Chapter 16. Spontaneity Process and Entropy Spontaneity Process and Entropy 16.

Chemistry 111 Syllabus

Phosphates as Lithium-Ion Battery Cathodes: An Evaluation Based on High-Throughput ab Initio Calculations

Q1. The electronic structure of the atoms of five elements are shown in the figure below.

Least Squares Classification

Supporting Information. Kinetically-Driven Phase Transformation during Lithiation in Copper Sulfide Nanoflakes

Topic 4 Thermodynamics

Mixtures of Gaussians with Sparse Structure

Chemistry-A I Can Statements Mr. D. Scott; CHS

Name Date Class STATES OF MATTER

Thermodynamics: Entropy

Support Vector Machines Explained

Rh 3d. Co 2p. Binding Energy (ev) Binding Energy (ev) (b) (a)

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

CHEMISTRY 121 FG Spring 2013 Course Syllabus Rahel Bokretsion Office 3624, Office hour Tuesday 11:00 AM-12:00 PM

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1

Contents. 1 Matter: Its Properties and Measurement 1. 2 Atoms and the Atomic Theory Chemical Compounds Chemical Reactions 111

Chapter 3 Classification of Elements and Periodicity in Properties

Advanced statistical methods for data analysis Lecture 1

Simple Neural Nets For Pattern Classification

Naïve Bayes classification

SGN (4 cr) Chapter 5

Lecture 9: Classification, LDA

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Chem. 1B Final Practice First letter of last name

Supporting Information. Directing the Breathing Behavior of Pillared-Layered. Metal Organic Frameworks via a Systematic Library of

Machine Learning Lecture 7

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

Chemistry (

Learning Theory. Piyush Rai. CS5350/6350: Machine Learning. September 27, (CS5350/6350) Learning Theory September 27, / 14

Machine Learning Linear Classification. Prof. Matteo Matteucci

Group Discriminatory Power of Handwritten Characters

Question 3.2: Which important property did Mendeleev use to classify the elements in his periodic table and did he stick to that?

Support Vector Machines. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Lecture 9: Classification, LDA

p(x ω i 0.4 ω 2 ω

Hydrogen adsorption by graphite intercalation compounds

BLAST: Target frequencies and information content Dannie Durand

Geology 212 Petrology Prof. Stephen A. Nelson. Thermodynamics and Metamorphism. Equilibrium and Thermodynamics

Treatment of Error in Experimental Measurements

Chapter 10: Statistical Quality Control

Concentration-based Delta Check for Laboratory Error Detection

arxiv: v1 [cond-mat.mtrl-sci] 17 Nov 2017

Measuring Discriminant and Characteristic Capability for Building and Assessing Classifiers

EM Algorithm & High Dimensional Data

Chemistry Chapter 16. Reaction Energy

Classification: The rest of the story

Data Mined Ionic Substitutions for the Discovery of New Compounds

Massachusetts Tests for Educator Licensure

Chemistry Curriculum Map

Machine Learning for Signal Processing Bayes Classification and Regression

THE BIG IDEA: ELECTRONS AND THE STRUCTURE OF ATOMS. BONDING AND INTERACTIONS.

Q1. The electronic structure of the atoms of five elements are shown in the figure below.

Supporting information. Origins of High Electrolyte-Electrode Interfacial Resistances in Lithium Cells. Containing Garnet Type LLZO Solid Electrolytes

STRUCTURAL BIOINFORMATICS I. Fall 2015

PDF-2 Tools and Searches

General Chemistry 201 Section ABC Harry S. Truman College Spring Semester 2014

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Table of Contents Preface PART I. MATHEMATICS

Virtual affinity fingerprints in drug discovery: The Drug Profile Matching method

Structure and Formation Mechanism of Black TiO 2 Nanoparticles

MACHINE LEARNING ADVANCED MACHINE LEARNING

Measures of Basicity in Silicate Minerals Revealed by Alkali Exchange Energetics

ECS289: Scalable Machine Learning

Course Title: Academic chemistry Topic/Concept: Chapter 1 Time Allotment: 11 day Unit Sequence: 1 Major Concepts to be learned:

Pointwise Exact Bootstrap Distributions of Cost Curves

SUPPLEMENTARY INFORMATION

Machine Learning: Exercise Sheet 2

CHEMISTRY CLASS XI CHAPTER 3 STRUCTURE OF ATOM

Chem. 1A Final Practice Test 1

HydroGEN Kick-Off Meeting University of Colorado, Boulder STCH. Charles Musgrave, University of Colorado - Boulder November 14, 2017 NREL, Golden, CO

How to Deal with Multiple-Targets in Speaker Identification Systems?

Predicting Storms: Logistic Regression versus Random Forests for Unbalanced Data

2815/01 CHEMISTRY. Trends and Patterns. OXFORD CAMBRIDGE AND RSA EXAMINATIONS Advanced GCE

Analyze the roles of enzymes in biochemical reactions

General Chemistry (Second Quarter)

Transcription:

Finding Nature s Missing Ternary Oxide Compounds using Machine Learning and Density Functional Theory: supporting information Geoffroy Hautier, Christopher C. Fischer, Anubhav Jain, Tim Mueller, Gerbrand Ceder Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA Cross-Validation results on ternary oxides Cross-validation testing of the predictive strength of our probabilistic model was performed by grouping all the possible ternary oxides chemical systems into 20 random groups. Every crossvalidation step involved removing all the systems included in one group from the database (the test set), training the model with the remaining groups (the training set) and testing how well the model performed in predicting back the compounds present in the test set. Each of these steps was repeated 20 times ensuring that every possible ternary oxide chemical system was present once in the cross-validation test. The cross-validation tests excluded structural prototypes unique to one composition as our data mining method can never predict such crystal structures. A first cross-validation was performed to test the capability of the model to discover compound forming compositions using the compound probability p compound (equation (1)). This is a binary classification problem. The classifier must be able to discriminate between compound forming and non-compound forming compositions. A classical way to evaluate such a classifier is to plot a receiver operating characteristic (ROC) curve 1. A ROC graph plots the true positive rate (i.e. the hit-rate) as a function of the false positive rate (i.e. the false alarm rate). Figure 1 plots in blue this ROC curve from the cross-validation of our model on the ternary oxides ICSD. The dashed red line indicates the random guess discrimination line. Any binary classifier above this line is e-mail:gceder@mit.edu 1

Figure 1: Receiver operating characteristic (ROC) plot for the compound forming composition predictions. The blue curve indicates the ROC curve obtained from our model during cross-validation on the ternary oxides ICSD. The dashed red line is the random guess discrimination line. Any data point above it belongs to a classifier more efficient than a random guess. The perfect classifier (false positive rate equals to zero and true positive rate to one) is in the upper left corner and indicated by a red dot. more efficient than a random guess. The perfect classifier (false positive rate equals to zero and true positive rate to one) is in the upper left corner and indicated by a red dot. The position of the blue curve compared to the random guess discrimination line indicates the predictive power of our model in terms of compound forming compositions discovery. Having tested the method to suggest compound forming compositions we focus on the structure prediction component. Here, the prediction consists of computing from the probabilistic model the l most likely structure prototypes for the given composition. Using the same 20-fold cross-validation procedure, a structure prediction was performed for all compositions for which a compound exists in the ICSD. Figure 2 shows the crystal structure prototype list length l needed to predict the correct ground state with a given probability during this cross-validation. The different curves are plotted for compositions sets that pass different p compound thresholds. Not surprisingly, the two steps of the compound search (composition and structure prediction) are linked. The list length needed depends on the p compound value. Compounds at very likely compositions (i.e. with high p compound values) generally need less candidates structures. The dashed line on the figure shows that a high success rate (95%) can be achieved with a reasonable number (between 4 and 20) of candidates. 2

Figure 2: Cross-validation result for the prediction of the crystal structure at a compound-forming composition in the ICSD. Crystal structure candidate list length needed versus a certain probability to predict the right structure during cross-validation. The different curves correspond to different threshold for p compound. Only compositions with a p compound high enough to recover 100%, 84% and 45% of the known ICSD compound forming compositions during Cross-Validation are taken into account in the corresponding curve. Compositions with higher p compound need fewer candidates crystal structure to find the right one. The dashed line shows that the method can predict with high success rate (95 percent) the right crystal structure prototype for reasonable number (between 4 and 20) of candidates. 3

Comparison of the predictions to the structural data present in the PDF4+ database Some of our predictions presented an entry with structural information at the same composition in the PDF4+ database. For those, a careful comparison between the structural data present in the PDF4+ database and our prediction has been carried out. The two structures (the predicted one and the one present in the PDF4+ database) have been compared using the affine mapping comparison scheme 2,3. Among the 146 predicted compounds with structural information in the PDF4+ database, 107 present a similar structure in the PDF4+ database. 11 structures presented partial occupancy in the PDF4+ database and could not be compared directly to the predictions. For 22 compositions, while not the ground state structure according to DFT, the structure present in the PDF4+ was in the candidate list and had been computed. The vast majority of the PDF4+ structures lie close enough in energy (between 3 and 20 mev/at) to expect entropy or kinetics effects to explain their difference to the DFT ground state. However, one entry: FeSnO 3 presented a PDF4+ structure 590 mev/at above the ilmenite ground state we found. After verification, we discovered that this entry (04-008-6519) presented a transcription error while imported from the original paper it referred to 4. This illustrates how combining machine learning techniques with DFT can also help discovering errors present in crystal structure database. Six compounds in total presented a crystal structure in the PDF4+ not present in any of the candidates we tested by DFT. A first set consists of the HoBrO, TmBrO, PrBrO and CeBrO compounds. The failure of the method to identify the adequate crystal structure comes here from the absence of this crystal structure prototype in our data set. On the other hand, two compounds LuP 3 O 9 and DyP 3 O 9 did have the PDF4+ crystal structure available in the data set (crystal structure prototype from from VP 3 O 9 ) but the probabilistic model suggested other structures. Interestingly, the predicted structure are found to be 9 mev/at for the Lu compound and 6 mev/at for the Dy compound lower in energy than the structure present in the PDF4+. Evaluation of the model in the main group element mixture chemistry Our search for new compounds obtained only very few successes in the main group element mixture region. This lead us to wonder if this could be explained by a systematic failure of the model for this chemistry. Analysis of the cross-validation results did not show any clear evidence for such a systematic failure. Both the composition and structure prediction procedure did not perform worse for main group elements mixture compared to the whole ICSD. The composition prediction showed a 52% 4

false positive rate vs 45% for the whole database. The structure prediction reached a 100% success rate with a list length of 4 vs 95% for the whole database. Evaluating compound synthesis conditions using the oxygen chemical potential It is known that oxide stability can depend largely of the environmental conditions applied (oxidizing or reducing). The key thermodynamic quantity dictating under which environment an oxide is stable are its minimal and maximal oxygen chemical potential. These quantities indicate in which oxygen chemical potential range the compound would be stable. Any environment setting a chemical potential lower than the minimal oxygen chemical potential would be too reducing for the compound while any environment setting a higher chemical potential than the maximal oxygen chemical potential would be too oxidizing. This oxygen chemical potential range can be evaluated using ab initio computations for any compound. In this work, we have set the reference oxygen chemical potential (µ O ) to be zero in air at room temperature (298K, 0.21atm). A correspondence between an estimated temperature and partial pressure of oxygen can be obtained using the oxygen gas entropy from the JANAF tables 5 and neglecting the entropy contribution from the solid phases compared to the oxygen gaseous phase 6. The oxygen molecule energy used is the one obtained by Wang et al. 7. The oxygen chemical potential range for some binary transition metal oxides can be used as reference to estimate the oxidation conditions needed for the synthesis of our predicted compounds (see Figure 3). 5

Figure 3: Oxygen chemical potential range for some binary transition metal oxides. This graph indicates the range of oxygen chemical potential in which common binary transition metal oxides would be stable. This data has been obtained by DFT computations on the binary oxides present in the ICSD. The zero oxygen chemical potential corresponds here to air at room temperature. Some phases only stable on a very narrow range (i.e. Cr 8 O 21, Cr 5 O 12, V 3 O 7, V 3 O 5, V 5 O 9 ) have been removed for the sake of clarity. Effect of the heavy alkalis on the stability of Co 4+ compounds In the case of the heavy alkali compounds with cobalt, table 1 compares the value for the oxygen chemical potential range for the predicted compounds with that of a simple CoO 2 binary compound. CoO 2 would not be stable in typical solid state synthesis conditions such as gaseous oxygen at 800C (µ O 1.0 ev/at). This is consistent with the impossibility to prepare this Co 4+ binary oxide in this way. On the other hand, all the predicted heavy alkali Co 4+ compounds should be stable in these conditions showing the stabilization effect of the heavy alkali. compound minimum µ O maximum µ O CoO 2-0.04 ev/at + Co 2 Cs 6 O 7-2.9 ev/at + Co 2 Rb 6 O 7-3.1 ev/at + CoRb 2 O 3-1.7 ev/at + Table 1: Oxygen chemical potential range for different Co 4+ compounds. 6

CoRb 2 O 3 structure comparison to experiment CoRb 2 O 3 is present in the PDF4+ database (00-027-0515 8 ). Comparison of the simulated pattern from our prediction with the experimental pattern shows good agreement (see Figure 4). Figure 4: Comparison between the experimental and predicted powder diffraction pattern for CoRb 2 O 3. The experimental pattern from the PDF4+ database is below in red. The simulated pattern from our predicted crystal structure is above in blue. MgMnO 3 structure comparison to experiment MgMnO 3 is referenced in the PDF4+ database (00-024-0736 9 ). Comparison with the simulated pattern from our prediction with the experimental pattern shows a reasonable match to the experiment (see figure 5). Only one peak in the 50 degree region does not match the powder diffraction pattern simulated from our prediction. 7

Figure 5: Comparison between the experimental and predicted powder diffraction pattern for MgMnO 3. The experimental pattern from the PDF4+ database is below in red. The simulated pattern from our predicted crystal structure is above in blue. Rate of new ternary oxides discovery in the Inorganic Crystal Structure Database We identified the earliest date of publication for any ternary oxide compound present in the ICSD. We did not take into account multiple reports of the same compound and compounds with partial occupancies. Figure 6 indicates in blue how many new ternary oxides compounds were discovered each year according to the ICSD from 1930 to 2005. The red bar shows how many new compounds have been discovered in this work. 8

Figure 6: New ternary oxide discovery per year according to the ICSD. The blue bars indicate how many new ternary oxides were discovered per year from 1930 to 2005. The red bar shows how many new compounds have been discovered in this work. References [1] Fawcett, T. Pattern Recognition Letters 2006, 27, 861 874. [2] Hundt, R.; Schön, J. C.; Jansen, M. Journal of Applied Crystallography 2006, 39, 6 16. [3] Burzlaff, H.; Malinovsky, Y. Acta Crystallographica Section A Foundations of Crystallography 1997, 53, 217 224. [4] Roh, K. S.; Ryu, K. H.; Yo, C. H. Journal of Solid State Chemistry 1999, 142, 288 293. [5] Chase, M. W. NIST-JANAF Thermochemical Tables; American Institute of Physics: Woodbury, NY, 1998. [6] Ong, S. P.; Wang, L.; Kang, B.; Ceder, G. Chemistry of Materials 2008, 20, 1798 1807. [7] Wang, L.; Maxisch, T.; Ceder, G. Physical Review B 2006, 73, 1 6. [8] Jansen, M.; Hoppe, R. Zeitschrift für anorganische und allgemeine Chemie 1974, 408, 75 82. [9] Chamberland, B.; Sleight, A. W.; Weiher, J. F. Journal of Solid State Chemistry 1970, 1, 512 514. 9