DATA ANALYTICS IN NANOMATERIALS DISCOVERY

DATA ANALYTICS IN NANOMATERIALS DISCOVERY
Michael Fernandez, OCE Postdoctoral Fellow
September 2016
www.data61.csiro.au

Materials Discovery Process
Materials Genome Project: integrating computational methods and information with sophisticated computational and analytical tools to shorten the duration of materials development from 10-20 years to 2 or 3 years.
Data Analytics for Nanomaterials, Michael Fernandez

Materials and Molecular Modeling
Sun Baichuan
Team leader: Amanda Barnard
Collaborators
Big data and HPC integration: Piotr Szul, Yulia Arzhaeva
Deep learning and GPU computations: Chris Watkins

Outline
Material Discovery Process
  Methods for atomistic simulations of materials
  Experimental confirmation
Data-driven Computational Nanomaterials Discovery
  Hypothetical material space sampling (structure generation and statistics)
  In silico high-throughput characterization (atomistic simulations and machine learning)
  Data storage, analytics, exploitation and integration

Atomistic Simulations of Materials
Theory can predict properties of materials.
Quantum chemistry methods can cover any chemistry.
Empirical potentials exist for a large number of elements.
Computation is scalable and generically deployable.
Electronic Hamiltonian (Hartree atomic units):
$$H = E_{\text{nuclei}}(\{R_I\}) - \frac{1}{2}\sum_{i=1}^{N_e} \nabla_i^2 + \sum_{i=1}^{N_e} V_{\text{nuclei}}(r_i) + \frac{1}{2}\sum_{i \neq j}^{N_e} \frac{1}{|r_i - r_j|}$$

Atomistic Simulations of Materials
Computational predictions later confirmed by experiments: self-assembly mechanism of nanodiamonds.
DFTB simulations of the surface electrostatic potential of dodecahedral diamond nanoparticles of a) 2.2 nm and b) 2.5 nm.
The arrow indicates the (111)/(111) interface between two 4 nm nanodiamonds.
Barnard, A. S. et al. Nanoscale 3, 958-62 (2011).

Modern Nanomaterials Discovery Cycle
Diagram labels: hypothetical materials; experiment design; materials libraries; database; performance measurement; theory, modeling and informatics; in-silico screening; lead materials scale-up; knowledge for rational design; banks of materials.
Polydisperse systems: exponential increase in complexity and diversity, a nearly infinite combinatorial problem.
Potyrailo, R. et al. ACS Comb. Sci. 13, 579-633 (2011).

Nanomaterials Screening
Departing from the Edisonian approach.
We can accurately predict a property, so it can be computed for entire materials spaces.
In silico structure generation vs. combinatorial in silico design.

Nanomaterials Screening
Systematic and extensive materials performance ranking.
Big data discovery of structure-property relationships in unknown materials domains.
Accelerated identification of high-potential candidates and rational design principles.

Data Analytics Challenges
Information representation: fingerprints
Information extraction: multivariate statistical analysis
Knowledge discovery: data mining and machine learning
Knowledge representation: visualization

Polydispersity Challenge in Nanomaterials
Diagram labels: polydisperse sample; controlled synthesis; purification; $$$; quasi-monodisperse sample.
Polydispersity can be detrimental for high-performing applications.
Purification of polydisperse nanoparticle samples is expensive.

Data Analytics of Nanocarbons
Virtual structures relaxed using TB-DFT: nanodiamonds and graphenes.

Archetypal Analysis (AA)
Finds a k × m matrix Z that corresponds to the archetypal, or pure, patterns in the data, such that each data point can be represented as a mixture of those archetypes. In other words, for an n × m data matrix X, archetypal analysis yields the two n × k coefficient matrices α and β that minimize the residual sum of squares
$$\mathrm{RSS} = \lVert X - \alpha Z \rVert_F^2 \quad \text{with} \quad Z = \beta^{\top} X,$$
subject to $\alpha_{ij} \ge 0$, $\sum_{j=1}^{k} \alpha_{ij} = 1$ and $\beta_{ij} \ge 0$, $\sum_{i=1}^{n} \beta_{ij} = 1$.
The predictors of X_i are finite mixtures of the archetypes Z_j, which are themselves convex combinations of the observations.
Cutler, A. & Breiman, L. Archetypal Analysis. Technometrics 36, 338-347 (1994).
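As a small numerical illustration of the mixture idea, the sketch below evaluates the residual sum of squares for a hand-picked set of archetypes. It assumes the common convention in which the archetypes are Z = β^T X and the data are approximated by αZ. The toy data, the helper names and the chosen α and β are illustrative assumptions, not values from the talk, and no optimization of α and β is attempted.

```python
# Toy illustration of the archetypal-analysis objective (not a fitting routine).
# Assumed convention: X is n x m, alpha and beta are n x k, Z = beta^T X is k x m.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def archetypes(X, beta):
    """Archetypes as convex combinations of observations: Z = beta^T X."""
    return matmul(transpose(beta), X)

def rss(X, alpha, beta):
    """Residual sum of squares between X and its mixture approximation alpha Z."""
    approx = matmul(alpha, archetypes(X, beta))
    return sum((x - y) ** 2 for rx, ra in zip(X, approx) for x, y in zip(rx, ra))

# Hypothetical data: three corner points plus the midpoint of two of them (n = 4, m = 2).
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
# beta picks the three corners as archetypes (each column sums to 1), so k = 3.
beta = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0]]
# alpha expresses every point as a mixture of archetypes (each row sums to 1).
alpha = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.5, 0.5]]

Z = archetypes(X, beta)
residual = rss(X, alpha, beta)
print(Z, residual)
```

Because the fourth point lies exactly halfway between two archetypes, this choice of α and β reproduces the data with zero residual. A real analysis would search for α and β under the same non-negativity and sum-to-one constraints.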

Archetypal Analysis of Nanocarbons
Nanodiamonds and graphene nanoflakes.
Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980-11992 (2015).

Nanocarbons Prototypes
Nanodiamond prototypes and graphene prototypes.
Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980-11992 (2015).

Estimation of Nanodiamond Properties
Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980-11992 (2015).

Structural Diversity Challenge
Graphene nanoflakes: trigonal, rectangular, hexagonal.
Defects, oxidation and edge passivation yield large structural diversity.

Structural Diversity Challenge
Silicon qubits (P marks phosphorus sites in the figure).
Single Si substitution by P yields a large structural diversity.

Structural Diversity Challenge
Metal-Organic Framework (MOF): ZIF-68
Building blocks shown: Zn2+, benzimidazole, nitroimidazole.
CO2 capture and sequestration (Science, 2008).

Structural Diversity Challenge
Metal-Organic Framework (MOF): in-silico combinatorial design.

Structural Diversity Challenge
Metal-Organic Framework (MOF): MOF ZBP
a) Organic linkers
b) Modification with 35 functional groups gives a total of ~1.5 million (35^4) unique combinations.
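The size of this combinatorial space is easy to check. The sketch below assumes, as the exponent suggests, four independent substitution sites that can each carry any of the 35 functional groups. The integer group labels are placeholders, and the count treats the sites as distinguishable.

```python
from itertools import islice, product

N_GROUPS = 35   # functional groups, from the slide
N_SITES = 4     # substitution sites implied by the exponent (assumption)

total = N_GROUPS ** N_SITES
print(total)  # 1500625

# Lazily enumerate candidates without storing all of them;
# group labels are placeholder indices 0..34.
groups = range(N_GROUPS)
first_three = list(islice(product(groups, repeat=N_SITES), 3))
print(first_three)
```

Enumerating lazily with `itertools.product` means candidates can be streamed to a screening step one at a time, without holding about 1.5 million tuples in memory.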

Machine Learning Approach
Machine learning prediction of functional properties: feature fingerprints are the input to machine learning.

Data Analytics Challenge
Binary decision tree of the band gap of graphene. Accuracy: 80%.
Features: surface area, number of atoms, shape aspect ratio.
Fernandez, M., Shi, H. & Barnard, A. S. Carbon (2016). doi:10.1016/j.carbon.2016.03.005
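To make the idea of a binary decision tree concrete, here is a minimal sketch of a single split step: choosing the feature and threshold that best separate two classes by Gini impurity. It is not the authors' model. The four flakes, their feature values and the class labels are invented for illustration, and a full tree would apply the same step recursively.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

# Hypothetical nanoflakes: (surface area, number of atoms, shape aspect ratio).
rows = [(50.0, 24, 1.0), (60.0, 30, 1.2), (200.0, 96, 1.1), (220.0, 110, 2.0)]
# Hypothetical class labels: 1 = band gap above a chosen threshold, 0 = below.
labels = [1, 1, 0, 0]

feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)
```

On this toy set the first feature (surface area) already separates the two classes perfectly, so the search keeps it and reports zero impurity.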

Machine Learning vs. Atomistic Simulations
Estimation of the graphene band gap from topological features: P_i and P_j are the values of a bond-order property of the carbon atoms in graphene, L is the topological distance, and δ(L_ij, L) is a delta function.
Fernandez, M. et al. ACS Comb. Sci. (2016). doi:10.1021/acscombsci.6b00094
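The descriptor equation itself did not survive in the transcript. The sketch below therefore assumes a standard topological autocorrelation, A(L) = sum over atom pairs of δ(L_ij, L) P_i P_j, where L_ij is the number of bonds on the shortest path between atoms i and j. The six-membered ring, the unit property values and the function names are illustrative assumptions, not the published definition.

```python
from collections import deque

def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds on the shortest path from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def autocorrelation(adjacency, prop, L):
    """Sum of P_i * P_j over atom pairs (i < j) whose topological distance is L."""
    total = 0.0
    for i in adjacency:
        dist = topological_distances(adjacency, i)
        for j, d in dist.items():
            if i < j and d == L:   # delta(L_ij, L) keeps only matching pairs
                total += prop[i] * prop[j]
    return total

# Hypothetical test structure: a single six-membered carbon ring.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
# Placeholder atomic property values (the slide refers to a bond-order property).
P = {i: 1.0 for i in range(6)}

vector = [autocorrelation(ring, P, L) for L in (1, 2, 3)]
print(vector)  # [6.0, 6.0, 3.0]
```

With unit property values the vector simply counts atom pairs at each bond distance (6, 6 and 3 in a hexagon). With real per-atom values it becomes a fixed-length fingerprint that a regression model can use.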

Machine Learning of Graphene
Radial Distribution Function (RDF) scores for graphene: the summation is over the N atom pairs in the graphene structure, r_ij is the distance between the atoms of each pair, and B is a smoothing parameter set to 10.
Fernandez, M.; Shi, H.; Barnard, A. et al. J. Chem. Inf. Model. (2015), 55, 2500-2506.
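The RDF equation is missing from the transcript, so the following is only a sketch under an assumed Gaussian-smoothed form, RDF(R) = sum over atom pairs of exp(-B (r_ij - R)^2), with B = 10 as quoted. Any prefactor or atomic weighting used in the cited paper is not reproduced, and the three collinear atoms are made-up coordinates.

```python
import math
from itertools import combinations

B = 10.0  # smoothing parameter, value quoted on the slide

def rdf_score(coords, R, b=B):
    """Gaussian-smoothed count of atom pairs whose separation is close to R."""
    total = 0.0
    for p, q in combinations(coords, 2):
        r_ij = math.sqrt(sum((a - c) ** 2 for a, c in zip(p, q)))
        total += math.exp(-b * (r_ij - R) ** 2)
    return total

# Hypothetical structure: three collinear atoms spaced 1.5 length units apart.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
scores = {R: rdf_score(coords, R) for R in (1.5, 3.0)}
print(scores)
```

Evaluating the score on a grid of R values gives a fixed-length vector for structures of any size. Here two pairs sit at 1.5 and one pair at 3.0, so the scores are close to 2 and 1 respectively.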

Machine Learning vs. Atomistic Simulations
Energy of the Fermi level and ionization potential from RDF scores.
Fernandez, M.; Shi, H.; Barnard, A. et al. J. Chem. Inf. Model. (2015), 55, 2500-2506.

Machine Learning vs. Atomistic Simulations
Machine learning prediction of gas adsorption in MOFs.
Fernandez, M. et al. J. Phys. Chem. C 117, 14095-14105 (2013).

Accuracy and Coverage Challenges
Accuracy of electronic structure calculation methods vs. system size (figure axes: system size, accuracy).
TB-DFT/semiempirical: 5000 atoms
Density Functional: 500 atoms
Quantum Monte Carlo: 100 atoms
Coupled Cluster: 20 atoms

Data-driven Challenge
Machine learning for large material spaces.
Diagram labels: full database; partial database; machine learning; predictions; screening.
Machine learning predictions:
- Functional property value or threshold = f(structure)
- Accuracy of different quantum-chemistry methods
Ramakrishnan, R. et al. J. Chem. Theory Comput. 11, 2087-2096 (2015).
Fernandez, M. et al. J. Chem. Inf. Model. (2015), 55, 2500-2506.

Challenges and Limitations
Accuracy gap between different levels of theory.
6,095 isomers of C7H10O2.
Figure labels: E, B3LYP, QMC, "big gap".

Accuracy Gap
Figure labels: predictions; machine learning calibration.
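One way to read "machine learning calibration" is as learning the gap between a cheap and an expensive level of theory on a small subset, then adding the learned correction to cheap results elsewhere. The sketch below assumes the simplest possible model, a straight-line fit of the gap against the cheap energy. The energies are invented numbers in arbitrary units, and a practical model would use structural descriptors and a more flexible regressor.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical energies (arbitrary units) for a small training subset.
e_low = [1.0, 2.0, 3.0, 4.0]     # cheap level of theory
e_high = [1.3, 2.4, 3.5, 4.6]    # expensive reference level of theory

# Learn the gap (high - low) as a function of the cheap result.
gaps = [h - l for h, l in zip(e_high, e_low)]
slope, intercept = fit_line(e_low, gaps)

def calibrate(e_cheap):
    """Cheap energy plus the learned correction."""
    return e_cheap + slope * e_cheap + intercept

print(round(calibrate(5.0), 6))
```

Modelling the gap rather than the expensive energy itself is the design choice worth noting: the correction is usually smoother and smaller than the target, so fewer expensive reference calculations are needed.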

Data-driven High-throughput Screening
Workflow labels: computational resources; input jobs; queue management; resubmit or kill failed runs; finished runs; data storage outputs; accuracy refinement.
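The "resubmit or kill failed runs" step can be sketched as a small retry loop. Everything here is a stand-in: the retry limit, the job names and the `fake_simulation` function are hypothetical, and a production workflow would talk to a real batch scheduler instead of calling a Python function.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit before a job is killed

def run_queue(jobs, run_job, max_attempts=MAX_ATTEMPTS):
    """Process jobs in FIFO order; resubmit failures until the limit, then kill."""
    queue = deque((job, 1) for job in jobs)
    finished, killed = {}, []
    while queue:
        job, attempt = queue.popleft()
        try:
            finished[job] = run_job(job, attempt)   # store output of a finished run
        except RuntimeError:
            if attempt < max_attempts:
                queue.append((job, attempt + 1))    # resubmit the failed run
            else:
                killed.append(job)                  # give up on this job
    return finished, killed

def fake_simulation(job, attempt):
    """Stand-in for a real calculation with a hypothetical failure pattern."""
    if job == "flaky" and attempt < 2:
        raise RuntimeError("did not converge")
    if job == "broken":
        raise RuntimeError("always fails")
    return f"{job}: done on attempt {attempt}"

finished, killed = run_queue(["ok", "flaky", "broken"], fake_simulation)
print(finished, killed)
```

Failed jobs go to the back of the queue, so one stubborn calculation does not block the rest of the batch.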

Self-Organizing Map (SOM) of Nanoparticles (NPs)
SOM of the Ag-NP electrostatic potential.

Deep Learning for Nanomaterials
Feature extraction:
Input image 32x32 -> C1 (5x5 convolution): feature maps 28x28 -> S1 (2x2 subsampling): feature maps 14x14 -> C2 (5x5 convolution): feature maps 10x10 -> S2 (2x2 subsampling): feature maps 5x5
Classification:
Fully connected layers B1, B2 -> Output
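The feature-map sizes in this architecture follow from simple arithmetic, which the sketch below reproduces. It assumes unpadded convolutions with stride 1 and non-overlapping 2x2 subsampling. The helper names are illustrative.

```python
def conv_size(n, kernel):
    """Output width of an unpadded ('valid') convolution with stride 1."""
    return n - kernel + 1

def pool_size(n, window):
    """Output width of non-overlapping subsampling."""
    return n // window

sizes = [32]  # input image width
for kind, k in [("conv", 5), ("pool", 2), ("conv", 5), ("pool", 2)]:
    step = conv_size if kind == "conv" else pool_size
    sizes.append(step(sizes[-1], k))
print(sizes)  # [32, 28, 14, 10, 5]
```

Each 5x5 convolution trims four pixels from the width and each 2x2 subsampling halves it, giving the sequence 32, 28, 14, 10, 5 that feeds the fully connected layers.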

Thank you
Data61
Michael Fernandez
e: michael.fernandezllamosa@csiro.au
w: https://research.csiro.au/mmm/
www.data61.csiro.au