
A data ecosystem to support machine learning in materials science

Published online by Cambridge University Press: 10 October 2019

Ben Blaiszik*
Affiliation:
Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
Logan Ward
Affiliation:
Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
Marcus Schwarting
Affiliation:
Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
Jonathon Gaff
Affiliation:
Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA
Ryan Chard
Affiliation:
Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
Daniel Pike
Affiliation:
Department of Computer Science, Cornell University, Ithaca, NY, USA
Kyle Chard
Affiliation:
Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
Ian Foster
Affiliation:
Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
*Address all correspondence to Ben Blaiszik at blaiszik@uchicago.edu

Abstract

Applying machine learning (ML) to materials science problems requires a data ecosystem that supports the discovery and collection of data from many sources, the automated dissemination of new data across the ecosystem, and the connection of data with materials-specific ML models. Here, we present two projects that address these needs: the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub). We use examples to show how MDF and DLHub capabilities can be used to link data with ML models and how users can access those capabilities through web and programmatic interfaces.

Type
Artificial Intelligence Research Letters
Copyright
Copyright © Materials Research Society 2019

