Improved decision making with similarity based machine learning: applications in chemistry

Author(s)
Dominik Lemm, Guido Falk von Rudorff, O. Anatole von Lilienfeld
Abstract

Despite the fundamental progress in autonomous molecular and materials discovery, data scarcity throughout chemical compound space still severely hampers the use of modern ready-made machine learning models, as they rely heavily on the paradigm 'the bigger the data the better'. Presenting similarity-based machine learning (SML), we show an approach to select data and train a model on-the-fly for specific queries, enabling decision making in data-scarce scenarios in chemistry. By relying solely on the proximity between query and training data to choose training points, only a fraction of the data is necessary to converge to competitive performance. After introducing SML for the harmonic oscillator and the Rosenbrock function, we describe applications to data-scarce scenarios in chemistry, which include quantum-mechanics-based molecular design and organic synthesis planning. Finally, we derive a relationship between the intrinsic dimensionality and volume of feature space, governing the overall model accuracy.
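
The query-specific selection described in the abstract can be illustrated with a minimal sketch on the Rosenbrock function. This is not the authors' implementation: the Euclidean distance, the Gaussian kernel ridge regression used as the local model, and every name and parameter value (`select_neighbors`, `sml_predict`, `n_neighbors`, `sigma`, `reg`, the pool size and sampling range) are assumptions chosen only for illustration.

```python
import numpy as np


def rosenbrock(x):
    """Two-dimensional Rosenbrock function evaluated row-wise on an (n, 2) array."""
    return (1.0 - x[:, 0]) ** 2 + 100.0 * (x[:, 1] - x[:, 0] ** 2) ** 2


def select_neighbors(query, X, n_neighbors):
    """Indices of the n_neighbors pool points closest to the query (Euclidean, assumed)."""
    dist = np.linalg.norm(X - query, axis=1)
    return np.argsort(dist, kind="stable")[:n_neighbors]


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def sml_predict(query, X, y, n_neighbors=50, sigma=0.5, reg=1e-8):
    """Train a small model on the fly for one query and return its prediction.

    Only the pool points nearest to the query are used for training.
    Kernel ridge regression is an assumed stand-in for the local model.
    """
    idx = select_neighbors(query, X, n_neighbors)
    X_loc, y_loc = X[idx], y[idx]
    K = gaussian_kernel(X_loc, X_loc, sigma)
    # Solve the regularized linear system for the regression weights.
    alpha = np.linalg.solve(K + reg * np.eye(len(idx)), y_loc)
    k_q = gaussian_kernel(query[None, :], X_loc, sigma)[0]
    return float(k_q @ alpha)


# Hypothetical data pool: random samples of the Rosenbrock function.
rng = np.random.default_rng(0)
X_pool = rng.uniform(-2.0, 2.0, size=(2000, 2))
y_pool = rosenbrock(X_pool)

query = np.array([0.5, 0.5])
prediction = sml_predict(query, X_pool, y_pool)
reference = rosenbrock(query[None, :])[0]
print(f"SML prediction: {prediction:.3f}, reference: {reference:.3f}")
```

In this sketch a new model is fitted for every query, so the cost per prediction depends on `n_neighbors` rather than on the size of the full pool.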

Organisation(s)
Computational Materials Physics
External organisation(s)
University of Toronto, Vector Institute for Artificial Intelligence, Technische Universität Berlin
Journal
Machine Learning: Science and Technology
Volume
4
No. of pages
15
ISSN
2632-2153
DOI
https://doi.org/10.1088/2632-2153/ad0fa3
Publication date
12-2023
Peer reviewed
Yes
Austrian Fields of Science 2012
102019 Machine learning, 102001 Artificial intelligence
ASJC Scopus subject areas
Software, Human-Computer Interaction, Artificial Intelligence
Portal url
https://ucris.univie.ac.at/portal/en/publications/improved-decision-making-with-similarity-based-machine-learning-applications-in-chemistry(77bf6e02-abca-4149-8005-cbff1f810df4).html