Fundamental computational and machine learning (ML) models that can be applied to BTE design
ML is a data-driven artificial intelligence (AI) technology that builds predictive models by learning from existing data. It operates by fitting models to a set of 'training data' with known outcomes to develop predictive capability; the trained models are then applied to 'test data' to evaluate how well they generalize to unseen data [128]. ML can be broadly categorized into supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, each training example (scaffold geometry, material properties, etc.) is associated with corresponding outputs (permeability, hydrophobicity, etc.) as its labels. Supervised learning learns the input–output relationships from the training data and is therefore 'supervised' in predicting the outputs of the test data [129].
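The supervised workflow described above can be sketched with a minimal nearest-neighbor classifier: fit on labeled training data, then predict labels for unseen test data. The scaffold features (porosity, mean pore size) and permeability labels below are hypothetical illustrative values, not data from the cited studies.

```python
# Minimal sketch of the supervised-learning workflow: learn input-output
# relationships from labeled 'training data', then predict on 'test data'.
# Feature values and labels are hypothetical.
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Predict the label of x as the label of its closest training point."""
    dists = [math.dist(x, xi) for xi in train_X]
    return train_y[dists.index(min(dists))]

# Training data: (porosity, mean pore size in um) -> permeability class
train_X = [(0.9, 500.0), (0.8, 400.0), (0.4, 100.0), (0.3, 80.0)]
train_y = ["high", "high", "low", "low"]

# Test data: unseen scaffold designs
test_X = [(0.85, 450.0), (0.35, 90.0)]
preds = [nearest_neighbor_predict(train_X, train_y, x) for x in test_X]
print(preds)  # -> ['high', 'low']
```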
(i) A decision tree is a supervised learning algorithm for classification and regression. In a tree-like model of decisions, each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node assigns a class label or decision based on the attributes traversed along the path [130].
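A hand-built tree makes the node/branch/leaf structure concrete: each `if` is an internal node testing an attribute, each branch is a test outcome, and each `return` is a leaf. The thresholds and class names are hypothetical, chosen only for illustration.

```python
# A hand-built decision tree: internal nodes test attributes, branches
# are test outcomes, leaves are class labels. Thresholds are hypothetical.
def classify_scaffold(porosity, pore_size_um):
    if porosity > 0.5:                # internal node 1: test on porosity
        if pore_size_um > 300:        # internal node 2: test on pore size
            return "high permeability"    # leaf
        return "medium permeability"      # leaf
    return "low permeability"             # leaf

print(classify_scaffold(0.8, 400))  # -> high permeability
print(classify_scaffold(0.8, 200))  # -> medium permeability
print(classify_scaffold(0.3, 400))  # -> low permeability
```

In practice the tests and thresholds are learned from training data (e.g. by recursively choosing the split that best separates the classes) rather than written by hand.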
(ii) A support vector machine (SVM) is a supervised learning method for classification, regression, and outlier detection. SVMs work by identifying the hyperplane that optimally separates a dataset into distinct classes by maximizing the margin between them, which confers high robustness and a low risk of overfitting [130].
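A minimal sketch of the idea, assuming a linearly separable two-class problem: train a linear SVM by subgradient descent on the regularized hinge loss (a simplified Pegasos-style update). The data points are hypothetical; a real application would use a dedicated library implementation.

```python
# Linear SVM via subgradient descent on the hinge loss: find w, b such
# that the hyperplane w.x + b = 0 separates the classes with a margin.
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """y must be in {-1, +1}. Returns weights w and bias b."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point inside the margin: hinge loss active
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # only the regularizer shrinks w
                w = [wj - lr * lam * wj for wj in w]
    return w, b

# Hypothetical separable data: class +1 upper-right, class -1 lower-left
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1 for x in X]
print(preds)  # -> [1, 1, -1, -1]
```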
Unsupervised learning, by contrast, is the process of training algorithms on unlabeled data to discover inherent patterns within the input. Finally, reinforcement learning is a method in which an agent navigates an uncertain environment, learning from past behaviors to select actions that maximize cumulative reward [129].
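The unsupervised half of this distinction can be illustrated with k-means clustering (Lloyd's algorithm), which groups unlabeled values without any outcome labels. The one-dimensional feature values and initial centers below are hypothetical.

```python
# Unsupervised learning sketch: k-means discovers groups in unlabeled
# data by alternating assignment and center-update steps.
def kmeans_1d(values, centers, iters=10):
    """Lloyd's algorithm on 1-D data with fixed initial centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:  # assignment step: nearest center
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two natural groups emerge with no labels provided
data = [1.0, 1.5, 0.5, 10.0, 10.5, 9.5]
final = kmeans_1d(data, centers=[0.0, 5.0])
print(final)  # -> [1.0, 10.0]
```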
Neural networks (NNs) are key architectures in ML, inspired by the biological neural networks of the human brain, and are applicable to both supervised and unsupervised learning paradigms. An NN consists of layers of interconnected neurons: an input layer, one or more hidden layers, and an output layer. The neurons and their connections, each carrying a different weight, allow every layer to perform a transformation on its inputs, passing signals of varying strength to the next layer [130].
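The layered structure can be sketched as a forward pass through one hidden layer: each neuron computes a weighted sum of its inputs plus a bias, then applies a nonlinearity. The weights here are fixed by hand for illustration; a real network would learn them from data.

```python
# Feedforward NN sketch: input layer -> hidden layer -> output layer.
# Each connection carries a weight; weights below are illustrative only.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One dense layer: weighted sum of inputs, then a nonlinearity."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

x = [0.5, 0.2]                                       # input layer (2 features)
h = layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1])  # hidden layer (2 neurons)
y = layer(h, [[2.0, -2.0]], [0.0])                   # output layer (1 neuron)
print(0.0 < y[0] < 1.0)  # sigmoid output always lies in (0, 1) -> True
```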
Ensemble learning in ML trains and combines multiple models to solve a given problem, improving performance by aggregating their predictions, which effectively reduces overfitting. Random forest is a notable example of ensemble learning that combines multiple decision trees, where each tree is trained on a bootstrap sample of the input data. It is known for its high accuracy, its ability to handle unbalanced or missing data, and its ability to identify important classification variables [130].
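The bootstrap-and-vote idea can be sketched with a toy forest of one-level trees ("stumps"): each stump is trained on a resampled copy of the data, and their predictions are aggregated by majority vote. The data values, thresholding scheme, and tree count are hypothetical simplifications of a real random forest (which would also randomize the features considered at each split).

```python
# Random-forest sketch: train simple stumps on bootstrap samples of the
# data, then aggregate predictions by majority vote. Data is hypothetical.
import random

def train_stump(X, y):
    """Pick the threshold on feature 0 that best splits this sample."""
    best = None
    for t in sorted(set(x[0] for x in X)):
        preds = [1 if x[0] > t else 0 for x in X]
        acc = sum(p == yi for p, yi in zip(preds, y))
        if best is None or acc > best[0]:
            best = (acc, t)
    return best[1]

def random_forest_predict(X, y, query, n_trees=9, seed=0):
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) points with replacement
        idx = [rng.randrange(len(X)) for _ in X]
        t = train_stump([X[i] for i in idx], [y[i] for i in idx])
        votes += 1 if query[0] > t else 0
    return 1 if votes > n_trees // 2 else 0  # majority vote

X = [(1.0,), (2.0,), (3.0,), (7.0,), (8.0,), (9.0,)]
y = [0, 0, 0, 1, 1, 1]
result = random_forest_predict(X, y, query=(8.5,))
print(result)  # -> 1
```

Because each stump sees a different resampling of the data, individual errors tend to cancel in the vote, which is the mechanism behind the reduced overfitting noted above.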