- In drug discovery, metabolism refers to the body’s ability to break down foreign molecules, called xenobiotics, a category that includes drugs. Drugs are metabolized so that the molecules and their effects can be eliminated from the body. The duration and intensity of a drug’s action depend on how the drug is metabolized.
- The Cytochrome P450 (CYP) superfamily of 57 enzymes aids in the elimination of over 40% of drugs, and CYPs 1A2, 2C9, 2C19, 2D6, and 3A4 account for almost 90% of CYP-mediated metabolism. Inhibition, or blockage, of a CYP or another metabolic enzyme can significantly shift the pharmaceutical effect of other drugs or chemicals in the body, resulting in a drug-drug interaction (DDI).
- Logistic regression helps us make predictions for binary outcomes by transforming inputs into a range of values from 0 to 1, inclusive. Logistic regression is typically used for classification by defining a decision boundary or threshold to binarize its probability estimates into class predictions. We can assess model performance over a range of possible decision thresholds via precision-recall (PR) and receiver operating characteristic (ROC) curves.
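As a minimal sketch of these ideas, the following uses scikit-learn on a synthetic dataset (the data and all names here are illustrative, not from the text): a logistic regression model produces probabilities in [0, 1], a decision threshold binarizes them, and the ROC and PR curves are summarized across all thresholds.

```python
# Illustrative sketch: logistic regression probabilities, a decision
# threshold, and threshold-free evaluation via ROC / PR curve summaries.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (stand-in for a real dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]   # probability estimates in [0, 1]

# Binarize with a decision threshold (0.5 is the conventional default).
preds = (proba >= 0.5).astype(int)

# Area under the ROC curve and the PR curve (average precision)
# summarize performance over the full range of possible thresholds.
roc_auc = roc_auc_score(y_te, proba)
pr_auc = average_precision_score(y_te, proba)
print(f"ROC AUC: {roc_auc:.3f}, PR AUC (AP): {pr_auc:.3f}")
```

Changing the threshold trades precision against recall; the curve-based metrics let us compare models before committing to any single threshold.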
- Calibrated models produce probability estimates users can trust, increasing transparency and, in turn, acceptance and adoption. We can diagnose miscalibration with reliability diagrams, root mean squared calibration error, and the Brier score. We can correct miscalibration with post-training calibration methods such as Platt scaling and isotonic regression, or by computing an optimal decision threshold with Youden’s index.
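A hedged sketch of this workflow with scikit-learn (synthetic data; the base model choice is illustrative): diagnose miscalibration with the Brier score and reliability-diagram bins, correct it with Platt scaling (`method="sigmoid"` in `CalibratedClassifierCV`; `"isotonic"` selects isotonic regression), and pick a threshold with Youden’s J statistic.

```python
# Illustrative sketch: diagnosing and correcting miscalibration.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Naive Bayes is a convenient example of an often-miscalibrated model.
raw = GaussianNB().fit(X_tr, y_tr)
raw_proba = raw.predict_proba(X_te)[:, 1]

# Diagnose: Brier score (lower is better) and the binned data behind
# a reliability diagram (fraction of positives vs. mean prediction).
brier_raw = brier_score_loss(y_te, raw_proba)
frac_pos, mean_pred = calibration_curve(y_te, raw_proba, n_bins=10)

# Correct with Platt scaling; method="isotonic" is the other option.
cal = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
cal.fit(X_tr, y_tr)
brier_cal = brier_score_loss(y_te, cal.predict_proba(X_te)[:, 1])

# Alternative: choose the threshold maximizing Youden's J = TPR - FPR.
fpr, tpr, thresholds = roc_curve(y_te, raw_proba)
best_threshold = thresholds[np.argmax(tpr - fpr)]
print(f"Brier raw={brier_raw:.3f}, calibrated={brier_cal:.3f}, "
      f"Youden threshold={best_threshold:.3f}")
```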
- Decision trees are like flowcharts – breaking down a complex problem into a series of simpler decisions. Imagine that the model is playing a guessing game, asking questions to help it narrow down the possibilities until it arrives at an answer. Each node in a decision tree represents a decision based on a specific feature. Starting from the root node, the tree branches out, with each branch representing a decision. The goal of the decision process is to reach a leaf node in the tree, which will provide the final prediction.
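The flowchart analogy can be made concrete with a small tree on the Iris dataset (a standard toy example, used here for illustration): `export_text` prints the learned questions, one per internal node, with class predictions at the leaves.

```python
# Illustrative sketch: a shallow decision tree printed as its flowchart
# of "feature <= threshold" questions, ending in leaf predictions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# max_depth=2 keeps the "guessing game" to two questions per path.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each internal node asks one question about one feature; following the
# branches from the root leads to a leaf holding the final prediction.
print(export_text(tree, feature_names=list(data.feature_names)))
```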
- Random Forest ensembles sample different subsets of data to train multiple decision trees, introducing an extra layer of randomness by considering only a random subset of features at each split in the tree-building process. This feature subsampling helps decorrelate the trees, making the ensemble even more diverse. Random Forests greatly reduce decision tree variance while holding on to their ability to achieve low bias.
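A brief sketch of the two sources of randomness described above, using scikit-learn on synthetic data (illustrative settings, not from the text): `bootstrap=True` resamples the rows for each tree, and `max_features="sqrt"` restricts each split to a random feature subset, which decorrelates the trees.

```python
# Illustrative sketch: Random Forest vs. a single decision tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# bootstrap=True: each tree trains on a resampled subset of the data.
# max_features="sqrt": only a random subset of features per split.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                bootstrap=True, random_state=0)
single_tree = DecisionTreeClassifier(random_state=0)

tree_acc = cross_val_score(single_tree, X, y, cv=5).mean()
forest_acc = cross_val_score(forest, X, y, cv=5).mean()
print(f"single tree: {tree_acc:.3f}, forest: {forest_acc:.3f}")
```

Averaging many decorrelated, low-bias trees is what drives the variance reduction: each tree can still overfit its sample, but their errors tend to cancel in the ensemble vote.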
- Binary classification only models two possible classes. Multiclass classification is a type of ML task where the algorithm’s goal is to learn to categorize data points into more than two distinct classes or categories. Each class represents a different possible outcome, and the model is trained to assign a new input to one of these classes. In contrast, multilabel classification predicts a set of labels for each instance and is applicable when an instance can simultaneously belong to multiple classes. Multilabel ranking metrics, such as coverage error and ranking loss, evaluate how well a model orders each instance’s labels.
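As a minimal multilabel sketch (synthetic data; the one-vs-rest setup is an illustrative choice): each instance gets a score per label, and `coverage_error` and `label_ranking_loss` from scikit-learn evaluate how well those scores rank the true labels (lower is better for both).

```python
# Illustrative sketch: multilabel classification scored with the
# ranking metrics coverage error and ranking loss.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import coverage_error, label_ranking_loss
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Each row of Y is a set of labels; an instance may have several at once.
X, Y = make_multilabel_classification(n_samples=300, n_classes=5,
                                      random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# One binary classifier per label; scores form an (n_samples, n_labels)
# matrix used to rank the labels for each instance.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_tr, Y_tr)
scores = clf.predict_proba(X_te)

cov = coverage_error(Y_te, scores)        # avg. depth to cover true labels
rloss = label_ranking_loss(Y_te, scores)  # fraction of mis-ordered pairs
print(f"coverage error: {cov:.3f}, ranking loss: {rloss:.3f}")
```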