Generating Training Datasets Using Energy-Based Models that Actually Scale
Energy-based models (EBMs) are one of the most promising areas of deep learning that have not yet seen wide adoption. Conceptually, EBMs are a form of generative modeling that learns the key characteristics of a target dataset and tries to generate similar datasets. While EBMs are appealing because of their simplicity, they have run into many challenges in real-world applications. Recently, AI powerhouse OpenAI published a research paper that explores a new technique for building EBMs that can scale across complex deep learning topologies.
EBMs are typically applied to one of the most complex problems in real-world deep learning solutions: generating quality training datasets. Many state-of-the-art deep learning techniques rely on large volumes of training data, which are impractical to maintain at scale. EBMs can observe the key mathematical properties of a training dataset and generate new datasets that follow a similar distribution. EBMs are not the only technique in this area of generative modeling. Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) are also used to address the challenge of dataset generation but, given their simplicity, EBMs present tangible advantages over these alternatives. Unfortunately, EBMs have been really hard to scale in practice. To understand why, we can start by dissecting some of the key characteristics of EBMs.
Understanding Energy-Based Learning
From some perspectives, one of the main goals of machine learning is to capture dependencies between variables. By capturing those dependencies, a model can be used to answer questions about the values of unknown variables given the values of known variables. EBMs capture dependencies by associating a scalar energy (a measure of compatibility) with each configuration of the variables. In that scheme, inference consists of setting the values of the observed variables and finding values of the remaining variables that minimize the energy. Similarly, learning consists of finding an energy function that assigns low energies to correct values of the remaining variables and higher energies to incorrect values.
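The inference step described above can be sketched with a toy example. Everything here is an assumption made for illustration: the quadratic `energy` function, the `infer` helper, and the grid of candidate values are hypothetical stand-ins for a learned energy function and a real optimizer.

```python
import numpy as np

def energy(x, y):
    """Toy energy: low when y is close to 2 * x (a hypothetical dependency)."""
    return (y - 2.0 * x) ** 2

def infer(x, candidates):
    """Clamp the observed x and return the candidate y with the lowest energy."""
    energies = np.array([energy(x, y) for y in candidates])
    return candidates[int(np.argmin(energies))]

# Grid of possible values for the unobserved variable y.
candidates = np.linspace(-5.0, 5.0, 101)

# Observe x = 1.5 and search for the most compatible y.
y_hat = infer(1.5, candidates)
print(y_hat)
```

With this energy, the selected value lands at the grid point nearest to 2 * x. In a real EBM the energy would be a trained network, and the search would usually be gradient-based rather than exhaustive.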
EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to learning, particularly for non-probabilistic training of graphical models and other structured models. Because there is no requirement for proper normalization, energy-based approaches avoid the problems associated with estimating the normalization constant in probabilistic models. Furthermore, the absence of the normalization condition allows for much more flexibility in the design of learning machines.
The capabilities of EBMs make them ideal candidates for different deep learning areas such as natural language processing, robotics, or computer vision. However, one of the well-known limitations of EBMs is that they rely on gradient-descent optimization methods that are typically hard to scale to high-dimensional datasets.
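For readers who want the normalization constant spelled out, the standard way to turn an energy into a probability is the Gibbs form below. The notation is an assumption, since the text above does not fix any symbols: E is the energy function and Z is the normalization constant (often called the partition function).

```latex
p(x) = \frac{e^{-E(x)}}{Z}, \qquad Z = \int e^{-E(x')} \, dx'
```

The integral defining Z runs over every possible configuration, which is why it is usually intractable in high dimensions. Comparing two configurations only requires the difference of their energies, so Z never has to be computed for that purpose.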
Scalable Energy-Based Models
To mitigate the limitations of traditional EBMs related to the dependency on gradient-descent methods, OpenAI decided to leverage a technique known as Langevin Dynamics as its main optimization method. Named after French physicist Paul Langevin, this optimization technique draws inspiration from models of molecular systems. Like stochastic gradient descent, Langevin Dynamics is an iterative optimization algorithm, but it adds noise to the stochastic gradient estimator while optimizing an objective function. The main advantage that Langevin Dynamics offers over traditional optimization methods is that it can be used in Bayesian learning scenarios, because the method produces samples from a posterior distribution of parameters based on the available data.
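The noisy update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenAI's implementation: the quadratic energy, the `grad_energy` and `langevin_sample` names, and the step size are hypothetical, and the exact gradient is used in place of a stochastic estimate.

```python
import numpy as np

def grad_energy(x):
    """Gradient of a toy energy E(x) = 0.5 * ||x - 1||^2 (hypothetical)."""
    return x - 1.0

def langevin_sample(x0, step_size=0.01, n_steps=2000, seed=0):
    """Unadjusted Langevin dynamics: a gradient step plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        # Move downhill on the energy, then perturb with scaled noise.
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x

# Each row is an independent chain started at the origin.
samples = langevin_sample(np.zeros((200, 2)))
print(samples.mean(axis=0), samples.std(axis=0))
```

For this toy energy the chains settle into a cloud centered near 1 with a spread of roughly 1 in each dimension, which is the behavior expected from sampling rather than plain minimization. Dropping the noise term would turn the loop into ordinary gradient descent that collapses every chain onto the single minimum.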
OpenAI leveraged Langevin Dynamics to perform noisy gradient descent on the energy function to arrive at low-energy configurations. Unlike GANs, VAEs, and flow-based models, this approach does not require an explicit neural network to generate samples; samples are generated implicitly. OpenAI combines Langevin Dynamics with a replay buffer of past images that are used to initialize the optimization procedure.
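The replay buffer idea can be illustrated with a small sketch. The details are assumptions chosen for the example and should not be read as OpenAI's code: the toy `grad_energy`, the helper names, the 0.95 reuse probability, the buffer cap, and the use of 2-dimensional vectors in place of images are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_energy(x):
    """Gradient of a toy energy E(x) = 0.5 * ||x||^2, standing in for a trained network."""
    return x

def init_from_buffer(buffer, batch_size, dim, reuse_prob=0.95):
    """Start each chain from a stored sample with probability reuse_prob, else from uniform noise."""
    batch = rng.uniform(-1.0, 1.0, size=(batch_size, dim))
    if len(buffer) > 0:
        reuse = rng.random(batch_size) < reuse_prob
        idx = rng.integers(0, len(buffer), size=batch_size)
        stored = np.stack([buffer[i] for i in idx])
        batch = np.where(reuse[:, None], stored, batch)
    return batch

def refine(x, n_steps=20, step_size=0.01):
    """A few noisy gradient steps on the energy, starting from x."""
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x

buffer = []
for _ in range(50):
    x = refine(init_from_buffer(buffer, batch_size=16, dim=2))
    buffer.extend(x)          # store the refined samples, one row per entry
    buffer = buffer[-1000:]   # keep only the most recent entries
```

Because most chains restart from previously refined samples, each sample accumulates many more refinement steps over time than a single short run would give it, while the occasional fresh noise start keeps new samples entering the buffer.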
The idea of combining EBMs and Langevin Dynamics effectively introduces an iterative refinement in EBMs that enables the generation of higher quality datasets. This approach brings some very tangible benefits compared to traditional EBM approaches:
1) Simplicity and Stability: An EBM is the only object that needs to be trained and designed in the model. Unlike VAEs or GANs, there is no need to tune training processes for separate networks to make sure they are balanced.
2) Adaptive Computation Time: The EBM can run sequential refinement for a long time to generate sharp, diverse samples, or for a short time to generate coarse, less diverse samples.
3) Flexibility of Generation: In both VAEs and flow-based models, the generator must learn a map from a continuous space to a possibly disconnected space containing different data modes, which requires large capacity and may not be possible to learn. EBMs, by contrast, can easily learn to assign low energies to disjoint regions.
4) Adaptive Generation: While the final objective of training an EBM looks similar to that of GANs, the generator is implicitly defined by the probability distribution and automatically adapts as the distribution changes. As a result, the generator does not need to be trained, which allows EBMs to be applied to domains where it is difficult to train the generator of a GAN and also ameliorates mode collapse.
5) Compositionality: Since each model represents an unnormalized probability distribution, models can be naturally combined through a product of experts or other hierarchical models.
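The compositionality point in the list above has a simple arithmetic core: multiplying unnormalized densities of the form exp(-E) is the same as adding their energies. The sketch below shows this with two hypothetical one-dimensional experts; the function names and the quadratic energies are assumptions made for the example.

```python
import numpy as np

def energy_a(x):
    """Hypothetical expert A: prefers values near 0."""
    return 0.5 * (x - 0.0) ** 2

def energy_b(x):
    """Hypothetical expert B: prefers values near 2."""
    return 0.5 * (x - 2.0) ** 2

def combined_energy(x):
    """Product of experts: exp(-E_a) * exp(-E_b) = exp(-(E_a + E_b))."""
    return energy_a(x) + energy_b(x)

grid = np.linspace(-3.0, 5.0, 801)

# The combined model is most compatible with values both experts accept.
x_best = grid[np.argmin(combined_energy(grid))]
print(x_best)

# Check the identity between multiplying densities and adding energies.
product = np.exp(-energy_a(grid)) * np.exp(-energy_b(grid))
assert np.allclose(product, np.exp(-combined_energy(grid)))
```

The lowest combined energy falls midway between the two experts' preferences, and no renormalization is needed to combine them because neither model was normalized in the first place.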
OpenAI evaluated its EBM architecture using well-known datasets such as CIFAR-10 and ImageNet 32x32. The EBM was able to generate high-quality images in a relatively short period of time. Even more impressively, the EBM showed the ability to combine features learned from one type of image when generating other types of images. The following figure illustrates how the EBM can auto-complete images and morph images from one class (such as truck) to another (such as frog).
One of the most impressive achievements of OpenAI's EBMs was the ability to generalize when evaluated against out-of-distribution datasets. In the initial tests, the EBM method was able to outperform other likelihood models such as flow-based and autoregressive models. OpenAI also tested classification using conditional energy-based models and found that the resulting classification exhibited good generalization to adversarial perturbations. The model, despite never being trained for classification, performed classification better than models explicitly trained against adversarial perturbations. The following figure shows the results of the generalization experiments.
EBMs are still considered a nascent area in the deep learning ecosystem. The OpenAI optimizations showed that EBMs are perfectly able to scale across high-dimensional datasets. The work also demonstrated that implicit generation procedures combined with energy-based models allow for compositionality and flexible denoising and inpainting. Together with the research paper, OpenAI open sourced an initial implementation of its EBM model as well as the corresponding datasets. This type of work is likely to inspire other researchers to consider EBM techniques as an important method for generating effective training datasets at a fraction of the current cost.