
1 Introduction

Reduced-order modelling (ROM) replaces an existing high-dimensional system with a surrogate defined in a reduced-order space. ROM, in combination with machine learning (ML) algorithms, is of increasing research interest in engineering and environmental science, as it improves the computational efficiency of high-dimensional systems. Since forecasting in the full physical space is computationally costly, much effort has been devoted to developing ML-based surrogate models in a pre-trained reduced-order space.

In recent years, algorithm schemes which combine ROM and ML surrogate models have been applied to a variety of engineering problems, including computational fluid dynamics [5, 30], numerical weather prediction [24] and nuclear science [28], among others, to speed up computational models without losing the resolution and accuracy of the original model. Typically, the first stage consists of reducing the dimension of the problem via compression methods such as Principal Component Analysis (PCA), autoencoders (AE), or a combination of both [28, 29]. Solutions from the original computational models (known as snapshots) are then projected onto the lower-dimensional space, and the resulting snapshot coefficients are interpolated to approximate the evolution of the model.

In order to incorporate real-time observations, data assimilation (DA), originally developed in meteorological science, is a reference method for system updating and monitoring. Recent studies [2, 5, 10, 25] have focused on combining DA algorithms and ROMs so that the system correction/adjustment can be performed at a low computational cost. Adversarial training and Generative Adversarial Networks (GAN), introduced by [17], have also been used with ROM. Data-driven modelling of nonlinear fluid flows incorporating adversarial networks has been studied successfully [6]. GANs have also been used to capture the physics of molecular dynamics [38] and have the potential to aid in the modelling and simulation of turbulence [23].

The aim of this work is to create general workflows that tackle different applications by combining DA and ML approaches. In the present work, we propose a modular approach, which combines ROM, ML surrogate models and DA for complex dynamical systems, with applications in computational fluid dynamics (CFD), wildfire spread and air pollution forecasting. The algorithms described in this work can easily be applied or extended to other dynamical systems. Numerical results in both applications show that the proposed approach is capable of real-time predictions, yielding accurate results and a considerable speed-up compared to the computational time of the original simulations.

The paper is structured as follows: Sect. 2 presents the modular approach and its components; the applications are shown in Sects. 3 and 4; finally, discussions and conclusions are presented in Sect. 5.

2 Components of the Modular Approach

The modular approach shown in this paper is summarised in Fig. 1. The state model (\(\mathbf {u}_t\)) is compressed using ROM approaches such as PCA, AE or a combination of both, followed by an ML-based forecast in the reduced space. This forecast is then corrected using DA, incorporating real-time observations (\(\mathbf {v}_{t}\)). This is an iterative process that can be used to improve the starting point of the next time-level forecast, thus improving its accuracy [3].

Fig. 1. Flowchart of the modular approach, modified from figure 1 of [4]

2.1 Reduced Order Modelling

In this section, we introduce two types of ROMs, namely the PCA and the convolutional autoencoder (CAE).

2.1.1 Principal Component Analysis

Principal component analysis, also known as the Karhunen-Loève transform or Hotelling transform, is a reference ROM method based on an orthogonal linear projection. This approach has been widely applied to dynamical systems [35] with snapshots at different time steps. Applications can be found in a large range of engineering problems, including numerical weather prediction [21], hydrology [10] and nuclear engineering [16]. More precisely, a set of \(n_u\) simulated or observed fields \(\{ {\textbf {u}}_{t_0}, {\textbf {u}}_{t_1}, \dots , {\textbf {u}}_{t_{n_u-1}}\}\) at different times are flattened and concatenated column-wise into a matrix,

$$\begin{aligned} {\textbf {U}}= \Big [ {\textbf {u}}_{t_0} \Big | {\textbf {u}}_{t_1}\Big |...\Big | {\textbf {u}}_{t_{n_u-1}} \Big ]. \end{aligned}$$
(1)

The principal components are then extracted by computing the empirical covariance matrix, that is,

$$\begin{aligned} {\textbf {C}}_{{\textbf {u}}} = \frac{1}{n_{u}-1} {\textbf {U}}{\textbf {U}}^T = {{\textbf {L}}}_{{\textbf {U}}} {{\textbf {D}}}_{{\textbf {U}}} {{{\textbf {L}}}_{{\textbf {U}}}}^T, \end{aligned}$$
(2)

where each column of \({{\textbf {L}}}_{{\textbf {U}}}\) represents an eigenvector of \({\textbf {C}}_{{\textbf {u}}}\) and \({{\textbf {D}}}_{{\textbf {U}}}\) is the associated diagonal matrix of eigenvalues. The dynamical field \({\textbf {u}}_t\) can then be compressed to

$$\begin{aligned} \tilde{{\textbf {u}}}_t = {{{\textbf {L}}}_{{\textbf {U}},q}}^T {\textbf {u}}_t, \end{aligned}$$
(3)

where \(\tilde{{\textbf {u}}}_t\) denotes the compressed state vector and q is the truncation parameter. The reconstruction to the full physical space reads

$$\begin{aligned} {\textbf {u}}^\text {PCA}_t = {{{\textbf {L}}}_{{\textbf {U}},q}} \tilde{{\textbf {u}}}_t = {{{\textbf {L}}}_{{\textbf {U}},q}} {{{\textbf {L}}}_{{\textbf {U}},q}}^T {\textbf {u}}_t. \end{aligned}$$
(4)
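As an illustration, the compression and reconstruction steps of Eqs. (1)-(4) can be sketched in a few lines of Python; the snapshot matrix below is random stand-in data and all dimensions and variable names are our illustrative assumptions, not those of a specific implementation.

```python
import numpy as np

# Sketch of Eqs. (1)-(4): flattened snapshots stacked as columns of U.
rng = np.random.default_rng(0)
n_state, n_u, q = 500, 100, 10           # state dim, snapshot count, truncation
U = rng.standard_normal((n_state, n_u))  # U = [u_{t_0} | ... | u_{t_{n_u-1}}]

# Empirical covariance matrix and its eigendecomposition (Eq. 2).
C_u = (U @ U.T) / (n_u - 1)
eigvals, L_U = np.linalg.eigh(C_u)       # columns of L_U are eigenvectors
order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
L_U_q = L_U[:, order[:q]]                # keep the q leading components

u_t = U[:, 0]
u_tilde = L_U_q.T @ u_t                  # compression (Eq. 3)
u_pca = L_U_q @ u_tilde                  # reconstruction (Eq. 4)
```

For very high-dimensional fields, forming the covariance matrix explicitly becomes intractable; in practice the same projection basis is usually obtained from a truncated SVD of the snapshot matrix U.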

2.1.2 Convolutional Autoencoder

PCA is, by design, a linear method for ROM. In the last several years, much effort has been devoted to dimension reduction for chaotic dynamical systems via deep learning (DL) AEs [15]. Typically, an AE is an unsupervised neural network (NN) which consists of two parts: an encoder E, which maps the input variables to latent (i.e., compressed) vectors, and a decoder D, which performs reconstructions from the low-dimensional latent space to the full physical space. These processes can be summarised as:

$$\begin{aligned} \tilde{{\textbf {u}}}_t = E({\textbf {u}}_t) \quad \text {and} \quad {{\textbf {u}}^{\text {AE}}_t} = D(\tilde{{\textbf {u}}}_t). \end{aligned}$$
(5)

Employing convolutional layers in AEs has been found helpful to i) reduce the number of parameters for high-dimensional systems, and ii) take into account local spatial patterns in structured data (e.g., images and time series) [18]. Following this idea, the CAE was developed [18, 32], in which both the encoder E and the decoder D consist of a series of convolutional layers.

In general, the encoder and the decoder of an AE are trained jointly, for instance, with the mean squared error (MSE) or the mean absolute error (MAE) of the reconstruction as loss function, i.e.,

$$\begin{aligned} J_{\text {MSE}}( E, D ) = \sum _{j=0}^{n_u-1} \frac{\vert \vert {\textbf {u}}_{t_j} - D \circ E({\textbf {u}}_{t_j})\vert \vert ^2}{n_u}, \quad J_{\text {MAE}}( E, D ) = \sum _{j=0}^{n_u-1} \frac{\vert \vert {\textbf {u}}_{t_j} - D \circ E({\textbf {u}}_{t_j})\vert \vert }{n_u}. \end{aligned}$$
(6)
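A minimal CAE sketch in PyTorch follows, assuming 64x64 single-channel snapshot fields; the layer sizes and latent dimension are illustrative assumptions, not the architectures used in Sects. 3 and 4.

```python
import torch
import torch.nn as nn

class CAE(nn.Module):
    """Convolutional autoencoder sketch for Eqs. (5)-(6)."""
    def __init__(self, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(           # E: field -> latent vector
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),  # assumes 64x64 inputs
        )
        self.decoder = nn.Sequential(           # D: latent vector -> field
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1,
                               output_padding=1),
        )

    def forward(self, u):
        return self.decoder(self.encoder(u))

model = CAE()
u = torch.randn(4, 1, 64, 64)         # a batch of stand-in snapshots
loss = nn.MSELoss()(model(u), u)      # reconstruction loss J_MSE of Eq. 6
loss.backward()
```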

2.2 Machine Learning for Surrogate Models

2.2.1 Recurrent Neural Network

Long short-term memory (LSTM) networks, introduced in [19], are a variant of recurrent neural networks (RNN) capable of dealing with the long-term dependency and vanishing gradient problems that traditional RNNs could not handle. One of the components of our modular approach in the present work is the sequence-to-sequence LSTM, which learns the dynamics in the latent space from compressed training data.

LSTMs can be unidirectional or bidirectional. The recently developed bidirectional LSTM (BDLSTM) [33] differs from the unidirectional one in that it captures both the forward and backward temporal dependencies in spatiotemporal data [12, 20, 26, 31]. LSTMs are widely recognised as one of the most effective sequential models for time series prediction in engineering problems [27, 37].

The LSTM network comprises three gates: input (\(\mathbf {i}_{t_{k}}\)), forget (\(\mathbf {f}_{t_{k}}\)) and output (\(\mathbf {o}_{t_{k}}\)); a block input, a single cell \(\mathbf {c}_{t_{k}}\) and an output activation function. This network is recurrently connected back to the input and the three gates. Due to the gated structure and the forget state, the LSTM is an effective and scalable model for dealing with long-term dependencies [19]. The vector equations for an LSTM layer are:

$$\begin{aligned} \mathbf{i}_{t_{k}}&= \phi (\mathbf{W}_{xi}\mathbf {u}_{t_{k}} + \mathbf{W}_{Hi}\mathbf{H}_{t_{k-1}} + \mathbf{b}_{i})\\ \mathbf{f}_{t_{k}}&= \phi (\mathbf{W}_{xf}\mathbf {u}_{t_{k}} + \mathbf{W}_{Hf}\mathbf{H}_{t_{k-1}} + \mathbf{b}_{f})\\ \mathbf{o}_{t_{k}}&= \phi (\mathbf{W}_{xo}\mathbf {u}_{t_{k}} + \mathbf{W}_{Ho}\mathbf{H}_{t_{k-1}} + \mathbf{b}_{o})\\ \mathbf{c}_{t_{k}}&= \mathbf{f}_{t_{k}} \circ \mathbf{c}_{t_{k-1}} + \mathbf{i}_{t_{k}} \circ \tanh (\mathbf{W}_{xc}\mathbf {u}_{t_{k}} + \mathbf{W}_{Hc}\mathbf{H}_{t_{k-1}} + \mathbf{b}_{c})\\ \mathbf{H}_{t_{k}}&= \mathbf{o}_{t_{k}}\circ \tanh (\mathbf{c}_{t_{k}})\\ \end{aligned}$$
(7)

where \(\phi \) is the sigmoid function, the \(\mathbf {W}\) are the weights, \(\mathbf {b}_{i,f,o,c}\) are the biases of the input, forget and output gates and the cell, respectively, \(\mathbf {u}_{t_{k}}\) is the layer input, \(\mathbf {H}_{t_{k}}\) is the layer output and \(\circ \) denotes the entry-wise multiplication of two vectors. This is the output of a unidirectional LSTM.

For a BDLSTM, the output layer generates an output vector \(\mathbf {u}_{t_{k}}\):

$$\begin{aligned} \mathbf {u}_{t_{k}} = \psi (\overrightarrow{\mathbf {H}_{t_{k}}},\overleftarrow{\mathbf {H}_{t_{k}}}) \end{aligned}$$
(8)

where \(\psi \) is a concatenating function that combines the two output sequences, forward and backward, denoted by a right and a left arrow, respectively.
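As a sketch, a sequence-to-sequence bidirectional LSTM acting on latent vectors (Sect. 2.2.1) could be assembled in PyTorch as follows; the latent, hidden and sequence dimensions are illustrative assumptions, and the linear head plays the role of \(\psi \) in Eq. 8 by merging the two directions.

```python
import torch
import torch.nn as nn

latent_dim, hidden_dim, seq_len = 8, 64, 30   # illustrative sizes
lstm = nn.LSTM(latent_dim, hidden_dim,
               batch_first=True, bidirectional=True)
head = nn.Linear(2 * hidden_dim, latent_dim)  # psi: merge both directions

u_seq = torch.randn(4, seq_len, latent_dim)   # past latent trajectories
H, _ = lstm(u_seq)            # H concatenates forward/backward hidden states
u_next = head(H[:, -1, :])    # one-step-ahead latent prediction
```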

2.2.2 Adversarial Network

The work of [17] introduced the idea of adversarial training and adversarial losses, which can also be applied to supervised scenarios and have advanced the state of the art in many fields over the past years. Additionally, robustness may be achieved by detecting and rejecting adversarial examples through adversarial training [34]. A GAN is a network trained adversarially. The basic idea of a GAN is to simultaneously train a discriminator and a generator, where the discriminator aims to distinguish between real and generated samples. By learning and matching the distribution of the training data \(\mathbf {x}\), the aim is that new samples, drawn from the matched distribution formed by the generator, will produce ‘realistic’ features from the latent vector \(\mathbf {z}\).

The GAN is composed of a discriminator network (\(\mathcal {D}\)) and a generator network (\(\mathcal {G}\)). The GAN losses, based on binary cross-entropy, can therefore be written as:

$$\begin{aligned} \mathcal {L}^{adv}_{\mathcal {D}}&= -\sum \Big [ \log \mathcal {D}(\mathbf {x}) + \log \big (1-\mathcal {D}(\mathcal {G}(\mathbf {z}))\big ) \Big ]\\ \mathcal {L}^{adv}_{\mathcal {G}}&= -\sum \log \big (\mathcal {D}(\mathcal {G}(\mathbf {z}))\big )\\ \end{aligned}$$
(9)

This idea can be developed further by taking elements of the adversarial training of GANs and applying them to other domains, e.g., time series, extreme event detection and adversarial attacks, among others.
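One adversarial training step implementing the losses of Eq. 9 with binary cross-entropy could be sketched in PyTorch as follows; the generator and discriminator below are placeholder networks of our choosing and the 'real' samples are random stand-ins.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 64                     # illustrative sizes
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())
bce = nn.BCELoss()
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

x = torch.randn(32, data_dim)      # 'real' samples (stand-in data)
z = torch.randn(32, latent_dim)    # latent noise vectors

# Discriminator step: minimise -[log D(x) + log(1 - D(G(z)))]
opt_D.zero_grad()
loss_D = (bce(D(x), torch.ones(32, 1))
          + bce(D(G(z).detach()), torch.zeros(32, 1)))
loss_D.backward()
opt_D.step()

# Generator step: minimise -log D(G(z))
opt_G.zero_grad()
loss_G = bce(D(G(z)), torch.ones(32, 1))
loss_G.backward()
opt_G.step()
```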

2.3 Data Assimilation

Data assimilation algorithms aim to estimate the state variable \({\textbf {u}}\) relying on a prior approximation \({\textbf {u}}_b\) (also known as the background state) and a vector of observed states \({\textbf {v}}\). The theoretical value of the state vector, denoted by \({\textbf {u}}_\text {true}\) and called the true state, is out of reach in real engineering problems. Both the background and the observation vectors are assumed to be noisy in DA, characterised by the associated error covariance matrices \({\textbf {B}}\) and \({\textbf {R}}\), respectively, i.e.,

$$\begin{aligned} {\textbf {B}}= Cov(\epsilon _b, \epsilon _b), \quad {\textbf {R}}= Cov(\epsilon _o, \epsilon _o), \end{aligned}$$
(10)

with the prior errors \(\epsilon _b\) and \(\epsilon _o\) defined as:

$$\begin{aligned} \epsilon _b = {\textbf {u}}_b - {\textbf {u}}_{\text {true}}, \quad \epsilon _o = \mathcal {H}({\textbf {u}}_{\text {true}})-{\textbf {v}}. \end{aligned}$$
(11)

Since the true states are out of reach in real applications, the covariance matrices \({\textbf {B}}\) and \({\textbf {R}}\) are often approximated through statistical estimation [7, 14]. The function \(\mathcal {H}\) in Eq. 11 is called the transformation operator, which maps the state variables to the observable quantities. \(\mathcal {H}({\textbf {u}}_{\text {true}})\) is also known as the model equivalent of the observations.

By minimizing a cost function J defined as

$$\begin{aligned} J({\textbf {u}})&=\frac{1}{2}({\textbf {u}}-{\textbf {u}}_b)^T {{\textbf {B}}}^{-1}({\textbf {u}}-{\textbf {u}}_b) + \frac{1}{2}({\textbf {v}}-\mathcal {H}({\textbf {u}}))^T {{\textbf {R}}}^{-1} ({\textbf {v}}-\mathcal {H}({\textbf {u}})) \nonumber \\&= J_b({\textbf {u}}) + J_o({\textbf {u}}) \end{aligned}$$
(12)

DA approaches attempt to find an optimal compromise between the background and the observations, known as the analysis state,

$$\begin{aligned} {{\textbf {u}}}_{a}= {\mathop {\hbox {argmin}}\limits _{{\textbf {u}}}} \Big ( J({\textbf {u}}) \Big ). \end{aligned}$$
(13)

The \({\textbf {B}}\) and \({\textbf {R}}\) matrices, which determine the relative weights of the background and observation information (as shown in Eq. 12), are crucial in DA algorithms [11, 36]. When \(\mathcal {H}\) can be approximated by a linear operator \({\textbf {H}}\) and the error covariances \({\textbf {B}}\) and \({\textbf {R}}\) are well specified, Eq. 12 can be solved via the Best Linear Unbiased Estimator (BLUE) [7]:

$$\begin{aligned} {\textbf {u}}_a={\textbf {u}}_b+{\textbf {K}}({\textbf {v}}-{\textbf {H}}{\textbf {u}}_b) \end{aligned}$$
(14)

where \({\textbf {K}}\) denotes the Kalman gain matrix,

$$\begin{aligned} {\textbf {K}}={\textbf {B}}{\textbf {H}}^T ({\textbf {H}}{\textbf {B}}{\textbf {H}}^T+{\textbf {R}})^{-1}. \end{aligned}$$
(15)
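A direct transcription of Eqs. 14-15 in Python (numpy) is given below; the observation operator and covariance matrices are illustrative stand-ins rather than estimated quantities.

```python
import numpy as np

n, m = 10, 4                          # state and observation dimensions
rng = np.random.default_rng(1)
H = rng.standard_normal((m, n))       # linearised observation operator
B = 0.5 * np.eye(n)                   # background error covariance
R = 0.1 * np.eye(m)                   # observation error covariance
u_b = rng.standard_normal(n)          # background state
v = H @ u_b + 0.1 * rng.standard_normal(m)    # noisy observations

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)  # Kalman gain (Eq. 15)
u_a = u_b + K @ (v - H @ u_b)                 # analysis state (Eq. 14)
```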

The optimisation of Eq. 12 often involves gradient descent algorithms (such as L-BFGS-B) and adjoint-based numerical techniques. In the proposed modular approach of the present paper, we aim to perform DA in the low-dimensional latent space to reduce the computational cost, enabling real-time model updating. The latent assimilation (LA) approach was first introduced in the work of [2] for CO\(_2\) spread modelling. A generalised latent assimilation algorithm was proposed in the recent work of [9]. The observed quantities \({\textbf {v}}_t\) are first preprocessed to fit the space of the state variables \({\textbf {u}}_t\) and then encoded, i.e.,

$$\begin{aligned} \tilde{{\textbf {v}}}_t = E ({\textbf {v}}_t). \end{aligned}$$
(16)

As a consequence, the transformation operator becomes the identity function in the latent space, leading to the loss function of LA:

$$\begin{aligned} \tilde{J}(\tilde{{\textbf {u}}}_t)&=\frac{1}{2}(\tilde{{\textbf {u}}}_t-\tilde{{\textbf {u}}}_{t,b})^T {{\textbf {B}}}^{-1}(\tilde{{\textbf {u}}}_t-\tilde{{\textbf {u}}}_{t,b}) + \frac{1}{2}(\tilde{{\textbf {u}}}_t-\tilde{{\textbf {v}}}_t)^T {{\textbf {R}}}^{-1} (\tilde{{\textbf {u}}}_t-\tilde{{\textbf {v}}}_t), \end{aligned}$$
(17)

where the latent background state \(\tilde{{\textbf {u}}}_{t,b}\) is given by the RNN predictions mentioned in Sect. 2.2.1. The analysis state,

$$\begin{aligned} \tilde{{\textbf {u}}}_{t,a}= \underset{\tilde{{\textbf {u}}}_t}{\hbox {argmin}} \Big ( \tilde{J}(\tilde{{\textbf {u}}}_t) \Big ), \end{aligned}$$
(18)

can then replace the background prediction \(\tilde{{\textbf {u}}}_{t,b}\) and be used as the starting point for the next-level prediction in the ML algorithms.
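Since the transformation operator is the identity in the latent space, the minimiser of Eq. 17 also has a closed BLUE form. One latent-assimilation cycle could then be sketched as follows; the latent covariances, forecast and encoded observation are all stand-in values for illustration.

```python
import numpy as np

q = 8                                    # latent dimension (illustrative)
rng = np.random.default_rng(2)
B_t = 0.5 * np.eye(q)                    # latent background covariance
R_t = 0.1 * np.eye(q)                    # latent observation covariance

u_tilde_b = rng.standard_normal(q)       # latent forecast from the RNN
v_tilde = rng.standard_normal(q)         # encoded observation (Eq. 16)

K = B_t @ np.linalg.inv(B_t + R_t)       # Kalman gain with H = identity
u_tilde_a = u_tilde_b + K @ (v_tilde - u_tilde_b)  # latent analysis (Eq. 18)
# u_tilde_a replaces u_tilde_b as the starting point of the next
# LSTM prediction (Sect. 2.2.1).
```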

3 Application to Wildfires

The first application is real-time forecasting of wildfire dynamics. This problem has attracted increasing attention in fire safety science worldwide and is extremely challenging due to the complexity of the physical models and of the geographical features involved. Running physics-based simulations for large-scale wildfires can be computationally difficult, if not infeasible. We applied the proposed modular approach, which combines reduced-order modelling, recurrent neural networks (RNN), data assimilation (DA) and error covariance tuning, to fire forecasting in near real time. More precisely, based on snapshots of dynamical fire simulations, we first construct a low-dimensional latent space via proper orthogonal decomposition or a convolutional AE. An LSTM is then used to build sequence-to-sequence predictions following the simulation results projected/encoded in the reduced space. In order to adjust the prediction of burned areas, latent DA coupled with an error covariance tuning algorithm is performed with daily observed satellite wildfire images as observation data. The proposed method was tested on two recent large fire events in California, namely the Buck fire and the Pier fire, both taking place in 2017, as illustrated in Fig. 2.

Fig. 2. Observed burned area of the Buck fire and the Pier fire on the first day

We first employed an operational cellular automata (CA) fire spread model [1] to generate the training dataset for ROM and RNN surrogate modelling. This CA model is a probabilistic simulator which takes into account a number of local geophysical features, such as vegetation density (see Fig. 2) and ground elevation. Once the latent space is acquired, the ML-based surrogate model is trained using the results of stochastic CA simulations in the corresponding area of the fire events. With a much shorter online execution time, the resulting data-driven model provides results similar to the (stochastic) physics-based CA simulations, in the sense that the mean and the standard deviation of the CA-CA and CA-LSTM differences are similar, as shown in Fig. 3 for the Pier fire. In fact, the ROM- and ML-based approach runs roughly 1000 times faster than the original CA model, as shown in Fig. 3(b).

Fig. 3. Model difference and computational time

Each step in the CA-LSTM predictions is roughly equivalent to 30 min in real time, while the satellite observations are available on a daily basis. The latter are used to consistently adjust the fire prediction, since the actual fire spread also depends heavily on other factors, such as real-time climate or human intervention, which are not included in the CA modelling. The evolution of the averaged relative root mean square error (R-RMSE) is shown in Fig. 4. The numerical results show that, with the help of error covariance tuning [8, 14], DA manages to improve the model prediction accuracy in both fire events.

4 Application to Computational Fluid Dynamics and Air Pollution

Similar to the wildfire problem, we also present a general workflow to generate and improve forecasts of surrogate models of CFD simulations using deep learning, and more specifically adversarial training. This adversarial approach aims to reduce the divergence of the forecasts from the underlying physical model. Our two-step method, similar to the wildfire application, integrates a PCA-based AE with adversarially trained LSTM networks. Once the reduced-order model of the CFD solution is obtained via PCA, an adversarial autoencoder (AAE) is used on the time series of principal components. Subsequently, an LSTM model, named adversarial LSTM (ALSTM), is adversarially trained on the latent space produced by the principal component adversarial autoencoder (PC-AAE) to make forecasts. Here we show that the application of adversarial training improves the rollout of the latent space predictions.

Fig. 4. Relative error of pure LSTM and assimilated prediction compared to satellite observations

Fig. 5. Top: Flow past the cylinder - 2D CFD simulation. Bottom: 3D Urban Air pollution CFD simulation. Right: Unstructured meshes of both domains

Fig. 6. Top: Mean absolute error (MAE) of the different compression methods on the velocities (ms\(^{-1}\)) of the flow past the cylinder domain. Red, magenta, and cyan represent compression to 8, 4, and 2 dimensions, respectively; circle and triangle markers denote PC and PC-AAE compression, respectively. Bottom: Ensemble, over 50 different starting points, of the MAE of the forecast for 100 time-levels of the velocities (ms\(^{-1}\)) of the 3D Urban Air Pollution domain. Blue is the original test data, and orange is the prediction obtained with the PC-AAE and the ALSTM. (Color figure online)

Different case studies are shown in Fig. 5:

  • FPC: the 2D case describes a typical flow past a cylinder in CFD, in which a cylinder placed in a channel at a right angle to the oncoming fluid makes the steady-state symmetrical flow unstable. This simulation has a Reynolds number (Re) of 2,300, with \(m = 5,166\) nodes and \(n = 1,000\) time-steps.

  • 3DAirPollution: the 3D case is a realistic scenario including 14 buildings of a real urban area located near Elephant and Castle, South London, UK. The 3D domain (720 m \(\times \) 676 m \(\times \) 250 m) is discretised by an unstructured mesh including \(m = 100,040\) nodes per dimension and \(n = 1,000\) time-steps.

The two-dimensional (2D) CFD case study was performed using Thetis [22] and the three-dimensional (3D) CFD simulations were carried out using Fluidity [13]. For these domains, the framework was trained and validated on the first 1000 time-steps of the simulation, and tested on the following 500 time-steps.

PCA was applied to the two-dimensional velocity field (m s\(^{-1}\)) of the flow past the cylinder and likewise to the velocities of the 3D model. The full-rank PCs were used as input for the AAE, divided into three different experiments named \(LS_{\tau }\) and compared to the corresponding reconstruction \(\mathbf {x}_{\tau }\) with \(\tau = \{2, 4, 8\}\) PCs. The mean absolute errors obtained with the different dimension reduction approaches are shown in Fig. 6a for the flow past the cylinder case. The AAE outperforms a simple truncation of the PCs in both domains.

In terms of forecasting, our framework generalises well on unseen data (Fig. 6b). This is because the Gaussian latent space obtained with the adversarial AE constrains the subsequent predictions and forces them back into the distribution. Furthermore, the adversarially trained LSTM learns how to stay within the data distribution. After training the adversarial LSTM, we can assess the forecasts produced by our workflow. An ensemble of 50 different starting points from the test dataset was forecast in time for 100 time-levels. The ensemble of mean absolute errors is based on a reduction to 8 dimensions in the latent space of the AAE, which is a compression of 5 orders of magnitude. The mean error percentage of these forecasts is 5% on the test dataset.

5 Conclusions

In the present paper, we introduced a ROM- and ML-based modular approach for efficient prediction of high-dimensional dynamical systems. This method can also incorporate real-time observations for model correction/adjustment at a low computational cost. A variety of ROM and RNN approaches can be included in the algorithm scheme depending on the application. The replacement of the physics-based simulation by these models speeds up the forecast process towards a real-time solution, and the application of adversarial training could potentially produce more physically realistic scenarios. We showed the strength of the proposed method in predicting wildfire spread and air pollution diffusion. Furthermore, the framework is data-agnostic and could be applied to different physical models when enough data are available.