1. Introduction
Recently, carbon neutrality has become crucial to mitigating climate change. Carbon neutrality means reducing greenhouse gas (GHG) emissions from human activities until net emissions reach zero. To realize carbon neutrality, renewable generation should increase up to 71% by 2050 to reform the energy system [1,2]. The transition to a sustainable energy system is a pressing challenge, and renewable energy sources such as solar power are critical to achieving this goal. In this regard, the Republic of Korea’s government has also announced a commitment to carbon neutrality by 2050, given that the Republic of Korea’s greenhouse gas emissions are the seventh highest in the world [3].
Solar energy has the potential to provide a reliable, sustainable, and cost-effective source of electricity, particularly in regions with high levels of solar radiation. However, the integration of large amounts of solar energy into the grid presents technical challenges, including the need for accurate forecasting of solar power generation. Many papers have recently discussed the challenges and opportunities associated with deploying solar energy. For example, in [4], the authors discussed the importance of solar energy technology in promoting sustainable development, arguing that it has the potential to reduce greenhouse gas emissions and increase access to energy in developing countries. In [5], the authors provided a comprehensive review of the literature on renewable energy and its role in promoting sustainable development. Specifically, they examined various aspects of renewable energy (including solar power generation) such as technology, policy, and finance.
The main factors that affect solar power forecasts are daily weather conditions and seasonal insolation [6]. Among these factors, early work first studied a physical model of the relationship between insolation and solar power generation [7]; this model approximates the insolation and calculates power generation from the rotation of the earth and the equivalent circuit of the photovoltaic (PV) cell. Since then, statistical prediction models using traditional forecasting techniques, such as autoregressive moving average (ARMA) [8] and multiple linear regression (MLR) [9], have been proposed [10]. However, such traditional models adapt poorly to weather changes, which makes it difficult to accurately predict how much solar power will be generated under different weather conditions.
Consequently, there has been a need for an artificial intelligence (AI)-based PV forecasting model that can handle data uncertainty. Recently, machine learning and artificial intelligence techniques have shown great promise in improving the accuracy of PV forecasting. In [11], the authors proposed a method for estimating global solar radiation using meteorological variables, including sunshine duration; this method is a popular benchmark for solar radiation estimation. In [12], the authors reviewed the state of the art in solar radiation modeling, including physical and empirical models. In [13], the authors investigated the impact of feature selection methods on solar power forecasting performance.
In early studies applying deep learning, multi-layer perceptron (MLP)-based models were the most commonly used for solar power generation forecasting. For example, in [14], an MLP was used to forecast solar power generation in Zimbabwe. Similarly, in [15], an MLP was used to forecast solar power generation in Malaysia, using historical weather data and solar radiation measurements as inputs to the artificial neural network (ANN) model. Later studies started to explore more advanced models, such as the recurrent neural network (RNN) [16] and long short-term memory (LSTM) [17]. In [18], an RNN was used to forecast solar power generation. The space–time convolutional neural network (STCNN), which exploits the location information of multiple PV sites and historical PV generation data, was used in [19]. In [20], a PV forecasting model based on the wavelet transform and an LSTM-dropout network was proposed. Very recently, studies have focused on more advanced models such as attention-based models, graph neural networks (GNNs) [21], and transformer-based models [22]. By leveraging a transformer encoder and a gated recurrent unit (GRU), a framework based on Delaunay triangulation and the TransGRU model forecasts PV generation robustly against weather forecast errors [23]. In [24], error-compensable forecasting is adopted using deep reinforcement learning with proximal policy optimization [25]; it switches the objective of forecasting from reducing errors to making compensable errors.
Although the above studies show improved solar power forecasting performance, they assume data without abnormal points. However, the reliability of collected solar power generation data directly affects the performance and reliability of the learning model. That is, when anomalous data enter the input, the accuracy of deep learning-based forecasting models can substantially deteriorate. Therefore, it is necessary to include anomaly detection as a preprocessing stage in PV power forecasting. Note that anomaly detection studies in solar power forecasting have mainly focused on cyberattacks or false detection. They detected data points affected by false data injection to protect power systems from malicious attackers. However, even without false data injection, anomalous data points can exist. In [26], the authors designed a fault classifier based on thermal image processing using a support vector machine (SVM), performing anomaly detection at the physical level. In [27], the authors proposed an SVM-based unsupervised monitoring system at the physical level that inspects the DC part of the PV system through momentary shading. In [28], the authors trained a model after replacing anomalies with predicted values. At each time step, they performed simple anomaly detection and replaced each detected anomaly with a deep learning (DL)-based predicted value to compare the prediction performance. In [6], the authors analyzed the performance of machine learning models to identify the best model for accurately detecting anomalies in PV systems. The correlation coefficient between the internal and external characteristic parameters of the power plant is obtained to analyze the anomaly detection efficiency of the machine learning models.
In the real world, solar data potentially have anomalous values due to errors in sensor measurements. In addition, many solar power plants in the Republic of Korea are classified as behind-the-meter (BTM) resources, which are small-scale generators of 1 MW or less and do not have real-time generation metering. This is one of the factors that greatly increase the uncertainty of power generation forecasts and hinder the forecasting model’s ability to learn in a supervised manner.
In the case of a virtual power plant (VPP), the next day’s power generation is forecasted for the collective resources, and a forecasting incentive is given depending on accuracy. However, when anomalous data are included and the power generation pattern is erratic, forecasting performance may be significantly degraded. In fact, we observe that the solar power plant data collected from private owners differ from general power generation patterns, possibly because the plants are combined with energy storage systems (ESSs), which may not be known in advance from the VPP operator’s perspective.
To this end, we propose an integrated anomaly detection framework leveraging a convolutional autoencoder that proactively identifies and removes anomalous data. Then, we configure a VPP after filtering out anomalous sites and forecast power generation using deep learning. We summarize our key contributions as follows.
We propose a preprocessing method along with a forecasting model for various PV sites that exhibit anomalous power generation. Unlike general PV forecasting, which assumes normal power generation or knowledge of the anomaly in the BTM situation, we proactively detect anomalous sites.
For interpretable anomaly detection, we develop a model that combines a convolutional autoencoder (CAE) and principal component analysis (PCA) to extract and analyze the features of solar power data with scree plot analysis. As a result, we can extract and utilize features that contain important information from solar power data as low-dimensional vectors.
Our methodology is designed to be robust to real-world data. Leveraging the proposed anomaly detection above, we compare two types of VPPs: the VPP with only normal sites and the VPP with a random mixture of anomalous and normal sites. Based on this, we show that using simple and efficient unsupervised learning to construct a VPP with only normal PV sites leads to better forecasting performance than the mixed case. We observe that the forecasting error of the normal VPP is 6% or less, which satisfies the condition for receiving full incentives in the renewable energy wholesale market run by Korea Power Exchange (KPX).
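As a rough illustration of the scree analysis mentioned in the second contribution, the following minimal sketch computes the explained-variance ratio of each principal component and picks the number of components to retain. It is not the authors' implementation: the latent vectors are synthetic stand-ins for CAE outputs, the names `explained_variance_ratios` and `components_to_keep` are hypothetical, and the 95% cumulative-variance threshold is an assumed substitute for reading the elbow off a scree plot.

```python
import numpy as np


def explained_variance_ratios(latent):
    """Fraction of total variance captured by each principal component."""
    centered = latent - latent.mean(axis=0)
    # Singular values of the centered data give the component variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2 / (len(latent) - 1)
    return variances / variances.sum()


def components_to_keep(ratios, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches threshold.

    The threshold is an assumption made for this sketch.
    """
    cumulative = np.cumsum(ratios)
    return int(np.searchsorted(cumulative, threshold) + 1)


# Synthetic stand-in for CAE latent vectors (assumption): 200 samples in
# 8 dimensions that mostly vary along 2 hidden directions, plus small noise.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 8))
latent = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 8))

ratios = explained_variance_ratios(latent)
k = components_to_keep(ratios)
print("explained-variance ratios:", np.round(ratios, 4))
print("components kept:", k)
```

Plotting `ratios` against the component index would give the scree plot itself; the components retained here form the low-dimensional feature vector.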
The rest of this paper is organized as follows. Section 2 analyzes the actual PV site data and proposes an anomaly detection model and the structure of the PV forecasting model. Section 3 presents the forecasting results before and after anomaly detection for VPPs of two experimental groups, followed by the conclusion in Section 4.
4. Conclusions
In this paper, we propose a methodology for training a forecasting model by configuring a VPP using weather data and PV sites with normal power generation patterns, which involves detecting anomalies in PV site data. The anomaly detection model is constructed using CAE, PCA, and K-means clustering. After extracting the latent vectors by applying CAE, power generation with normal patterns can be determined by PCA and K-means clustering. Normal and anomalous sites are separated by feeding validation and test PV site data into the model trained on the training data. These normal and anomalous data sets are then merged with weather data corresponding to each PV site. The results show that forecasting performance improves when a forecasting model is trained using normal data only. In the case of a VPP composed of normal data, the aggregated forecast performance improves by more than 23% on average compared to the mixed VPP. This substantial improvement is possible because the proposed anomaly detection is highly accurate, e.g., with an accuracy of 99%.
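The clustering step of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the two-dimensional features are synthetic stand-ins for PCA-reduced CAE latent vectors, the function names are hypothetical, the farthest-first initialization is chosen only to keep the example deterministic, and treating the larger of the two clusters as "normal" is an assumption of this sketch.

```python
import numpy as np


def farthest_first_centers(points, k):
    """Deterministic initialization: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dists.argmax()])
    return np.array(centers)


def assign(points, centers):
    """Index of the nearest center for every point."""
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(points, k=2, iterations=100):
    """Plain Lloyd iterations; returns (labels, centers)."""
    centers = farthest_first_centers(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return assign(points, centers), centers


def split_sites(features):
    """Boolean mask of sites in the larger cluster (assumed to be normal)."""
    labels, _ = kmeans(features, k=2)
    normal_label = np.bincount(labels, minlength=2).argmax()
    return labels == normal_label


# Synthetic stand-in features (assumption): 90 sites near the origin and
# 10 sites with a clearly different pattern.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.3, size=(90, 2)),
    rng.normal(5.0, 0.3, size=(10, 2)),
])
normal_mask = split_sites(features)
print("normal sites:", int(normal_mask.sum()))
print("anomalous sites:", int((~normal_mask).sum()))
```

Only the sites selected by `normal_mask` would then be merged with weather data to form the normal VPP used for training.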
Some challenges remain. It is hard to detect a mixed anomaly site that has both normal and anomalous generation patterns in different time periods. This could be due to changes in the way an ESS is installed or operated. As future work, the impact of site changes on anomaly detection can be investigated, along with methodologies to ensure that anomaly detection is robust for newly added sites with little historical data. Anomaly detection per daily power generation pattern, rather than over the entire data of each site, could probabilistically represent anomalies in power generation; for example, it is possible to extend the proposed preprocessing method to forecast power generation for newly installed sites with limited data.