1. Introduction
Energy, especially electrical energy, plays a key role in a country’s development; it also improves the living standard of people. Therefore, the demand for electrical energy has been increasing day by day due to industrialization and modernization. However, the conventional generation of electrical energy has caused global warming and significant climate changes. Subsequently, the modern world has undertaken different renewable energy initiatives to mitigate global warming and meet the growing electricity demand. Nowadays, many countries have produced a significant portion of their energy demands from renewable energy resources, particularly solar generation power plants [
1]. Among the potential renewable energies, photovoltaics (PV) have undergone enormous growth over the last few years. The total installed capacity of PV systems has reached around 227 GW worldwide, an increase of more than 28% in 2015 (International Energy Agency (IEA) report, 2016); such growth is expected to continue at similar or higher rates in the future. Decreasing prices (i.e., the lowest at below $1.5/Wp for fixed tilt systems) and improved PV technology can also boost PV system installations [2,3].
However, the generation of PV power fully depends on random and ungovernable solar irradiance and other meteorological factors, such as atmospheric temperature, module temperature, wind speed, wind direction, and humidity. The power output of a PV system dynamically changes with time due to the variability of environmental factors. Unpredictable PV power output adversely affects system stability and reliability, the scheduling of system operations, and related economic benefits [
4,
5]. Meanwhile, accurate forecasting of PV power generation can reduce the impact of PV power uncertainty on the grid, improve system reliability, maintain power quality, and increase the penetration level of PV systems [
6]. Therefore, accurate forecasting of PV power generation has become an important task for researchers at present.
In a significant number of previous studies, solar irradiance on different time scales was forecasted using various approaches, including numerical weather prediction methods, image-based methods, and statistical methods [
7,
8,
9,
10]. The forecasted solar irradiance and other associated data are used as inputs for PV power generation by commercial PV simulation software, such as TRNSYS, PVFORM, and HOMER. In [
11], fuzzy theory was first used to forecast the solar irradiance levels, and then the recurrent neural network (RNN) method was used to forecast the 24-hour-ahead output power of the PV system. However, most previous research on this problem employed direct methods to forecast PV power generation based on historical time-series data, such as historical PV power output and corresponding meteorological data. The research by Kudo et al. [
12] demonstrated that the direct method of forecasting next-day PV power generation is better when compared with indirect methods.
In direct PV power forecasting, persistence modeling [
13] is generally conducted to justify and select other models, and to decide on benchmarks. Persistence modeling is mainly used for one-hour ahead forecasting; hence, as the time range of forecasting increases, the accuracy of persistence modeling decreases [
14]. Both autoregressive and moving-average modeling and their generalizations (i.e., autoregressive-moving-average models) [
15] are widely used in statistical and time-series data analyses. These are based on classical time-series analysis such as following the Box–Jenkins method [
16]. Yang et al. [
17] proposed an autoregressive method, with an exogenous input (ARX)-based spatio-temporal (ST) model, in order to improve the accuracy of the developed PV output power forecasting technique. However, these time-series models have limitations because they require stationary data-sets [
18]. A stationary time series is one whose statistical properties, such as mean, variance, and autocorrelation, are all constant over time. However, the PV power output and related meteorological data are non-stationary. By contrast, autoregressive-integrated-moving-average (ARIMA) models integrate non-stationarity elements from time-series data [
19], but these models are computationally intensive because of the inclusion of a summation/integration function. The most general time series analysis model is called NARX (Nonlinear Autoregressive with Exogenous Input). In [
20], NARX was chosen as a dynamic artificial neural network (ANN) to forecast the PV power generation, and the results showed that NARX is more efficient than other ANN models because of its capability to learn and generalize formulas. However, NARX models have limitations in learning long-term dependencies because of the “vanishing gradient” problem. Furthermore, like any dynamical system, the models are affected by instability and lack a procedure for optimizing embedded memory. Moreover, time-series models need large amounts of data when the model is structured, and the parameters are difficult to update when new data are uploaded.
ANN modeling is excellent for complicated and non-linear data analysis, and it does not require any prior assumption. Neural networks are widely used in PV power generation forecasting, and forecasted results are better when compared with those from regression analysis methods [
21]. NARX and feed-forward neural networks with tapped delay lines have been used to forecast PV energy production, and they result in errors (MAPE) of less than 5% [
22]. However, the performance of this forecasting model depends on meteorological factors and the correlation between explanatory variables and dependent variables. If the correlation factor between variables is less significant, then the forecasting result will be inaccurate. Multilayer perceptron (MLP) [
23] is an example of the feed-forward NN model that has been widely deployed in PV power generation forecasting. In this model, the input of each neuron is transformed into output through the sigmoid function. The performance of this model is better than the Box–Jenkins method due to its non-linear approximation. The radial basis function neural network (RBFNN) [
24] is a type of ANN model that makes use of linear combinations of radial basis functions. For the radial basis function, Gaussian functions are often used to transform inputs into outputs. If tuned accurately, the RBFNN has better performance than MLP. However, the tuning procedures of the RBFNN sometimes cause problems, such that good results are not obtained due to poor parameter adjustments. Among the ANN-based forecasting methods, back propagation NN (BPNN) has been widely used because of its excellent nonlinear mapping function, which is especially suitable for solving complex regression problems [
25]. BPNN has the advantages of a complex nonlinear systems simulation ability, a strong learning ability, good approximation performance, and a large fault data tolerance. However, inherent defects are found, such as a slow convergence rate as well as a susceptibility to easily fall into local minimum values; thus BPNN is unable to obtain the global optimal solution [
26].
Accuracy is a main consideration when developing models for PV power generation forecasting. For short-term forecasting, errors should be less than 20% [
6]. However, for changing conditions (e.g., morning and evening, or rainy or cloudy weather), forecasting accuracy decreases to a point where the root mean square errors (RMSE) are sometimes higher than 50%. To build better forecasting models, some studies classified forecasted days into different categories on the basis of weather conditions. Kang et al. [
27] developed an algorithm by utilizing k-means clustering; Chen et al. [
28] presented a RBFNN model; Shi et al. [
6] proposed a model based on weather classification and support vector machines (SVM); Yang et al. [
29] presented a weather-based hybrid method; and Liu et al. [
26] proposed a back-propagation NN model to forecast PV power generation. All of these works classified days into different categories like sunny, cloudy, foggy, and rainy, and then built separate forecasting models for each classification. This approach implies that sub-models should be chosen on the basis of the weather condition of a forecasted day, apart from applying meteorological data to the model. However, although the accuracy is satisfactory for forecasting, sub-modeling is limited by complexity and computational costs.
In this paper, a generalized PV power forecasting model is proposed on the basis of support vector regression (SVR), historical data of PV power output, and meteorological data. In particular, SVR, a supervised learning method, is utilized for model development. In the study, PV power output characteristics and influential factors are analyzed to achieve forecasting accuracy. Subsequently, forecasted days are classified according to Malaysian weather conditions and historical PV power output data. The two types of weather in Malaysia are normal days (clear sky) and abnormal days (cloudy or rainy sky). Consequently, a generalized SVR-based model is introduced to forecast PV power generation accurately for any Malaysian weather condition. The proposed model is applied to and validated with data from three PV power stations situated on an institutional building of the University of Malaya in Kuala Lumpur. A generalized hourly resolution, day-ahead forecasting model is established to compare the forecasting accuracies under different weather conditions. The proposed single model, which is applicable to different weather conditions, is simple to use and can reduce complexity and computational costs.
The paper is organized as follows:
Section 2 presents a brief description of the methods applied to forecast the PV power generation including real PV plant data collection and analysis;
Section 3 indicates the performance metrics to evaluate the forecasting models; and
Section 4 discusses the results of the proposed model including comparison and validations. Finally,
Section 5 summarizes and concludes the study.
2. Methodology
2.1. Data Collection and Analysis
Three PV power generation systems were selected to collect PV power output and related meteorological data. These systems are located in Kuala Lumpur (latitude = 03°09′ N; longitude = 101°41′ E). The PV systems were installed on the rooftop of an institutional building of the University of Malaya.
Table 1 presents the details of these PV systems. The three PV plants are of monocrystalline, polycrystalline, and thin-film types with installed capacities of 1875, 2000, and 2700 Wp, respectively. PV power outputs were collected individually from each plant; however, meteorological data (i.e., solar irradiance, atmospheric temperature, module temperature, and wind speed) were collected as a single dataset for all plants because of their similar geographical locations.
Table 2 shows the collected data for each unit parameter identified for the study. The PV power output and related meteorological data were collected by an automatic data acquisition system at 5 min intervals from 1 January 2016 to 31 December 2016.
Figure 1a illustrates the mean daily PV power output generated by plant-1 for January 2016, and
Figure 1b shows the daily energy generated by each PV plant for May 2016. Average power generation differed across days due to variations in the weather conditions.
For plant-1 (
Figure 1a), the three days of 1, 15, and 16 January generated comparatively low power due to cloudy or rainy weather conditions (abnormal day). Meanwhile, due to clear skies (normal day), average power generation was comparatively higher for the days of 2, 7, 10, 11, 20, 24, and 25 January. In
Figure 1b, the daily average PV energy production remains fairly stable across the different weather conditions in Malaysia, which suggests the potential of the country to generate PV energy.
Photovoltaic power generation is closely related to meteorological parameters, such as solar irradiance, ambient/atmospheric temperature, module temperature, and wind speed, among others.
Figure 2a shows the patterns of solar irradiance and PV power output of different PV plants on a particular day. During clear-sky days (normal day), PV power output closely followed the solar irradiance curve. Similarly, for cloudy or rainy days (abnormal day), PV power output and solar irradiance followed matching patterns, as shown in
Figure 2a. A similar pattern for PV power output and solar irradiance was observed for the different weather conditions because the generation of PV power fully depends on solar irradiance. If the irradiance increases, then PV power increases, and vice versa, in any weather conditions. In Malaysian weather conditions, the relationship between them is linear. Therefore, a high correlation coefficient between PV power and solar irradiance has been observed.
Figure 2b shows a strong positive correlation between solar irradiance and PV power output. Therefore, solar irradiance is an important input vector when developing an appropriate PV power forecasting model, as evidenced by the high correlation coefficient of
R² = 0.9888.
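The correlation analysis described above can be illustrated with a minimal sketch. The numbers below are hypothetical hourly values chosen only to show the calculation; they are not data from the study, and the function name `pearson_r` is an assumption of this sketch.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical hourly values (illustration only, not measurements from the plants).
irradiance = [120.0, 340.0, 560.0, 780.0, 910.0, 640.0]  # W/m^2
pv_power = [150.0, 420.0, 700.0, 960.0, 1130.0, 790.0]   # W

r = pearson_r(irradiance, pv_power)
r_squared = r ** 2  # for a simple linear fit, R^2 equals r squared
print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

For a simple linear fit with one explanatory variable, R² equals the square of the Pearson coefficient, which is why a single function suffices here.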
The analytical results also indicate that the other meteorological variables, such as ambient temperature, module temperature, and wind speed, were correlated with PV power generation. A comparatively weak correlation was established between PV power output and atmospheric temperature, while an extremely weak correlation was observed between PV power output and wind speed. Nevertheless, wind speed was retained as an input of the proposed model. By contrast, a strong correlation between PV power output and module temperature was observed. However, module temperature was not selected as an input vector of the proposed model because it is highly dependent on other variables, such as atmospheric temperature, wind speed, wind direction, humidity, and the amount of PV power generation.
2.2. Data Preparation
Some meteorological variables showed extremely weak correlations with the PV power generation. In this study, the influence on PV power generation by all of the aforementioned independent meteorological variables was considered.
Sample data on actual PV power generated and related meteorological variables were collected at 5 min intervals. The obtained PV power output data and meteorological data were averaged into hourly datasets. However, during data collection, PV power output samples may be lost due to recording errors or other special events. If abnormal data are collected, the training of SVR becomes unstable. Thus, before averaging, missing data should be replaced by data from the same hours of the most recent similar day.
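The two preparation steps (gap filling, then hourly averaging) can be sketched as follows. This is a minimal illustration under stated assumptions: readings are keyed by (hour, minute), missing samples are marked with `None`, and the reference "similar day" is already chosen; the text does not specify how that day is selected, so the selection is left out. All names are hypothetical.

```python
from collections import defaultdict

def fill_from_similar_day(day, similar_day):
    """Replace missing samples (None) with the reading taken at the same
    time of day on a reference day. Both arguments map (hour, minute) -> value."""
    return {t: (v if v is not None else similar_day[t]) for t, v in day.items()}

def hourly_means(day):
    """Average the 5 min readings of one day into one value per hour."""
    buckets = defaultdict(list)
    for (hour, _minute), value in day.items():
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

# Hypothetical readings for two hours at 5 min resolution (illustration only).
times = [(h, m) for h in (10, 11) for m in range(0, 60, 5)]
reference = {t: 800.0 for t in times}   # assumed "similar day"
today = {t: 1000.0 for t in times}
today[(10, 15)] = None                   # two lost samples
today[(10, 20)] = None

filled = fill_from_similar_day(today, reference)
hourly = hourly_means(filled)
print(hourly)
```

Filling before averaging, as the text prescribes, keeps every hourly mean based on the same number of samples.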
2.3. Pre-Processing of Data
The nonlinear SVR model can map nonlinear inputs into higher-dimensional space to make them linear. However, a wider data range results in imprecisions both in terms of fitting and regression. If data are pre-processed into smaller ranges before they are inputted into the model, then regression precision can be increased. A well-known solution to the aforementioned limitation is a normalization process, wherein data are restricted to the range of 0 and 1. Normalization minimizes regression error, improves precision, and maintains correlation in the data-set. The formula for normalization is [
30]:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$

where $x^{*}$ is the normalized input data; $x$ is the original input data (PV power output and meteorological data); and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the utilized input data, respectively.
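A minimal sketch of min-max normalization and its inverse (the anti-normalization needed later to recover power values in watts) is given below. The function names and the sample values are assumptions made for illustration.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) used to scale a variable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return lo, hi

def normalize(values, lo, hi):
    """Scale values into [0, 1] with x* = (x - min) / (max - min)."""
    return [(v - lo) / (hi - lo) for v in values]

def denormalize(scaled, lo, hi):
    """Invert the scaling to recover values in the original unit."""
    return [s * (hi - lo) + lo for s in scaled]

# Hypothetical hourly PV power values in W (illustration only).
power = [0.0, 250.0, 1000.0, 1500.0]
lo, hi = fit_min_max(power)
scaled = normalize(power, lo, hi)
restored = denormalize(scaled, lo, hi)
print(scaled)
```

In practice the minimum and maximum would be taken from the training data and reused for the forecast day, so that inputs and outputs share one scale.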
2.4. Support Vector Regression
SVM [
31] is a supervised machine-learning method that follows the structural risk minimization principle. SVMs have greater generalization ability compared with other approaches, and they are widely used in resolving classification and regression problems. SVMs are well suited to time-series analysis because their training problem is convex and therefore has a global minimum. When applied to regression problems such as time-series prediction, SVM modeling is referred to as support vector regression (SVR). Forecasting PV power generation is a typical time-series analysis problem; in this case, SVR is an appropriate method.
The SVR algorithm is a nonlinear regression algorithm. Inputs from time-series data samples are mapped into high-dimensional feature space for nonlinear mapping. Subsequently, linear regression is conducted (
Figure 3).
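A set of training data $\{(x_i, y_i)\}_{i=1}^{n}$ is considered, where $x_i$ is the input vector (meteorological data), and $y_i$ is the corresponding output value (PV power output). The estimation function $f(x)$ is shown in Function (2):

$$f(x) = w^{T}\varphi(x) + b \qquad (2)$$

where $w$ is a weight vector, $\varphi(x)$ is the nonlinear mapping into the high-dimensional feature space, and $b$ is the bias term, which can be estimated by minimizing the regularized risk function.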
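Using the $\varepsilon$-insensitive loss function in SVR, the regression problem can be changed into the following optimization problem:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \qquad (3)$$

$$\text{s.t.}\quad y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i,\ \xi_i^{*} \ge 0$$

where $\varepsilon$ is the radius of the tube (margin of tolerance); errors for data inside the tube are ignored during regression. A feature vector that lies on or outside the boundary of the tube is known as a support vector.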
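The effect of the tube can be seen in a short sketch of the ε-insensitive loss: errors smaller than ε cost nothing, and larger errors are penalized linearly. The function names and the numerical values are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss is zero inside the epsilon-tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def mean_epsilon_loss(actual, predicted, epsilon):
    """Average epsilon-insensitive loss over a set of samples."""
    losses = [epsilon_insensitive_loss(a, p, epsilon) for a, p in zip(actual, predicted)]
    return sum(losses) / len(losses)

# Hypothetical normalized PV power values (illustration only).
actual = [0.50, 0.50, 0.90]
predicted = [0.55, 0.80, 0.90]
epsilon = 0.1

for a, p in zip(actual, predicted):
    print(a, p, epsilon_insensitive_loss(a, p, epsilon))
print("mean loss:", mean_epsilon_loss(actual, predicted, epsilon))
```

Only the second sample lies outside the tube, so it alone contributes to the mean loss.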
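In line with the Lagrange multiplier method, the Lagrange function is acquired as follows:

$$L = \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) - \sum_{i=1}^{n}\alpha_i\left(\varepsilon + \xi_i - y_i + w^{T}\varphi(x_i) + b\right) - \sum_{i=1}^{n}\alpha_i^{*}\left(\varepsilon + \xi_i^{*} + y_i - w^{T}\varphi(x_i) - b\right) - \sum_{i=1}^{n}\left(\eta_i\xi_i + \eta_i^{*}\xi_i^{*}\right) \qquad (4)$$

where $\xi_i$ and $\xi_i^{*}$ are the slack variables representing the distance from actual values to the corresponding boundary values of the ε-tube, and $\alpha_i$, $\alpha_i^{*}$, $\eta_i$, and $\eta_i^{*}$ are the Lagrange multipliers. Hence,

$$\alpha_i,\ \alpha_i^{*},\ \eta_i,\ \eta_i^{*} \ge 0.$$

Next, the saddle point of $L$ is calculated. The following equations can thus be obtained:

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right) = 0, \quad \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow C - \alpha_i - \eta_i = 0, \quad \frac{\partial L}{\partial \xi_i^{*}} = 0 \Rightarrow C - \alpha_i^{*} - \eta_i^{*} = 0 \qquad (5)$$

By substituting Equation (5) into the original Function (4), the following model can be obtained:

$$\max_{\alpha,\,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)\varphi(x_i)^{T}\varphi(x_j) - \varepsilon\sum_{i=1}^{n}\left(\alpha_i + \alpha_i^{*}\right) + \sum_{i=1}^{n}y_i\left(\alpha_i - \alpha_i^{*}\right) \qquad (6)$$

$$\text{s.t.}\quad \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right) = 0, \quad 0 \le \alpha_i,\ \alpha_i^{*} \le C$$

where $C$ determines the penalties of the estimation errors.

Subsequently, the kernel function $K(x_i, x_j)$ is introduced with a Mercer condition to replace the original function $\varphi(x_i)^{T}\varphi(x_j)$, and the following model can thus be obtained:

$$\max_{\alpha,\,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}\left(\alpha_i + \alpha_i^{*}\right) + \sum_{i=1}^{n}y_i\left(\alpha_i - \alpha_i^{*}\right) \qquad (7)$$

$$\text{s.t.}\quad \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right) = 0, \quad 0 \le \alpha_i,\ \alpha_i^{*} \le C$$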
The kernel function is one of the key factors of SVR. The performance of SVR is largely dependent on the selection of the kernel function and its parameters. Four traditional kernel functions are commonly used in SVR:
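Linear kernel function: $K(x_i, x_j) = x_i^{T}x_j$
Polynomial: $K(x_i, x_j) = \left(\gamma\, x_i^{T}x_j + r\right)^{d}$
Gaussian RBF (radial basis function): $K(x_i, x_j) = \exp\left(-\gamma\,\|x_i - x_j\|^{2}\right)$
Sigmoid: $K(x_i, x_j) = \tanh\left(\gamma\, x_i^{T}x_j + r\right)$
Here, $\gamma$, $r$, and $d$ are kernel parameters. The radial basis function (RBF) is often used as the kernel in SVR because it requires only one parameter and it has a wide scope of application. RBFs also have the ability to universally approximate any distribution in feature space. Therefore, the RBF was used as the kernel in this study. For the RBF, $\gamma$ was set as the bandwidth parameter of the kernel function.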
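The four kernels can be written compactly as below. This is a sketch that assumes the parameterization used by LIBSVM (`gamma`, `coef0`, `degree`); the exact forms used in the study are not recoverable from the text, so these definitions and their default values are assumptions.

```python
from math import exp, tanh

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=1.0):
    """Gaussian RBF: depends only on the squared distance between u and v."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return exp(-gamma * squared_distance)

def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return tanh(gamma * dot(u, v) + coef0)

# Hypothetical normalized inputs: (irradiance, temperature, wind speed).
x1 = [0.8, 0.6, 0.1]
x2 = [0.7, 0.5, 0.2]
print(linear_kernel(x1, x2), rbf_kernel(x1, x2, gamma=0.5))
```

The RBF kernel has a single shape parameter, which is the property the text cites as the reason for choosing it.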
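To optimally solve the problems, the best $\alpha_i$ and $\alpha_i^{*}$ should be obtained. The best-fit regression function can be expressed as follows:

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x) + b$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers; and $K(x_i, x)$ is the kernel function. The nonlinear separable cases can be easily transformed into linear cases by mapping the original variable into a new high-dimension feature space using the kernel function $K(x_i, x)$.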
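Once training has produced the multipliers, a forecast is a kernel-weighted sum over the support vectors plus the bias. The sketch below evaluates that sum with an RBF kernel; the support vectors, coefficients, bias, and `gamma` are hypothetical numbers, not values fitted in the study, and each entry of `dual_coefs` stands for the difference between a pair of multipliers.

```python
from math import exp

def rbf_kernel(u, v, gamma):
    """Gaussian RBF kernel, assuming the exp(-gamma * ||u - v||^2) form."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + b,
    where coef_i is the difference of the two multipliers of sample i."""
    weighted = sum(c * rbf_kernel(sv, x, gamma)
                   for sv, c in zip(support_vectors, dual_coefs))
    return weighted + bias

# Hypothetical trained model (illustration only).
support_vectors = [[0.2, 0.5, 0.1], [0.9, 0.6, 0.3]]
dual_coefs = [-0.4, 0.7]
bias = 0.35
gamma = 0.5

x_new = [0.8, 0.6, 0.2]  # normalized irradiance, temperature, wind speed
print(svr_predict(x_new, support_vectors, dual_coefs, bias, gamma))
```

Only samples with a nonzero coefficient contribute, which is why the fitted model needs to store the support vectors alone.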
2.5. Proposed SVR-Based Model to Forecast PV Power Generation
To establish the generalized SVR-based model for PV power generation forecasting, the LIBSVM package [
32] developed by Chang and Lin (2001) was adopted in the present study. For the proposed generalized day-ahead hourly resolution model using the SVR approach, the average hourly data samples were considered for training and testing purposes. The flow diagram of the proposed model is shown in
Figure 4.
Historical data of meteorological variables (i.e., solar irradiance, atmospheric temperature, and wind speed) were considered as the input data-set of the model. Meanwhile, historical data of PV power output were considered as the output data-set for training the proposed model. All of the original historical PV power data and meteorological data were normalized within the range of [0, 1] to minimize regression error. Subsequently, the datasets for training (i.e., more than 70% of the total data) and testing were separated. To develop the model, initial values of the three dominating parameters of SVR (C, ε, and γ) must also be set. The model was first trained and tested using the stipulated historical data-set. If the model was not yet optimized, the dominating parameter values were changed, and training and testing were conducted again; this procedure was repeated until the model was optimized. In the present study, the parameters were chosen on the basis of experience-based trial and error. Once the model is optimized, PV power generation can be forecasted for a particular day. To forecast PV power generation using the proposed model, the hourly averages of normalized meteorological data for the forecasted day (i.e., data collected from the Malaysian meteorological department or numerically predicted data) should be applied. Given that the input data of this model were normalized, the output data should be anti-normalized to recover the original values of PV power; this step enables the performance analysis of the proposed model.
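The train, test, and adjust loop can be sketched as follows. Two points are assumptions of this sketch rather than the procedure of the study: an exhaustive grid over candidate values is swapped in for the experience-based trial and error, and `fit` and `score` are hypothetical callables standing in for LIBSVM training and test-set error evaluation (placeholder versions are used so that the example runs).

```python
import random
from itertools import product

def split_samples(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into training and testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def search_parameters(train, test, fit, score, grid):
    """Try every (C, epsilon, gamma) combination and keep the lowest test error."""
    best_params, best_error = None, float("inf")
    for C, epsilon, gamma in product(grid["C"], grid["epsilon"], grid["gamma"]):
        model = fit(train, C=C, epsilon=epsilon, gamma=gamma)
        error = score(model, test)
        if error < best_error:
            best_params = {"C": C, "epsilon": epsilon, "gamma": gamma}
            best_error = error
    return best_params, best_error

# Placeholder callables (hypothetical): a real run would train an SVR here.
def fake_fit(train, C, epsilon, gamma):
    return {"C": C, "epsilon": epsilon, "gamma": gamma}

def fake_score(model, test):
    return abs(model["C"] - 10.0) + abs(model["epsilon"] - 0.01) + abs(model["gamma"] - 0.1)

train, test = split_samples(range(10))
grid = {"C": [1.0, 10.0, 100.0], "epsilon": [0.001, 0.01, 0.1], "gamma": [0.01, 0.1, 1.0]}
best, err = search_parameters(train, test, fake_fit, fake_score, grid)
print(best, err)
```

Passing the training and scoring routines as arguments keeps the loop independent of the SVR library, so the same skeleton applies whichever implementation performs the fit.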
A standard three-layer back-propagation ANN with the number of epochs set to 1000 was utilized as the alternative model for a comparative evaluation of the performance of the proposed model. To obtain the best performance of the ANN model, the number of hidden neurons was varied between 5 and 20. A persistence model was also used as the benchmark against which the proposed model was validated.
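A persistence benchmark can be sketched in a few lines. The definition used here, in which each hour is forecast by the value observed at the same hour of the previous day, is one common day-ahead form and is an assumption; the text does not state which variant was used. The function name and sample values are hypothetical.

```python
def persistence_day_ahead(series, hours_per_day):
    """Forecast each hour with the value observed at the same hour one day earlier.

    Returns (forecast, actual) as aligned lists starting from the second day.
    """
    if len(series) % hours_per_day != 0 or len(series) < 2 * hours_per_day:
        raise ValueError("series must contain at least two whole days")
    forecast = list(series[:-hours_per_day])
    actual = list(series[hours_per_day:])
    return forecast, actual

# Hypothetical hourly PV power in W for two short "days" of three hours each.
series = [100.0, 500.0, 200.0, 80.0, 300.0, 150.0]
forecast, actual = persistence_day_ahead(series, hours_per_day=3)
print(forecast, actual)
```

Because it needs no training, such a benchmark shows how much of a model's accuracy comes from learning rather than from day-to-day similarity in the weather.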
4. Results and Discussion
In this research, a generalized forecasting model based on SVR was developed. The model was applied for hourly resolution day-ahead PV power generation forecasting. The actual PV power generation data of three different PV plants and their related meteorological data (i.e., solar irradiance, atmospheric temperature, and wind speed) were utilized in the study. In Malaysia, solar irradiance is available from 08:00 to 19:00 in almost all seasons of the year; accordingly, daily PV power outputs are received during these times only. Experimental data covering three months (January, May, and September 2016) were used to verify the proposed model. Malaysian weather conditions do not change drastically over the year; nevertheless, these three months were selected so that the weather conditions occurring throughout the year were covered. Sample data were selected randomly for training and test purposes because of the different weather conditions. The forecasted PV power of the proposed model should be anti-normalized to extract the actual value of the PV power forecast. Subsequently, the forecasted values of the proposed model were compared with those of the standard back-propagation NN model and the persistence model.
Figure 5a shows the measured PV power outputs (actual) and forecasted PV power outputs of the different models (i.e., proposed model, ANN, and persistence model) of PV plant-1 in normal weather conditions. In this case, 21 and 22 May were set as the forecast days. As shown in
Figure 1b and based on the analysis of the PV power output patterns, these days (21 and 22 May) are clear-sky days (normal day). Hence, the hourly PV power output average and related meteorological data of 14 days (7–20 May) were used to train the model. As shown in
Figure 5a, the forecasted curve of the proposed model almost matched the actual measured curve of PV power generation.
Figure 5b represents the actual measured PV power outputs and forecasted PV power outputs of the proposed model, ANN model, and persistence model of plant-1 in abnormal weather conditions. In this figure, 26 and 27 May were set as forecast days. As shown in
Figure 1b and based on the analysis of PV power output patterns, these days are cloudy or rainy days (abnormal day). Hence, the hourly PV power output average and related meteorological data of 14 days (12–25 May) were used to train the model. As shown in
Figure 5b, the forecasted curve of the proposed model almost matched the actual measured curve of PV power generation. However, a significant deviation for the persistence model curve was observed, which might have been affected by abnormal weather conditions.
The same meteorological data-set and related PV power output data of PV plant-2 were used to train and test the models in different weather conditions. Similarly to plant-1, normal forecast days (21 and 22 May) and abnormal forecast days (26 and 27 May) were selected.
Figure 6a,b shows the actual measured PV power output and forecasted PV power output of the different models (i.e., the proposed model, the ANN model, and the persistence model) for normal and abnormal weather conditions, respectively. As shown by
Figure 6a,b, the forecasted result of the proposed model almost matched the actual measured PV power output of plant-2.
Similarly to the previous approach, the same meteorological data-set and related PV power output data of plant-3 were used to train and test the proposed model, the ANN model, and the persistence model in different weather conditions. Normal and abnormal forecast days were selected, similarly to plant-1 and plant-2, in consideration of same-month data.
Figure 7a,b shows the actual measured PV power output and forecasted PV power output of the different models for normal and abnormal weather conditions, respectively. The figures show that the proposed PV power curve almost matched the actual measured PV power curve of plant-3.
The performances of the proposed model, the ANN model, and the persistence model were evaluated using the experimental data of May 2016. In this case, nRMSE, MAE, and MBE were adopted for the forecasting performance evaluation. Calculation results are shown in
Table 3.
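The three error measures can be computed as in the sketch below. Two conventions are assumptions, since their definitions are not given in this part of the text: the bias is taken as forecast minus actual, and nRMSE is the RMSE divided by a normalizing value supplied by the caller (for example, the maximum measured power or the installed capacity). The sample values are hypothetical.

```python
from math import sqrt

def mae(actual, forecast):
    """Mean absolute error, in the unit of the data."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)

def mbe(actual, forecast):
    """Mean bias error; positive values indicate over-forecasting (assumed sign)."""
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)

def nrmse(actual, forecast, norm):
    """Root mean square error divided by a normalizing value, in percent."""
    mse = sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return 100.0 * sqrt(mse) / norm

# Hypothetical hourly PV power in W (illustration only).
actual = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 320.0, 400.0]

print("MAE  :", mae(actual, forecast), "W")
print("MBE  :", mbe(actual, forecast), "W")
print("nRMSE:", nrmse(actual, forecast, norm=max(actual)), "%")
```

MAE and MBE keep the unit of the data (W), whereas nRMSE is dimensionless, which is why it suits comparisons between plants of different capacities.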
Similar analyses were carried out using datasets collected in January and September 2016 for further validation of the proposed model. In this case, normal and abnormal weather conditions were analyzed separately. From the analysis of the PV power output pattern of January, 24 and 25 January are normal days, and 15 and 16 January are abnormal days. Likewise, the PV power output pattern of September shows that 29 and 30 September are normal days and 21 and 22 September are abnormal days. The actual measured output and the forecasted output of the proposed model almost matched for all PV plants for the data-sets of both months. The details of the error calculations are shown in
Table 4 for the month of January and
Table 5 for the month of September.
The performances of the proposed model, the ANN model, and the persistence model were calculated by averaging the performance values for the three-month period (
Table 6). This table shows the average nRMSE, MAE, and MBE result for the different models. The overall average result for the different models and those for other ANN-based forecasting models [
33] are also presented in
Table 6.
Based on the figures, tables, and discussions presented in this paper, the proposed generalized SVR-based model performed very well for PV power generation forecasting in different weather conditions. The results also showed that the model could forecast the PV power generation accurately across the seasonal variations of Malaysian weather conditions. The errors obtained for all of the plants using the proposed model are very similar for normal and abnormal weather conditions. In the proposed model, the errors computed in normal weather conditions are always slightly lower than those for the abnormal weather conditions. The reason is that the model can be fitted well in normal weather conditions because of less variation in the sample data. The forecasted results are also nearly the same in the various months because of the absence of drastic changes in the weather conditions of Malaysia. The average forecasting results of the proposed model were 3.08% in nRMSE, 34.57 W in MAE, and 11.34 W in MBE, which were better than those of the ANN model and the persistence model. The proposed model also outperformed other ANN-based forecasting models [
33].
Figure 8 shows the actual measured PV power output and the forecasted PV power output of the proposed model. In this case, 14 and 15 January were set as forecast days. As shown in
Figure 1a and based on the PV power output pattern of plant-1, these two days have cloudy or rainy weather conditions (abnormal day), which result in comparatively low electrical energy. The shape of the second part of this figure is quite different from the other figures because some hours during these days have clear skies while the rest are cloudy or rainy. However, the highest peak of the first part of the figure is comparatively low due to the existence of some clouds that day. Nonetheless, the forecasted PV power of the proposed model almost matched the actual measured PV power in every situation. Therefore, the proposed model can forecast PV power generation accurately in any weather condition.
Figure 9a shows the deviation between the actual measured PV power output and the PV power output of the proposed forecasting model for plant-2 in different weather conditions. The deviation between the measured and forecasted PV power generation at each point was not higher than 10%, which is within the allowable range. The figure also shows that deviations in different weather conditions do not change drastically. The deviations for all plants were then evaluated on the basis of all datasets. In all cases, the results were similar to the deviation shown for plant-2.
For the proposed model,
Figure 9b shows the correlation of the actual measured PV power output and the forecasted PV power of plant-2 in different weather conditions. The correlation coefficient of
in normal weather conditions and
in abnormal weather conditions explains the intensity of the correlation; that is, the correlation between the measured power and the forecasted power is very good. The positive sign of this value expresses the proportional relationship between the two powers. Based on this analysis, the proposed model performs very well.