A Review of Price Forecasting Problem and Techniques in Deregulated Electricity Markets
1. Introduction
With the introduction of deregulation in the power industry, new challenges have been encountered by electricity market participants, due to which forecasting of wind power, electric loads and energy prices has become a major issue globally [1] . Deregulation, however, has been associated with the expectation of greater consumer participation and efficiency gains for both consumers and shareholders. Globally, energy price forecasting has emerged as an important area of research due to the deregulation of the wholesale market. Major market participants such as generators, power suppliers, investors and traders wish to maximize their profitability [2] -[4] . Unlike load forecasting, electricity price forecasting is much more complex because of the unique characteristics and uncertainties in operation as well as bidding strategies [5] . In other commodity markets, such as the stock market and agricultural markets, price forecasting has always been at the center of studies because of its importance [6] -[9] .
Electricity is also a commodity, and its price should likewise be forecasted over time; however, if the same methods used for forecasting other commodity prices were applied to electricity, the forecasted price would, unsurprisingly, exhibit lower accuracy, since electricity prices are the most volatile among all commodities. Many techniques and models have been developed for forecasting wholesale electricity prices, especially for short-term price forecasting [3] . The state-of-the-art techniques for electricity price forecasting are categorized into equilibrium analysis [5] , simulation methods [10] , econometric methods [11] , time series [12] -[14] , intelligent systems [15] -[17] and volatility analysis [18] . Time series and intelligent systems are commonly used for day-ahead price forecasting. This paper reviews the established approaches, focusing mainly on soft computing models.
2. Factors Influencing Price Forecasting
In deregulated power markets, price fluctuation is common behavior, caused by many different economic as well as technical factors. Some researchers have used only historical price data [19] , or both prices and demand, to forecast the spot price, excluding other factors such as weather, fuel cost and generation reserve. The various factors that affect the spot price are shown in Figure 1 [20] .
2.1. Electric Power Demand
One of the important factors influencing the spot price is the system's total demand. Studies show that if the system demand increases, the spot price also increases.
2.2. Weather Conditions
Electricity demand certainly depends on environmental conditions and especially on the daily temperature. Weather fluctuations will affect demand, and hence the spot price will also be affected.
2.3. Fuel Cost
Fuel cost is one of the main components of the generation cost, and its variation has a major impact on the electricity spot price.
Figure 1. Factors affecting electricity prices.
2.4. Available Transmission Capacity
Electric power is provided by generators that may be located far from the consumers, and it must be transmitted to the consumers via transmission network facilities. There are physical constraints in transmission networks that can obstruct market participants from buying or selling energy. This issue can cause significant changes in the spot price and may increase it.
2.5. Generation Reserves
Having enough generation reserve is an important factor for the electricity spot price, i.e. when demand increases suddenly, consumers will still be served if sufficient generation reserve capacity is available as well as deliverable. But if sufficient generation reserve is not available, consumers would face a shortage of energy, and therefore, to restore the balance between supply and demand, the electricity spot price increases.
3. Electricity Load and Price Forecasting Problems and Methods
3.1. Load Forecasting
Forecasts in particular have become important after the restructuring of power systems, as many countries have deregulated their power systems and turned electricity from a necessity into a commodity. Many countries are still in the process, and soon electricity will be a commodity traded by players across a global market. The load series is not only complex but also exhibits several levels of seasonality: the prediction depends not only on the load of the previous hour but also on the load of the same hour on the previous day, and on the load of the same hour in the previous week [21] -[23] .
Various techniques and models have been developed for forecasting the electrical load with varying degrees of success, but models based on linear regression still score over the other reported models. These models allow system operators and engineers to physically interpret their components so that their behavior can be understood. Models based on Artificial Intelligence (AI) have also been developed for forecasting the electrical load, such as expert systems, fuzzy inference, fuzzy neural models and neural network (NN) based models. Neural networks, due to their intrinsic capability to learn complex and nonlinear relationships that are difficult to capture with conventional methods, have gained popularity among all artificial intelligence based models [24] [25] .
3.2. Price Forecasting
With the introduction of deregulated electricity markets, the major emphasis is on maximizing the profits of the various market players. As far as forecasting is concerned, electricity prices and load are mutually interlinked due to their dependence on each other, and an error in one will propagate to the other. Non-storability, seasonal behavior and limited transportability are the major issues which make electricity prices so specific. These issues make it impossible to treat electricity on a par with other commodities and forbid the application of forecasting methods common in other commodity markets [26] .
Electricity price forecasting can be divided into three categories based on the time horizon: short-term, medium-term and long-term forecasting, as shown in Figure 2. Short-term price forecasting (mainly one day ahead) is mainly used by market players to maximize profits in the spot markets. Medium-term forecasts allow the successful negotiation of bilateral contracts between suppliers and consumers, while long-term forecasts influence decisions on transmission expansion and enhancement, generation augmentation and distribution planning.
4. Prices Forecasting Methods
The survey reveals that various methods have been developed for forecasting. A rough classification tree is shown in Figure 3; this classification is not comprehensive and other approaches or methods are possible. These methods can be used for load forecasting as well as price forecasting [1] [3] .
For price forecasting, the approaches can mainly be classified into two categories [6] [27] -[35] : 1) time series approaches and 2) simulation approaches. Time series approaches mainly rely on historical market price data, whereas the simulation approach requires precise modeling of power system equipment and its cost information; because of the large
Figure 3. Classification of forecasting techniques.
amount of data involved, simulation methods can be computationally intensive.
The time series approach can be further classified into linear regression based models and nonlinear heuristic models. Regression-based models include auto-regressive moving average (ARMA) models, their extension, auto-regressive integrated moving average (ARIMA) models, and their variants. While these models are aimed at modeling and forecasting the changing price itself, generalized autoregressive conditional heteroskedasticity (GARCH) is aimed at modeling the volatility of electricity prices [20] .
Nonlinear heuristic models use artificial neural networks and other artificial intelligence techniques for modeling the input-output data relation without complete information about the underlying connections. Other soft computing methods are also used to extend the data representation capability of the regression based or ANN models.
5. Price Forecasting Methodology
A typical price forecasting procedure is shown in Figure 4 [20] . The flow chart depicts the process of time-series-based forecasting. The forecasting process usually starts with the input data; the major input data for price forecasting are past market prices, and a record of a few weeks to several months is taken as input.
Figure 4. Flow chart showing forecasting procedure.
Some complex forecasting models require additional inputs such as demand and/or temperature data.
Simple statistical analysis of the input data set (e.g. mean and volatility) gives hints for model selection and later model validation. The scope of the forecast (e.g. the price profile or its volatility) and the required accuracy of the results are important factors in the selection and design of forecasting models/techniques. Model validation is carried out after the model parameters have been optimized, in order to check the model's performance. If the results are not satisfactory, the validation process is repeated with different starting parameters. If the validation is successful, the model is applied to produce the actual forecast.
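To make the flow of Figure 4 concrete, the following Python sketch walks through the same steps for a day-ahead forecast. It is a minimal illustration only: the synthetic hourly data, the seasonal-naive candidate models and the 10% MAPE acceptance threshold are assumptions of this sketch, not taken from the reviewed literature.

```python
import numpy as np

def mape(actual, forecast):
    # Mean absolute percentage error, used here as the validation criterion
    return np.mean(np.abs((actual - forecast) / actual)) * 100.0

def forecast_day_ahead(prices: np.ndarray, horizon: int = 24) -> np.ndarray:
    # 1) Simple statistical analysis of the input record (mean, volatility)
    print(f"mean = {prices.mean():.2f}, volatility = {prices.std():.2f}")

    # 2) Hold out the last day of the record for validation
    train, valid = prices[:-horizon], prices[-horizon:]

    # 3) Candidate models: repeat the price observed 'lag' hours earlier
    for lag in (24, 168):                        # previous day, previous week
        prediction = train[-lag:][:horizon]      # seasonal-naive forecast
        if mape(valid, prediction) < 10.0:       # accuracy requirement (assumed)
            # 4) Validation successful: apply the same rule for the actual forecast
            return prices[-lag:][:horizon]
    raise RuntimeError("No candidate model met the required accuracy")

# Example usage with two weeks of synthetic hourly prices
rng = np.random.default_rng(0)
hours = np.arange(14 * 24)
history = 40 + 10 * np.sin(hours * 2 * np.pi / 24) + rng.normal(0, 1, hours.size)
print(forecast_day_ahead(history))
```

In practice the naive candidates in step 3 would be replaced by the time series or intelligent-system models discussed in the following sections.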
6. Pre-Processing of Data for Time Series Models Using Wavelet Transforms
The wavelet transform is a mathematical tool which analyses data and provides a time and frequency representation simultaneously (time-scale analysis) [36] . It is used for analyzing non-stationary signals in power systems such as price time series [37] -[40] and voltage and current waveforms [41] . The wavelet transform decomposes the original time domain signal into several other scales with different levels of resolution, in what is called multi-resolution decomposition [42] .
The wavelet transform is best suited for non-stationary data (where the mean and autocorrelation of the series are not constant); the price series is non-stationary and volatile in nature, which is why the use of the wavelet transform gives accurate forecasting results [43] . The Fourier Transform (FT) decomposes the original price series into a linear combination of sine and cosine functions, whereas the Wavelet Transform (WT) decomposes the series into a sum of more flexible functions that are localized in both time and frequency. The wavelet transform can be classified into two types: the Continuous Wavelet Transform (CWT) and the Discrete Wavelet Transform (DWT) [40] [44] -[46] .
The CWT of a continuous time signal x(t) is defined as (1):
$CWT_{x}(a,b)=\int_{-\infty}^{+\infty}x(t)\,\psi_{a,b}^{*}(t)\,dt$ (1)
where ψ(t) is the mother wavelet, a is a scaling parameter, b is a translating parameter, and the scaled and translated wavelet is
$\psi_{a,b}(t)=\frac{1}{\sqrt{\lvert a\rvert}}\,\psi\!\left(\frac{t-b}{a}\right)$ (2)
Each wavelet is created by scaling and translating operations on a mother wavelet. The mother wavelet is an oscillating function with finite energy and zero average.
The DWT of a sampled signal x(n) is defined as (3):
$DWT(c,d)=\sum_{n}x(n)\,\psi_{c,d}^{*}(n)$ (3)
where
$\psi_{c,d}(n)=\frac{1}{\sqrt{2^{c}}}\,\psi\!\left(\frac{n-d\,2^{c}}{2^{c}}\right)$ (4)
where c and d are the scaling and sampling numbers, respectively. A general block diagram for level-3 decomposition is shown in Figure 5.
Technically, the price data is transformed into low and high coefficients. The low coefficients are an approximated version, associated with low-pass filtering, that possesses characteristics similar to the original price series, while the high coefficients, associated with high-pass filtering, contain information about the peaks that occur in the original price signal. Results are significantly affected by the selection of the mother wavelet. For price forecasting applications, Daubechies wavelets are generally suitable because they have a compact or narrow window function, which is suitable for local analysis of non-stationary price series.
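As an illustration, the following Python sketch performs a level-3 decomposition of a price series with the PyWavelets package; the db4 mother wavelet, the synthetic data and the simple denoising step are assumptions made for this example.

```python
import numpy as np
import pywt

# Synthetic hourly price series standing in for real market data
rng = np.random.default_rng(0)
prices = 40 + 10 * np.sin(np.arange(1024) * 2 * np.pi / 24) + rng.normal(0, 2, 1024)

# Level-3 multi-resolution decomposition with a Daubechies db4 mother wavelet
cA3, cD3, cD2, cD1 = pywt.wavedec(prices, wavelet="db4", level=3)

# The approximation cA3 keeps the smooth trend of the original series, while the
# detail coefficients cD3, cD2, cD1 carry the spikes and high-frequency behaviour.
smoothed = pywt.waverec(
    [cA3, np.zeros_like(cD3), np.zeros_like(cD2), np.zeros_like(cD1)],
    wavelet="db4",
)
```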
7. Forecasting Models Based on Linear Regression
7.1. ARIMA Model
ARMA stands for Auto-Regressive Moving Average. ARMA is a suitable model for stationary time series, but most
Figure 5. Level 3 wavelet decomposition.
of the price series are non-stationary. To overcome this problem and allow the ARMA model to handle non-stationary data, a new model called the Auto-Regressive Integrated Moving Average (ARIMA) model was introduced; it has been successfully applied to forecast commodity prices [47] -[49] . The application of the ARIMA methodology to the study of time series is due to Box and Jenkins [50] .
There are many ARIMA models; generally, an ARIMA model is defined as ARIMA(p, d, q), where p is the number of autoregressive terms, d is the number of non-seasonal differences and q is the number of lagged forecast errors in the prediction equation. If there is no differencing (i.e. d = 0), then the ARIMA model reduces to an ARMA model [47] .
Consider a time series xt, then the first order differencing is defined as:
$\nabla x_{t}=x_{t}-x_{t-1}$ (5)
where L is the lag operator (Lxt = xt−1), so the differencing can also be expressed as
$\nabla x_{t}=(1-L)\,x_{t}$ (6)
Thus, ARIMA (p, d, q) is defined as:
$\phi(L)\,(1-L)^{d}\,x_{t}=\theta(L)\,\varepsilon_{t}$ (7)

$\phi(L)=1-\phi_{1}L-\cdots-\phi_{p}L^{p},\qquad \theta(L)=1+\theta_{1}L+\cdots+\theta_{q}L^{q}$ (8)

where εt is a white-noise error term, φi are the autoregressive coefficients and θj are the moving-average coefficients.
ARIMA models are derived from the autoregressive (AR), moving average (MA) and auto-regressive moving average (ARMA) models. In the AR, MA and ARMA models the conditions of stationarity are satisfied; therefore they are applicable only to stationary series. The ARIMA model captures the incremental evolution of the price instead of the price value itself.
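A minimal sketch of fitting an ARIMA model with the statsmodels package is shown below; the ARIMA(2, 1, 1) order and the synthetic non-stationary series are illustrative assumptions only, since the appropriate order must be identified for each price series (e.g. by the Box-Jenkins procedure).

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic non-stationary series standing in for a price record
rng = np.random.default_rng(1)
prices = 50 + np.cumsum(rng.normal(0.1, 1.0, 500))

# p = 2 autoregressive terms, d = 1 differencing, q = 1 moving-average term
model = ARIMA(prices, order=(2, 1, 1))
result = model.fit()
print(result.summary())

day_ahead = result.forecast(steps=24)   # 24-step-ahead price forecast
```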
7.2. GARCH Model
GARCH stands for Generalized Autoregressive Conditional Heteroskedasticity. While ARIMA models are aimed at modeling and forecasting the changing price itself, the GARCH model is aimed at modeling the volatility of prices [51] [52] . GARCH models treat the moments of a time series as time-varying (i.e. the error term, real value minus forecasted value, does not have zero mean and constant variance as with an ARIMA process). The error term is now assumed to be serially correlated and can be modeled by an Auto-Regressive (AR) process. Thus, a GARCH process can measure the implied volatility of a time series due to price spikes [53] -[55] . The GARCH(p, q) model is defined as follows.
Consider a time series xt with a constant mean offset, then
$x_{t}=\mu+\varepsilon_{t}$ (9)
where μ is the constant mean offset and εt is the residual term. The conditional variance of the residual is modeled as
$\sigma_{t}^{2}=\alpha_{0}+\sum_{i=1}^{q}\alpha_{i}\,\varepsilon_{t-i}^{2}+\sum_{j=1}^{p}\beta_{j}\,\sigma_{t-j}^{2}$ (10)
where p is the order of the GARCH terms σ² and q is the order of the ARCH terms ε².
As can be seen from Equation (10), if p = 0 the GARCH(p, q) model reduces to an ARCH(q) model. A GARCH model can only be specified for a stationary time series, so the following condition must be satisfied:
$\sum_{i=1}^{q}\alpha_{i}+\sum_{j=1}^{p}\beta_{j}<1$ (11)
GARCH process can measure the implied volatility due to price spikes.
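The following sketch fits a GARCH(1, 1) model with the arch package; the synthetic return series and the chosen orders are assumptions of this example rather than recommendations.

```python
import numpy as np
from arch import arch_model

# Synthetic return series standing in for electricity price returns
rng = np.random.default_rng(2)
returns = rng.normal(0, 1, 1000)

# mean="Constant" corresponds to the constant mean offset in Equation (9)
garch = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
fitted = garch.fit(disp="off")
print(fitted.params)                               # mu, omega, alpha[1], beta[1]

variance_forecast = fitted.forecast(horizon=24)    # day-ahead conditional variance
```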
8. Forecasting Models Based on Nonlinear Heuristics
8.1. Artificial Neural Network Based Model
Most time series models are linear predictors, while the electricity price is a nonlinear function of its input features, making it difficult for time series techniques to completely capture the behavior of the price signal. Therefore, researchers have come up with the idea of using neural networks (NNs) for electricity price forecasting [56] -[59] .
Neural networks are highly interconnected simple processing units designed to model how the human brain performs a particular task [60] . The basic structure of a neural network is shown in Figure 6. The network generally consists of three to four layers; during the training process, the neurons in the input layer pass the raw information on to the neurons in the other layers. The connection weights between the different layers keep updating as the learning process goes on [60] .
A neural network uses a learning function to modify the variable connection weights at the input of each processing element, i.e. neuron. ANN models can be differentiated on the basis of the type of learning function, the learning algorithm and the number of hidden layers. Generally, a three-layer neural network is chosen for forecasting the electricity price.
ANN-based models have gained popularity due to their ability to model undefined relationships between input and output variables, approximate complex nonlinear functions and implement multiple training algorithms. However, neural networks also suffer from the disadvantage that the network will not be flexible enough to model the data well with too few units and, on the contrary, will over-fit with too many units [20] .
In order to overcome such weaknesses, different evolutionary techniques have recently been combined with ANNs [58] [61] -[65] . An ANN model with a feature selection technique based on the relief algorithm has been reported [59] , and particle swarm optimization has been used for ANN training [66] .
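As a simple illustration of an ANN-based price forecaster, the sketch below trains a three-layer feed-forward network with scikit-learn on lagged prices; the 24-hour lag window, the hidden-layer size and the synthetic data are assumptions made for this example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic hourly prices standing in for historical market data
rng = np.random.default_rng(3)
prices = 40 + 10 * np.sin(np.arange(2000) * 2 * np.pi / 24) + rng.normal(0, 2, 2000)

# Inputs: the previous 24 hourly prices; target: the next hourly price
X = np.array([prices[t - 24:t] for t in range(24, len(prices))])
y = prices[24:]

scaler = StandardScaler().fit(X)
ann = MLPRegressor(hidden_layer_sizes=(20,), activation="relu",
                   max_iter=1000, random_state=0)
ann.fit(scaler.transform(X), y)

# Forecast the next hour from the last observed day
next_hour = ann.predict(scaler.transform(prices[-24:].reshape(1, -1)))
print(next_hour)
```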
8.2. Radial Basis Function Neural Network Based Model
The Radial Basis Function Neural Network (RBFNN) is comparatively less likely to become trapped in local minima and has a faster learning rate [67] . In contrast to the conventional artificial neural network (ANN), the RBFNN uses radial basis functions as the activation of the hidden-layer neurons. Similar to the ANN architecture, the RBFNN also contains three layers, i.e. an input layer, an output layer and only one hidden layer. The difference arises in terms of the centre neurons' activation function and the training method. The training of an RBFNN consists of three steps: 1) centre selection; 2) width selection of the basis function; and 3) weight calculation for the output layer.
Figure 6. Architecture of artificial neural network.
The model of RBFNN can be described as follows:
$g_{i}=\Psi\!\left(\frac{\lVert I-C_{i}\rVert}{r_{i}}\right)$ (12)
where gi is the output of the ith hidden-layer neuron and I is an input training vector; Ψ(·) is the radial basis function used in the nonlinear mapping, Ci is the centre of the ith hidden-layer neuron and ri is the radius of the ith hidden-layer neuron.
$\lVert I-C_{i}\rVert=\sqrt{\sum_{k=1}^{q}\left(I_{k}-C_{ik}\right)^{2}}$ (13)
In (12), ‖I − Ci‖ is the Euclidean distance, which can be calculated using (13); here q is the number of inputs in one training pattern. The width (σ) of the basis function is obtained using (14), where dmax is the maximum Euclidean distance between the final centre points Ci and all training input points, and Nh is the number of hidden-layer neurons. The weights of the output layer can be obtained using Equation (15), where Gtr+ and Ytr are the pseudo-inverse of the hidden-layer output matrix Gtr and the output training pattern matrix, respectively.
$\sigma=\frac{d_{\max}}{\sqrt{2N_{h}}}$ (14)
$W_{rb}=G_{tr}^{+}\,Y_{tr}$ (15)
The basic structure of the RBFNN is shown in Figure 7; the numbers of neurons in the input layer (Ni) and the output layer (No) are selected on the basis of the training patterns developed. The nonlinearity of the system decides
Figure 7. Architecture of radial basis function neural network (RBFNN).
the number of neurons in the hidden layer (Nh). The data flow starts from the input layer, traverses the hidden layer and arrives at the output layer. The input and output layers of the RBFNN have linear activation functions, whereas the hidden-layer neurons have a radial basis (Gaussian) activation function.
$Y=G_{tst}\,W_{rb}$ (16)
where Gtst is defined as in (17), and the suffix tst denotes any testing or real-time input pattern for which an output is desired from the trained RBFNN. The output Y can be calculated using (16).
$G_{tst}=\left[\Psi\!\left(\frac{\lVert I_{tst}-C_{i}\rVert}{r_{i}}\right)\right],\quad i=1,\ldots,N_{h}$ (17)
The input-layer weight matrix has the value 1 for all its elements, because the input is directly and linearly mapped to the hidden layer. For training of the RBFNN, K-means clustering is applied. In the RBFNN, the weight matrices Vrb and Wrb contain the weights of the hidden layer and the output layer, respectively.
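The three training steps described above can be sketched in a few lines of Python; the synthetic training patterns, the number of hidden neurons and the width formula used here are assumptions of this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X_tr = rng.uniform(0, 1, (500, 24))                 # training input patterns
Y_tr = X_tr.sum(axis=1, keepdims=True)              # stand-in training targets

# 1) Centre selection with K-means clustering
n_hidden = 20
centres = KMeans(n_clusters=n_hidden, n_init=10, random_state=0).fit(X_tr).cluster_centers_

# 2) Width of the Gaussian basis function from the maximum centre distance
d_max = max(np.linalg.norm(ci - cj) for ci in centres for cj in centres)
sigma = d_max / np.sqrt(2 * n_hidden)

def hidden_output(X):
    # Gaussian radial basis activations, one column per hidden neuron
    dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return np.exp(-(dist ** 2) / (2 * sigma ** 2))

# 3) Output-layer weights by pseudo-inverse of the hidden-layer output matrix
G_tr = hidden_output(X_tr)
W = np.linalg.pinv(G_tr) @ Y_tr

# Prediction for new (test) patterns
X_tst = rng.uniform(0, 1, (5, 24))
Y_hat = hidden_output(X_tst) @ W
```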
8.3. Fuzzy Inference System Based Model
An FIS performs input-output mapping based on fuzzy logic. Fuzzy logic evaluates the intermediate states between discrete crisp states and is able to handle the concept of partial truth instead of absolute truth. Traditional adaptive fuzzy systems, including ANFIS and neuro-fuzzy methods, are intended to combine the advantages of ANN and fuzzy logic, with the difference that the ANFIS architecture has a linear output function [68] , whereas neuro-fuzzy systems are essentially a subset of ANN applied to control and classification problems [69] .
Wang and Mendel suggested an algorithm for implementing an FIS for time series prediction [70] , and the same approach was extended to forecast the electricity price. The approach is model-free and heuristic in nature. A common framework called the fuzzy rule base is constructed to combine both numerical and linguistic information. The numerical information is sampled from measurements, and the linguistic information interprets the numerical information [71] . The FIS is able to bridge the gap between interpretability and accuracy by providing a verbally interpretable rule base and numerical accuracy through training. An FIS using the Wang-Mendel learning algorithm does not require iterative training, making it more efficient than ARMA or GARCH time series techniques and ANN or neuro-fuzzy intelligent systems [70] .
Compared to the black-box nature of the Artificial Neural Network (ANN), the Fuzzy Inference System (FIS) provides a transparent linguistic rule base. The rules may be modified manually to include expert knowledge, so the rule base gives the FIS the advantage of interpretability and transparency. The FIS also provides flexibility in choosing predefined membership functions, and the FIS algorithm can be modified for higher accuracy and efficiency.
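A much simplified sketch of the Wang-Mendel rule-generation idea, for a single-input single-output prediction problem, is given below; the triangular partitioning into seven fuzzy regions and the synthetic price data are assumptions of this illustration, and many details of the full algorithm are omitted.

```python
import numpy as np

def tri_memberships(x, centres, width):
    # Membership of x in each triangular fuzzy region
    return np.clip(1 - np.abs(x - centres) / width, 0, None)

rng = np.random.default_rng(5)
prices = 40 + 10 * np.sin(np.arange(500) * 2 * np.pi / 24) + rng.normal(0, 1, 500)
x, y = prices[:-1], prices[1:]                        # (input, output) data pairs

centres = np.linspace(prices.min(), prices.max(), 7)  # 7 fuzzy regions
width = centres[1] - centres[0]

# 1) For each data pair, pick the input and output regions of maximum membership
#    and keep, for each input region, the rule with the highest combined degree.
rules, degrees = {}, {}
for xi, yi in zip(x, y):
    mu_x = tri_memberships(xi, centres, width)
    mu_y = tri_memberships(yi, centres, width)
    i, j = int(mu_x.argmax()), int(mu_y.argmax())
    degree = mu_x[i] * mu_y[j]
    if degree > degrees.get(i, 0.0):                  # conflict resolution
        rules[i], degrees[i] = j, degree

# 2) Predict by weighted-average (centroid) defuzzification over the fired rules
def predict(x_new):
    mu = tri_memberships(x_new, centres, width)
    fired = [(mu[i], centres[j]) for i, j in rules.items() if mu[i] > 0]
    w = np.array([f[0] for f in fired])
    c = np.array([f[1] for f in fired])
    return float((w * c).sum() / w.sum())

print(predict(prices[-1]))                            # next-hour price estimate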
8.4. Fuzzy ARTMAP Based Model
The fuzzy ARTMAP is a new concept for electricity price forecasting [72] ; it has already been applied for wind speed forecasting [73] and load forecasting [74] . Most conventional neural networks suffer from the plasticity-stability dilemma, i.e. the network must remain plastic (adaptive) to new or changing inputs while at the same time remaining stable in its response [75] [76] . The fuzzy ARTMAP structure shown in Figure 8 addresses this dilemma by incorporating a feedback mechanism between the competitive and input layers to allow new information to be learned without eliminating previously obtained knowledge; in this way it becomes more stable and shows a faster convergence capability [77] . ARTMAP is a class of neural architectures that perform incremental supervised learning of recognition categories and multidimensional maps in response to input vectors presented in arbitrary order. An ARTMAP system embodies twin ART modules (ARTa and ARTb) to build stable recognition categories corresponding to arbitrary input patterns. ARTa uses ART-1 while ARTb uses fuzzy ART. This setup enables the crisp set-theoretic operations of the binary module to be transformed into corresponding fuzzy operations in the fuzzy ART module.
For example, the intersection operator (∩) of ART1 is replaced by the fuzzy MIN operator (∧) in fuzzy ART. The architecture, called fuzzy ARTMAP, is achieved by a synthesis of fuzzy logic and the adaptive resonance theory (ART) neural network, exploiting a close formal similarity between the computations of fuzzy subsets and ART category choice, resonance, and learning. Fuzzy ARTMAP also realizes a new min-max learning rule that jointly
Figure 8. Architecture of fuzzy ARTMAP [72] .
minimizes predictive error and maximizes generalization, or code compression. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error.
As a result, the system automatically learns a minimal number of recognition categories, or "hidden units", to meet the accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the AND operator (∧) and the OR operator (∨) of fuzzy logic play complementary roles.
In training, the best matching category is given by [75] :
$T_{J}=\max\left\{T_{j}:j=1,\ldots,N\right\}$ (18)
where,
$T_{j}(I)=\frac{\lvert I\wedge w_{j}\rvert}{\alpha+\lvert w_{j}\rvert}$ (19)
where Tj is the choice function, α is the choice parameter, ∧ is the fuzzy MIN operator, ρ is the vigilance parameter, and |I ∧ wJ|/|I| ≥ ρ is the vigilance criterion. If the vigilance criterion is satisfied, resonance occurs. During training, the vigilance parameter starts from its initial (baseline) value. If the vigilance criterion passes, category J becomes the representative membership function for the time series, and the weight vector of the winning category is updated by Equation (20):
$w_{J}^{\mathrm{new}}=\beta\left(I\wedge w_{J}^{\mathrm{old}}\right)+\left(1-\beta\right)w_{J}^{\mathrm{old}}$ (20)
where β is the learning rate. If the vigilance criterion fails, category J is deactivated for the current price series by setting its choice function to zero. If ARTb does not predict the correct output for ARTa, the vigilance parameter is increased. This is called match tracking; in match tracking the vigilance parameter is slightly increased to a new value:
$\rho_{a}=\frac{\lvert I\wedge w_{J}\rvert}{\lvert I\rvert}+\varepsilon$ (21)
where ε is a small learning precision.
The scheme resizes a category on each predictive failure by raising the vigilance parameter by the minimal amount needed to correct the predictive error at ARTb. The vigilance parameter holds an inverse relationship with category size: a lower value leads to broadly generalized categories and higher code compression. The parameter sets the minimum confidence that ARTa must have before accepting a category during hypothesis testing; when this threshold is not met, ARTa is driven to a new cluster through the process called match tracking.
This technique reduces generalization only as much as is essential to correct a predictive error. The combination of these techniques, i.e. ARTMAP and match tracking, leads to faster learning and the ability to learn from rare events. Fuzzy ART reduces to ART1 for binary inputs and operates in the same way on analog vectors. Thus the crisp logic of ART1 together with its fuzzy counterpart forms a potent module.
Once the training stage is completed, the fuzzy ARTMAP network is used as a classifier of the input price series, which is given to ARTa. ARTb is not used during the classification process, and the learning capability of the network is deactivated. In this stage the predicted class labels are obtained at the output; these labels are defuzzified to obtain the forecasted price series. To find the best training parameters for the neural network, some models use optimization algorithms to obtain good results at comparatively low processing time.
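To make the ART operations above concrete, the following Python sketch implements a minimal fuzzy ART categorizer (complement coding, the choice function of Equation (19), the vigilance test and the weight update of Equation (20)); the parameter values and the random daily price profiles are assumptions of this illustration, and the full ARTMAP map field between ARTa and ARTb is omitted.

```python
import numpy as np

def complement_code(x):
    # Normalization step that prevents category proliferation
    return np.concatenate([x, 1.0 - x])

class FuzzyART:
    def __init__(self, alpha=0.001, beta=1.0, rho=0.75):
        self.alpha, self.beta, self.rho = alpha, beta, rho
        self.w = []                                   # one weight vector per category

    def train(self, x):
        I = complement_code(x)
        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), ^ = element-wise min
        order = sorted(range(len(self.w)),
                       key=lambda j: -np.minimum(I, self.w[j]).sum()
                                      / (self.alpha + self.w[j].sum()))
        for j in order:
            match = np.minimum(I, self.w[j]).sum() / I.sum()
            if match >= self.rho:                     # vigilance criterion passes
                self.w[j] = self.beta * np.minimum(I, self.w[j]) \
                            + (1 - self.beta) * self.w[j]
                return j
        self.w.append(I.copy())                       # no resonance: new category
        return len(self.w) - 1

# Example: categorize scaled daily price profiles
rng = np.random.default_rng(6)
art = FuzzyART()
for _ in range(100):
    profile = rng.uniform(0, 1, 24)                   # stand-in for a scaled price day
    art.train(profile)
print(f"{len(art.w)} categories learned")
```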
9. Forecasting Models Based on Simulation Methods
Simulation methods usually simulate generator dispatch patterns over an extended period of time. These methods mimic the actual dispatch with system operating requirements and constraints. Despite the high data requirements of these models, they can provide detailed insights into the price curve.
The simulation methods currently used by the electric power industry range from bubble-diagram type contract path models to production simulation models with full electrical representation, such as the GE-MAPS software [10] . Production simulation models, by the nature of their chronological simulation, consider time-varying system limits and characteristics. Some important issues that must be addressed in any market simulation program that forecasts LMPs (locational marginal prices) for the electricity market are [10] :
・ Detailed transmission model
・ Unit commitment
・ Economic dispatch with transmission constraints
・ Secure dispatch
・ Chronological simulation
・ Large-scale study capability
・ Data resources
・ Benchmark and application
A simulation model known as MAPS (Market Assessment and Portfolio Strategies) has been developed; this model incorporates a full representation of the electrical transmission network. Detailed power flow data, secure dispatch of generators, tracking of transmission line flows, loss determination, and transaction evaluation are well integrated, providing an accurate through-time simulation of system operation.
The MAPS model is able to simulate large power systems over one or multiple years within a reasonable computation time. The MAPS model can be applied to the following issues:
・ Analyze market power issues
・ Evaluate alternative market structures
・ Estimate stranded generation investments
・ Assess economics of building new generation
・ Assess transmission costs
・ Understand market behavior
The general input-output structure of MAPS is shown in Figure 9.
The data requirements of MAPS are similar to those of any free-standing production cost program or load flow model. Through its integration of generation and transmission models, it captures hour-by-hour market dynamics while simulating the transmission constraints of the system. Market simulation programs minimize the system cost of serving loads subject to transmission constraints; unit commitment and economic dispatch with transmission constraints are the core functions of typical market simulation programs.
The program automatically provides the locational market clearing prices for any bus, identifies the bottlenecks of the transmission network, and produces the generation schedules and power flows on the transmission grid, which are important in deregulated markets. Simulation methods are intended to provide detailed insights into system prices. However, these methods suffer from two drawbacks: first, they require detailed system operation data, and second, they are complicated to implement and their computational cost is very high.
10. Forecasting Models Based on Game Theory
There has been a great deal of research aimed at understanding electric power markets, and various methods exist for modeling, analyzing and selecting bidding strategies for power suppliers. Game theory is a natural platform for modeling market competition [78] . It is of great interest to model the strategies of the market participants and identify solutions to those games. Since participants in oligopolistic electricity markets shift their bidding curves in order to maximize their profits, these models provide the solution to such games, and profit can be considered the outcome of the power transaction game. In this group of models, equilibrium models [5] take the analysis of strategic market equilibrium as the key point. Game-theoretic models are generally used by market operators for deciding market strategies. A detailed discussion of game theory can be found in [79] -[83] .
11. Forecasting Model Accuracy
The following types of accuracy measures are mainly defined in the literature for validating the accuracy of a proposed model. For a model to be acceptable, the values of these measures must lie within permissible limits. The error is defined as the difference between the actual value and the forecasted value for the corresponding period:
$e_{t}=P_{t}-\hat{P}_{t}$ (22)
where et is the error for period t, Pt is the actual value for period t and P̂t is the forecasted value for period t. The measures of aggregate error are then defined as follows.
11.1. Mean Absolute Error
$MAE=\frac{1}{N}\sum_{t=1}^{N}\lvert e_{t}\rvert$ (23)
11.2. Mean Absolute Percentage Error
$MAPE=\frac{100}{N}\sum_{t=1}^{N}\left\lvert\frac{e_{t}}{P_{t}}\right\rvert$ (24)
11.3. Mean Absolute Deviation
(25)
11.4. Percentage Mean Absolute Deviation
(26)
11.5. Mean Square Error
$MSE=\frac{1}{N}\sum_{t=1}^{N}e_{t}^{2}$ (27)
11.6. Root Mean Square Error
$RMSE=\sqrt{\frac{1}{N}\sum_{t=1}^{N}e_{t}^{2}}$ (28)
where N represents the number of observations used for analysis.
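For reference, the most common of these measures can be computed as in the short Python sketch below, assuming NumPy arrays of actual and forecast prices over N observation periods.

```python
import numpy as np

def accuracy_measures(actual: np.ndarray, forecast: np.ndarray) -> dict:
    e = actual - forecast                               # error, Equation (22)
    return {
        "MAE":  np.mean(np.abs(e)),                     # mean absolute error
        "MAPE": np.mean(np.abs(e / actual)) * 100.0,    # mean absolute percentage error
        "MSE":  np.mean(e ** 2),                        # mean square error
        "RMSE": np.sqrt(np.mean(e ** 2)),               # root mean square error
    }

print(accuracy_measures(np.array([42.0, 50.0, 47.0]), np.array([40.0, 52.0, 45.0])))
```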
12. Conclusions
In the presented work, a study of different price forecasting methodologies in the deregulated environment has been carried out. The restructuring of power markets has created an increasing need among market participants for accurate forecasts of future prices for the purpose of profit maximization. Price forecasting is a difficult task due to special characteristics of the price series such as non-constant mean and variance, outliers, and seasonal and calendar effects.
Electricity price forecasting models include statistical and non-statistical models. Time series models, econometric models and intelligent systems are the three main statistical approaches. Non-statistical methods include equilibrium analysis and simulation methods. Methods based on time series are more commonly used for electricity price forecasting due to their flexibility and ease of implementation. The main drawback of time series models is that they are usually based on the hypothesis of stationarity, whereas the price series violates this assumption.
The scope of the forecast (e.g. the price profile or its volatility) is an important factor in the selection and design of forecasting models/techniques. The complexity of the model(s) also largely determines the amount of required input data. Depending on the target of the forecast, the procedure may apply data filtering and transformation before the model is optimized for the given price data. The wavelet transform is generally used for smoothing the price data and removing seasonal effects, outliers and other irregularities; the approximated series obtained under the wavelet transform behaves better than the original price data, with a more stable mean and variance and without outliers.
It can be concluded that there is no universal tool for price forecasting which can be used for every market and operator. For specific applications it becomes essential to select specific tools/techniques, and the following points should be kept in mind:
1) Type of forecast (i.e. long term, medium term, short term).
2) Available resources for processing, storing the historical data of the price.
3) Importance of accuracy in forecasting.
By combining the wavelet transform with ARIMA, GARCH, neural network and other models, the performance of these models can be improved and forecasting errors reduced.
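As an illustration of such a combination, the sketch below decomposes the price series into wavelet sub-series, forecasts each sub-series with its own ARIMA model and adds the forecasts back together; the db4 wavelet, the ARIMA(2, 1, 1) order and the synthetic data are assumptions of this example.

```python
import numpy as np
import pywt
from statsmodels.tsa.arima.model import ARIMA

# Synthetic hourly price series standing in for real market data
rng = np.random.default_rng(7)
prices = 40 + 10 * np.sin(np.arange(1024) * 2 * np.pi / 24) + rng.normal(0, 2, 1024)

coeffs = pywt.wavedec(prices, wavelet="db4", level=3)   # [cA3, cD3, cD2, cD1]

day_ahead = np.zeros(24)
for k in range(len(coeffs)):
    # Reconstruct the k-th sub-series alone by zeroing all other coefficients
    parts = [c if i == k else np.zeros_like(c) for i, c in enumerate(coeffs)]
    sub_series = pywt.waverec(parts, wavelet="db4")
    # Forecast this smoother, better-behaved sub-series and accumulate the result
    day_ahead += ARIMA(sub_series, order=(2, 1, 1)).fit().forecast(steps=24)

print(day_ahead)                                        # combined wavelet-ARIMA forecast
```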