1. Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel coronavirus that emerged in Wuhan, China, in December 2019 and has rapidly spread across the world, causing a global pandemic [1] [2]. To date, SARS-CoV-2 is the seventh coronavirus known to infect human beings. It affects the lower respiratory tract and causes symptoms ranging from mild fever, cough and sore throat to severe and fatal complications including acute respiratory distress syndrome (ARDS), severe pneumonia, septic shock, pulmonary edema, a hypercoagulable state, failure of other organs and subsequent death [3] [4]. Elderly people and patients with underlying comorbidities such as diabetes mellitus, cardiovascular diseases, malignancies and chronic respiratory diseases are more prone to developing complications of SARS-CoV-2 infection.
According to Worldometer, as of January 29th, 2022, there had been 372,153,572 confirmed cases, 5,672,345 reported deaths and 293,778,361 recoveries from SARS-CoV-2 worldwide [5]. The spread of this disease has been a growing public health concern, as it poses significant challenges to a country’s economic, political and social development. Several nations have tried to curtail this spread by enforcing strict hand hygiene and imposing national lockdowns. Unfortunately, given the large genetic diversity of SARS-CoV-2, its frequent genome recombination, its multiple viral strains with identified genetic polymorphisms, its complex disease manifestation and its multiple routes of transmission, control measures have not been very successful [6].
Sierra Leone reported its first case in March 2020 [7], and since then there have been 7608 confirmed cases of SARS-CoV-2 with 125 deaths [8]. As of 26 January 2022, a total of 1,409,313 vaccine doses had been administered [8]. A recent cross-sectional, nationally representative, age-stratified serosurvey of SARS-CoV-2 antibody prevalence in Sierra Leone found an overall weighted seroprevalence of 2.6% (95% CI 1.9 - 3.4), which is 43 times higher than the reported number of cases [9]. Despite this relatively low rate of infection and spread of SARS-CoV-2 in Sierra Leone, there is still considerable uncertainty regarding this ever-changing coronavirus. The country’s healthcare system is frail and unequipped for a large number of simultaneous admissions, and because the economy is largely dependent on imports and exports, long-term border closure is not feasible.
Therefore, to help predict the trajectory of the disease and to produce short-term forecasts of cumulative confirmed cases, we used a univariate autoregressive integrated moving average (ARIMA) model. ARIMA models have been successfully applied to predict the incidence of infectious diseases, such as influenza mortality [10] and malaria incidence [11], as well as other infectious diseases [12] [13]. Such a model is valuable for understanding and estimating disease progression in Sierra Leone, which will help inform policy planning on curtailing further spread and on future containment measures.
2. Methods
2.1. Data Source
The data for this study consist of confirmed SARS-CoV-2 cases per day from 13th March 2020 to 30th January 2022. The daily SARS-CoV-2 case counts were obtained from Our World in Data, an official website for SARS-CoV-2 data (https://covid19.who.int/region/afro/country/sl). We used R statistical software to analyze the SARS-CoV-2 data.
2.2. Unit Root Test
Before estimating the parameters of the ARIMA model, the data were tested for stationarity using the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the time series is non-stationary. The result of the ADF test suggested that the time-series data were non-stationary (p > 0.05). After applying the second difference, i.e.,
$$\Delta^2 y_t = y_t - 2y_{t-1} + y_{t-2},$$
where $y_t$ is the value of the series at time $t$, the p-value obtained was less than the significance level (p < 0.05) and the ADF test statistic was lower than each of the critical values, so the null hypothesis was rejected.
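The differencing step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `difference` and the sample counts are hypothetical and are not the study data, and the ADF test itself is not implemented here.

```python
def difference(series, order=1):
    """Apply first differencing `order` times.

    Each pass maps y_t to y_t - y_{t-1}, so the result is shorter
    than the input by `order` observations.
    """
    values = list(series)
    for _ in range(order):
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Hypothetical cumulative counts, made up for illustration only.
cumulative_cases = [1, 2, 4, 8, 13, 19, 26]

first_difference = difference(cumulative_cases, order=1)   # [1, 2, 4, 5, 6, 7]
second_difference = difference(cumulative_cases, order=2)  # [1, 2, 1, 1, 1]
```

In this made-up series the first difference still trends upward, while the second difference fluctuates around a roughly constant level, which is the behaviour a unit root test is meant to detect.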
2.3. The Model
The autoregressive integrated moving average (ARIMA) model is a generalization of the ARMA model to non-stationary series, that is, series whose mean or variance is not constant over time. The integrated part refers to an initial differencing step, which can be applied to eliminate the non-stationarity of the series. An ARIMA model is specified by its three components:
• The autoregressive (AR) component represents a variable regressed on its own lagged, or prior, values.
• The integrated (I) component represents the differencing of the raw observations so that the time series becomes stationary.
• The moving average (MA) component represents the dependence between an observation and the residual errors of a moving average model applied to lagged observations.
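The autoregressive time series regression model of order p, denoted AR(p), is given by
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t,$$
where $\phi_i$ ($i = 1, \ldots, p$) are the model parameters and $\varepsilon_t$ is a normally distributed random process with mean 0 and constant variance $\sigma^2$, which is assumed to be independent of all previous process values.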
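The moving average model of order q, denoted MA(q), is built from a white noise series with mean 0 and variance $\sigma^2$. It expresses the series as a weighted linear sum of the current and previous forecast errors:
$$y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q},$$
where $\theta_j$ ($j = 1, \ldots, q$) are the model parameters and $\varepsilon_t$ is a normally distributed random process with mean 0 and constant variance $\sigma^2$, which is assumed to be independent of all previous process values.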
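The ARMA (p, q) model is composed of two polynomials, AR(p) and MA(q). It is expressed thus:
$$y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},$$
where $y_t$ is the value of the series at time $t$. $\phi_i$ and $\theta_j$ are the model parameters, and $\varepsilon_t$ is a normally distributed random process with mean 0 and constant variance $\sigma^2$, which is assumed to be independent of all previous process values.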
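To make the ARMA recursion concrete, the following minimal sketch generates a series from given AR and MA coefficients and a given noise sequence. The function name `arma_generate`, the coefficient values and the convention that values before the start of the series are zero are all assumptions made for illustration; the sign convention (MA terms added to the noise) is the common textbook one.

```python
import random


def arma_generate(phi, theta, eps):
    """Generate y_t = sum_i phi_i * y_{t-i} + eps_t + sum_j theta_j * eps_{t-j}.

    phi   -- AR coefficients [phi_1, ..., phi_p]
    theta -- MA coefficients [theta_1, ..., theta_q]
    eps   -- noise sequence, one value per time step
    Values before the start of the series are treated as zero (an assumption).
    """
    y = []
    for t in range(len(eps)):
        ar_part = sum(phi[i] * y[t - 1 - i]
                      for i in range(len(phi)) if t - 1 - i >= 0)
        ma_part = sum(theta[j] * eps[t - 1 - j]
                      for j in range(len(theta)) if t - 1 - j >= 0)
        y.append(ar_part + eps[t] + ma_part)
    return y


# Illustrative ARMA(1, 1) path driven by Gaussian white noise (made-up parameters).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
simulated = arma_generate(phi=[0.5], theta=[0.5], eps=noise)
```

Setting `theta=[]` gives a pure AR(p) process and setting `phi=[]` gives a pure MA(q) process, which mirrors how ARMA combines the two.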
The ARIMA (p, d, q) model is a widely used statistical method for time-series analysis and forecasting. To build such a model, the first step is to investigate whether the time series is statistically stationary and, if it is not, to difference it d times until it is. The next step is to estimate the orders p and q of the AR and MA components. The essential idea of the ARIMA model is that the predicted value of the variable $y_t$ is generated from a linear function of several previous observations and random errors. A process $y_t$ is an ARIMA (p, d, q) process when it satisfies
$$\phi(B)\,\nabla^d y_t = \theta(B)\,\varepsilon_t,$$
where $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$ are polynomial operators in the backshift operator $B$ (with $B y_t = y_{t-1}$), and $\nabla^d = (1 - B)^d$, for $d \geq 1$, where $\nabla$ is the difference operator.
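As a worked instance of the general form, consider the orders p = 3, d = 2, q = 1. The expansion below assumes the common sign convention in which the AR polynomial is $1 - \phi_1 B - \cdots$ and the MA polynomial is $1 + \theta_1 B + \cdots$, omits any drift term, and introduces $w_t$ as shorthand for the second-differenced series; these choices are assumptions for illustration.

```latex
\begin{aligned}
(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(1 - B)^2 y_t &= (1 + \theta_1 B)\,\varepsilon_t, \\
w_t &= (1 - B)^2 y_t = y_t - 2y_{t-1} + y_{t-2}, \\
w_t &= \phi_1 w_{t-1} + \phi_2 w_{t-2} + \phi_3 w_{t-3} + \varepsilon_t + \theta_1 \varepsilon_{t-1}.
\end{aligned}
```

In other words, an ARIMA (3, 2, 1) model is an ARMA (3, 1) model fitted to the second differences of the original series.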
2.4. Performance Measures
To evaluate the prediction models, we use the following statistical measures.
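Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(y_k - \hat{y}_k\right)^2}$$
Mean Absolute Percentage Error (MAPE):
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{k=1}^{n}\left|\frac{y_k - \hat{y}_k}{y_k}\right|$$
where $y_k$ denotes the actual value and $\hat{y}_k$ denotes the predicted value for the kth instance, and $n$ is the number of instances.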
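The two measures can be computed directly from paired actual and predicted values. The sketch below uses the standard definitions of RMSE and MAPE; the function names and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero (division by zero).
    """
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


# Made-up values for illustration only.
actual_values = [100.0, 200.0, 400.0]
predicted_values = [110.0, 190.0, 400.0]

error_rmse = rmse(actual_values, predicted_values)
error_mape = mape(actual_values, predicted_values)  # about 5 percent
```

RMSE is expressed in the units of the series (cases), whereas MAPE is scale-free, which is why the two are often reported together.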
3. Results
Figure 1 shows a strong upward trend in SARS-CoV-2 cases in Sierra Leone, indicating that the series is not stationary. This is confirmed by the results of the ADF unit root tests presented in Table 1, where the p-values are all greater than the 5% level of significance. Thus, there is not enough evidence to reject the null hypothesis that the SARS-CoV-2 series for Sierra Leone is non-stationary. Nonetheless, taking a second difference of the series made it stationary, as confirmed by the ADF test.
Figure 1. Cumulative confirmed cases of SARS-CoV-2 from 3rd March 2020 to 31st January 2022.
The autocorrelation function (ACF) plot is also useful for identifying non-stationary time series. For a stationary time series, the ACF drops to zero relatively quickly, while the ACF of non-stationary data decreases slowly. Differencing can help stabilize the mean of a time series by removing changes in its level, thereby eliminating (or reducing) trend. Consequently, we took a second difference of the data. The second-differenced data are shown in Figure 2.
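The contrast between a slowly decaying and a quickly decaying ACF can be checked numerically. The sketch below computes the usual sample autocorrelation; the function name `sample_acf` and the toy trending series are assumptions for illustration, not the study data.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a sequence.

    r_k = sum_{t=k}^{n-1} (x_t - mean)(x_{t-k} - mean) / sum_t (x_t - mean)^2.
    Not defined for a constant series (the denominator would be zero).
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# A toy trending series: its autocorrelations stay close to 1 at small lags,
# which is the slow decay typical of non-stationary data.
trend = list(range(50))
acf_trend = sample_acf(trend, max_lag=5)
```

For real data, the same calculation applied before and after differencing shows whether the differenced series has lost the slow decay.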
Residuals are useful for testing the model’s suitability to capture the information in the data. The estimated autocorrelations between the residuals at various lags are depicted in Figure 3.
From the ACF and PACF plots (Figure 2) and the model trace summary (Table 2), we identified a set of candidate models. Using the AICc model selection criterion, we found that the ARIMA (3, 2, 1) model with drift had the lowest AICc value.
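For reference, the small-sample correction behind AICc can be sketched as follows. The formula AICc = AIC + 2k(k + 1)/(n - k - 1) is the standard one, with k the number of estimated parameters and n the number of observations used in fitting; software packages may count k slightly differently. The candidate names and AIC values below are made up and do not reproduce Table 2.

```python
def aicc(aic, num_params, num_obs):
    """Corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if num_obs - num_params - 1 <= 0:
        raise ValueError("AICc requires num_obs > num_params + 1")
    return aic + (2 * num_params * (num_params + 1)) / (num_obs - num_params - 1)


# Hypothetical candidates: name -> (AIC, number of estimated parameters).
candidates = {
    "model_a": (1012.4, 3),
    "model_b": (1009.8, 5),
    "model_c": (1005.1, 5),
}
num_obs = 200  # hypothetical number of observations

scores = {name: aicc(aic, k, num_obs) for name, (aic, k) in candidates.items()}
best = min(scores, key=scores.get)  # the candidate with the lowest AICc
```

The correction term shrinks as n grows, so for long series AICc and AIC usually rank the candidates the same way.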
Using the previously observed data, the ARIMA (3, 2, 1) model predicts the number of cumulative confirmed cases over the next 30 days, as shown in Table 3 with lower and upper confidence limits. Although an increasing trend is visible, the model performs better than the other candidate models.
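Because a model with d = 2 is fitted to second differences, forecasts on the differenced scale have to be converted back to cumulative counts; statistical software normally does this internally. The sketch below shows only that conversion, using the identity y_t = w_t + 2y_{t-1} - y_{t-2}. The function name and all numbers are hypothetical.

```python
def integrate_second_difference(last_two_levels, differenced_forecasts):
    """Turn forecasts of w_t = y_t - 2*y_{t-1} + y_{t-2} back into levels y_t.

    last_two_levels       -- (y_{T-1}, y_T), the last two observed levels
    differenced_forecasts -- forecasts of w_{T+1}, w_{T+2}, ...
    """
    y_prev2, y_prev1 = last_two_levels
    levels = []
    for w in differenced_forecasts:
        y_next = w + 2 * y_prev1 - y_prev2
        levels.append(y_next)
        y_prev2, y_prev1 = y_prev1, y_next
    return levels


# Made-up example: last two cumulative counts and two forecast second differences.
forecast_levels = integrate_second_difference((13, 19), [1, 1])  # [26, 34]
```

This also illustrates why forecasts from a twice-differenced model tend to extrapolate the most recent slope of the cumulative curve.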
Figure 2. ACF and PACF plots for the second-order differenced cumulative confirmed cases of SARS-CoV-2.
Table 1. ADF unit root tests on log levels of variables.
Source: STATA software.
Figure 3. Residual plots from the ARIMA (3, 2, 1) model of total confirmed cases of SARS-CoV-2.
Table 2. AIC, MAPE and RMSE values for various ARIMA models applied to cumulative confirmed cases of SARS-CoV-2.
Table 3. Performance of ARIMA (3, 2, 1) model with 80% and 95% CI.
4. Discussion and Conclusions
The ARIMA model is one of the most widely used time-series forecasting techniques because of its structured modeling basis and acceptable forecasting performance [14]. In this paper, we applied an ARIMA (p, d, q) model to analyze the surveillance data on SARS-CoV-2 infection in Sierra Leone. We obtained an ARIMA model that closely fits the spread of SARS-CoV-2 in Sierra Leone. According to the results above, the fitted model is reliable, with high validity. Once a satisfactory model has been obtained, it can be used to forecast the expected number of cases for a given number of future time intervals [15]. The forecast results suggest that the cumulative confirmed cases of SARS-CoV-2 in Sierra Leone will experience strong growth in the next 30 days (22nd January 2022 to 19th February 2022).
As mentioned above, for adequate ARIMA modeling, a time series should be stationary with respect to mean and variance [16]. If the mean increases or decreases over time, or if the variance does, the series may need to be transformed to make it stationary, before being modeled. Otherwise, the prediction effect of the model will be poor. In order to improve the model, updating the forecasts is very important. A model without seasonal terms will need to be updated frequently. Confidence intervals that widen rapidly as time increases from the starting point of the forecasts also indicate a model that needs frequent updating. Generally speaking, there are two ways to implement the update. The model can be reapplied to the original series with extra observations added at the end to give forecasts based on a later starting point. Alternatively, a new model can be fitted to the longer series. This is probably preferable, since fitting a model is quick, especially when the old model is used as a guide, and it makes better use of the additional observations.
The Government of Sierra Leone, through the National SARS-CoV-2 Emergency Operations Center (NACOVAC), can use the forecast of continued spread to make more informed decisions on additional measures to curb the spread of the virus. Application of the model can also assist in studying the effect of the lockdown on the spread of SARS-CoV-2 in Sierra Leone.