Prediction of Sewer Pipe Main Condition Using the Linear Regression Approach
Received 16 March 2016; accepted 16 May 2016; published 19 May 2016
1. Introduction
Sewer pipelines are built to transport sewage from all service areas to the treatment plant. The pipes are designed for a given life span; however, pipe deterioration usually follows no specific trend line. Predicting the deterioration process is challenging because it involves many factors for which relevant data are usually difficult to obtain. Chughtai and Zayed [1] categorize these factors into three groups: physical, environmental, and functional. The physical factors include pipe diameter, age, length, and gradient. The environmental factors refer to the conditions surrounding the pipe, such as supporting soil, bedding condition, groundwater level, and traffic volume. The functional factors are related to the maintenance strategies adopted by the given municipality.
Pipe conditions are obtained via various approaches. Available inspection techniques include man-walk-through inspection, ultrasonic inspection, focused electrode leak location (FELL), sewer scanner and evaluation technology (SSET), laser-based scanning systems, and closed-circuit television (CCTV). The most popular method is CCTV, and the condition data used in this study are primarily based on information obtained via CCTV. During the condition determination process, expert opinion is often used to rate a pipe section’s condition by combining all available information. For the data used in this study, the pipe conditions are classified into 5 categories, as shown in Table 1.
Sewer pipe condition deterioration has been widely studied in recent years, and various models have been developed and applied to condition prediction. For example, Moselhi and Shehab [2] classified the defects in sewer pipes using neural networks. Ariaratnam et al. [3] assessed infrastructure inspection needs using logistic models. Using a set of condition data related to sewer pipes, they found that the model could be successfully used to predict system conditions. The most popular method is the Markov chain model, which has been widely applied in the condition evaluation of infrastructure. For example, Jin and Mukherjee [4] explored the application of Markov chains in modeling facility deterioration by proposing multiple methods to estimate the condition transition probabilities. They further evaluated the sensitivity of the model using data obtained via simulations in Matlab and concluded that the Markov model is very robust [5]. The findings were later applied in life cycle analysis [6]. Similar studies regarding the Markov chain model can be found in [7]-[10]. Other studies related to sewer systems include failure analysis [11]-[13] and situational simulation to support decision making [14].
This study aims to evaluate the significance of variables related to sewer mains and to predict the pipe condition using the linear regression approach.
2. Methodology
The first step is the initial analysis of the data. The descriptive information includes pipe material, pipe diameter, pipe age, and pipe installation depth. The pipe material is either concrete or clay. The pipe diameter ranges from 45 cm to 180 cm. The installation depth ranges from 65 cm to 213 cm. Additional data analysis is presented in the results section.
The second step is the application of the regression approach. The general linear model specifies the relationship between a dependent variable Y and a set of predictor variables Xi, so that
$$Y = \beta_0 + \sum_{i} \beta_i X_i \qquad (1)$$
In the equation, β0 is the regression coefficient for the intercept and βi are the coefficients for the variables Xi. The βi values are obtained by maximum likelihood (ML) estimation, which is an iterative computational procedure. There are multiple methods for computing the ML estimate, such as the Newton-Raphson and Fisher scoring methods [15].
Statistical significance tests are usually performed via the Wald statistic, the likelihood ratio (LR), or a score statistic. Detailed information can be found in McCullagh and Nelder [15]. To diagnose the linear model, residuals are usually used to evaluate the fit. There are two types of such residuals: Pearson residuals and deviance residuals [15]. The Pearson residuals are based on the difference between the observed data and the predicted values. The deviance residuals are related to the contribution of the observed data to the log-likelihood statistic [15].
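For a linear model with normally distributed errors, the ML estimate of the coefficients coincides with the least-squares solution of the normal equations, so no iteration is needed in that special case. The following is a minimal sketch under that assumption. The function names and the (age, depth) data are hypothetical and are not taken from the study; the responses are generated from known coefficients so that the recovered values can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_model(rows, y):
    """Return [beta_0, beta_1, ...] that minimize the squared error."""
    # Prepend a column of ones so beta_0 acts as the intercept.
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: (age in years, depth in cm) for five pipe sections.
rows = [(10, 100), (20, 150), (30, 80), (40, 200), (50, 120)]
# Ratings generated exactly from 1.0 + 0.05 * age + 0.002 * depth.
y = [1.0 + 0.05 * age + 0.002 * depth for age, depth in rows]
beta = fit_linear_model(rows, y)  # approximately [1.0, 0.05, 0.002]
```

Because the synthetic ratings contain no noise, the fit recovers the generating coefficients up to rounding error. With real inspection data the coefficients would be estimates with associated standard errors.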
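As an illustration of the Wald statistic, the sketch below fits a one-predictor model (condition rating versus age) and tests the slope. Several assumptions apply: the data are hypothetical, the two-sided P value uses the large-sample normal approximation (a t distribution would normally be preferred for a sample this small), and the errors are taken to be normal with constant variance. Under that last assumption the Pearson and deviance residuals both reduce to the raw residuals, so only one set is returned.

```python
import math
from statistics import NormalDist, mean


def simple_regression_wald(x, y):
    """Fit y = b0 + b1 * x and return a Wald test for the slope b1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Raw residuals: observed value minus fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance with n - 2 degrees of freedom.
    s2 = sum(r * r for r in residuals) / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    z = slope / se_slope  # Wald statistic for H0: b1 = 0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return slope, se_slope, z, p_value, residuals


# Hypothetical pipe ages (years) and condition ratings (1 to 5).
ages = [5, 15, 25, 35, 45, 55]
ratings = [1, 2, 2, 3, 4, 5]
slope, se_slope, z, p_value, residuals = simple_regression_wald(ages, ratings)
```

A P value below the chosen significance level leads to rejecting the hypothesis that the slope is zero.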
Table 1. Pipe condition description.
3. Results and Discussions
Figure 1 displays the histogram for the pipe age. It can be seen that the age roughly follows a normal distribution. Figure 2 and Figure 3 display the histograms for pipe diameter and pipe installation depth. Such parameters are usually determined according to flow capacity needs and location characteristics. Neither of these two parameters appears to be normally distributed.
Table 2 displays the test results when all variables are included. The analysis involves one dummy variable, the pipe material, where 1 refers to concrete pipe and 2 refers to clay pipe. Judging by the P values, the pipe diameter is the least significant factor, followed by the pipe material. The pipe installation depth has a P value of 0.294, which is also greater than the significance level of 0.05. This means that the hypothesis that β for the depth equals 0 cannot be rejected. Therefore, the depth does not have a statistically significant impact. Both the constant and the pipe age are highly significant in the model.
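The decision rule used in this paragraph can be written compactly. The sketch below uses only the significance level (0.05) and the depth P value (0.294) quoted in the text; the function name is hypothetical.

```python
ALPHA = 0.05  # significance level stated in the text


def null_rejected(p_value, alpha=ALPHA):
    """Return True when H0: beta = 0 is rejected at level alpha."""
    return p_value < alpha


# P value reported in the text for the pipe installation depth.
depth_p_value = 0.294
depth_is_significant = null_rejected(depth_p_value)
```

Here `depth_is_significant` is `False`, which matches the conclusion that the hypothesis β = 0 for the depth is not rejected at the 0.05 level.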
Table 2. Fitting with all variables included.
Figure 2. Histogram of the pipe diameter.
In the following analysis, the constant and pipe age are included in the model. Although the pipe installation depth does not seem to be a very significant variable, its P value (0.294) is not very large, so it is also included in the model. The new fitting results are shown in Table 3. The results verify that both the constant and the pipe age are governing factors. The pipe depth has a P value of 0.256, so it still has some impact on the modeling, although it is not significant at the 0.05 level. Table 4 shows the results of the ANOVA test. This variance analysis also shows the importance of pipe age and the constant in the model fitting.
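The quantities behind an ANOVA table and the R-square value can be sketched as follows. This is only an illustration: the ages and ratings are hypothetical, a single predictor is used for brevity, and the function names are not from the study. For a least-squares fit with an intercept, the total sum of squares splits into a regression part and an error part.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def anova_summary(y, fitted, n_predictors):
    """Sums of squares, F statistic, and R-square for a fitted model."""
    y_bar = mean(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression
    df_error = len(y) - n_predictors - 1
    f_stat = (ssr / n_predictors) / (sse / df_error)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "F": f_stat, "R2": 1.0 - sse / sst}


# Hypothetical pipe ages (years) and condition ratings (1 to 5).
ages = [5, 15, 25, 35, 45, 55]
ratings = [1, 2, 2, 3, 4, 5]
intercept, slope = fit_line(ages, ratings)
fitted = [intercept + slope * a for a in ages]
table = anova_summary(ratings, fitted, n_predictors=1)
```

A large F statistic relative to its reference distribution indicates that the predictors explain a substantial share of the variation in the response.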
Figure 4 illustrates the residual analysis. In the normal probability plot, most points cluster around the blue reference line, which indicates that the error terms are approximately normally distributed; the normality assumption is therefore reasonable. The plot of the error terms versus the fitted values (upper right) shows approximately half of the points above the zero line and half below it, which supports the assumption that the errors have a mean of 0. The histogram re-checks the normality assumption and fits a normal distribution reasonably well. The plot on the lower right likewise shows about half of the points above the line and half below it, which suggests that the error terms are independent of the time variable.
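The numerical side of these residual checks can be sketched as follows. The residual values are hypothetical, and the plotting positions (i + 0.5)/n are one common convention that is assumed here; statistical packages may use a slightly different formula. Points lying close to a straight line in the resulting pairs suggest approximately normal errors.

```python
from statistics import NormalDist, mean


def normal_plot_points(residuals):
    """Pair each ordered residual with its theoretical normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n for i = 0 .. n - 1 (assumed convention).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def share_above_zero(residuals):
    """Fraction of residuals that lie above the zero line."""
    return sum(1 for r in residuals if r > 0) / len(residuals)


# Hypothetical residuals from a fitted model.
residuals = [-0.4, -0.2, -0.1, 0.1, 0.2, 0.4]
points = normal_plot_points(residuals)
residual_mean = mean(residuals)
above = share_above_zero(residuals)
```

A residual mean near zero and a share above zero near one half correspond to the visual checks described for the residual plots.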
Therefore, based on the available data, the model generated is
Figure 4. Residual information for the model fitting.
Table 3. Model fitting with constant, pipe age and pipe depth included.
4. Conclusion
This paper develops a model to predict the condition of sewer mains. Based on the available data, the pipe age is the most significant factor among all available variables. The pipe installation depth also has an impact in the regression analysis. The pipe material and pipe diameter are found to be less important. The regression fits the data well, with an R-square of 0.87. The ANOVA analysis re-emphasizes the importance of the pipe age variable. The residual analysis shows that the normality assumption behind the linear model is valid. However, sewer mains are affected by various factors that differ significantly from municipality to municipality, so the derived equation may not be directly applicable to other sewer systems. Nevertheless, the method used and developed here can be applied to analyze the condition of other sewer mains when relevant data are available.