A Multilayer Perceptron Artificial Neural Network Study of Fatal Road Traffic Crashes
1. Introduction
According to a recent report of the World Health Organization (WHO), about 1.19 million people die each year in road traffic crashes, which is equivalent to 15 traffic deaths per 100,000 people, and these crashes cost most countries 3% of their GDP [1]. The report indicated that road traffic injuries are the leading cause of death for children and young adults (5 - 29 years of age), and that two-thirds of road traffic fatalities occur among people between 18 and 59 years of age.
An artificial neural network (ANN) is a powerful and proven technique for modeling the relationship between input variables and dependent (output) variables. It is inspired by the complex neural structure of the brain and consists of interconnected arrays of neurons arranged in an input layer, one or more intermediate layers called hidden layers, and one or more output layers [2]-[8]. The multilayer perceptron is one of the most commonly used neural network techniques, with high estimation capacity and model accuracy [3] [9]. In this study, it serves as a mathematical function that processes the input variables (the fatal road traffic crash predictors) and predicts the output, which is the mean crash rate. Artificial neural networks are a type of Artificial Intelligence (AI) capable of modeling nonlinear relationships between factors. Neural network models appear to perform better in capturing the effects of the primary contributing factors that influence traffic safety [3]. Prior studies suggested that, compared with traditional models, machine learning approaches based on ANN are among the best techniques for predicting road traffic flow and accidents [9]-[13].
Several factors influence the likelihood of fatal and nonfatal traffic accidents. They include human factors (age, gender, alcohol and drug use, distraction including cellphone use, and fatigue), prevailing environmental conditions, time of the day and day of the week, pavement conditions, roadway designation (urban versus rural), speed, number of lanes, traffic volume, vehicle condition, roadside obstructions, and others. These variables appear to have complex nonlinear relationships. Various traffic accident analysis and modeling techniques have been explored over the past decades. The present study is motivated by the advent and popularity of Artificial Intelligence (AI) techniques such as ANN, which have been reported to be a better alternative for modeling and analyzing big data, such as traffic accident records, and for capturing the inherently complex nonlinear relationship between the potential predictors and the dependent variable, which in this study is fatal crashes. The study uses a multilayer perceptron artificial neural network (MLANN) approach to identify the primary factors contributing to fatal road traffic crashes on urban and rural interstate highway segments, and to optimize the models by using variables selected on the basis of their importance factor and by testing various combinations of hidden and output activation functions.
2. Literature Review
The application of ANN for traffic safety studies has been explored [3] [9]-[26]. To identify suitable models for predicting traffic accident severity, Kunt et al. [13] compared models constructed using ANN, Genetic Algorithm (GA), and a combination of Genetic Algorithm and Pattern Search (GA-PS) on 1000 traffic crash records from Tehran, Iran. The study used twelve predictive variables, including drivers’ age, gender, seatbelt use, weather, road surface conditions, the type of crash, the time of the day, and traffic flow. The dependent variable was injury severity at three levels. The results suggested that the ANN model performed better than the other two models tested.
An ANN-based model called the driving behavior risk prediction neural network (DBRPNN) was formulated to study drivers’ status and daily driving behavior and to predict the risk posed by distracted drivers [12]. Using traffic accident data collected in Madrid, Spain, a deep learning-based traffic accident forecasting model called the exogenous spatiotemporal neural network (XSTNN) was also proposed [14]. This model is based on spatiotemporal data and extends the spatiotemporal neural network (STNN) proposed earlier by Delasalles et al. [15] with additional potential predictors, including environmental, roadway classification (urban and rural), and traffic variables [14]. Bharti Sharma et al. [16] used a back-propagation ANN in a traffic management study of short-term signal control during potential congestion and explored the technique’s potential for diverting traffic during predicted peak flow. The study used 12 variables and a randomized dataset divided into training (70%) and testing (30%) sets, and developed an ANN model for predicting short-term traffic flow over horizons of 5 - 30 minutes.
3. Data Description & Methodology
The interstate highway fatal crash dataset used in this study is part of the Highway Safety Information System (HSIS), collected by the U.S. Department of Transportation, Federal Highway Administration (FHWA). For a given period of n years, the fatal crash count (C) on each roadway segment of length SL is converted to a mean crash rate (MCR) per 100 million vehicle-miles of travel (VMT) [27]. That is,
$$\mathrm{MCR} = \frac{C \times 10^{8}}{n \times \mathrm{VMT}},$$
where VMT = AADT × 365 × SL, n is the number of years of data, SL is the segment length, and AADT is the annual average daily traffic [28] [29].
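As a worked illustration of the rate conversion, the following Python sketch assumes the conventional crash-rate form, MCR = C × 10^8 / (n × VMT) with VMT = AADT × 365 × SL. The function name and the sample segment values are hypothetical and are not taken from the HSIS data.

```python
def mean_crash_rate(crashes, aadt, segment_length, years):
    """Crashes per 100 million vehicle-miles of travel (assumed conventional form)."""
    if aadt <= 0 or segment_length <= 0 or years <= 0:
        raise ValueError("AADT, segment length, and years must be positive")
    vmt = aadt * 365 * segment_length  # VMT = AADT x 365 x SL
    return crashes * 1e8 / (years * vmt)


# Hypothetical segment: 1 fatal crash in 5 years, AADT = 20,000, SL = 1.0 mile.
rate = mean_crash_rate(crashes=1, aadt=20_000, segment_length=1.0, years=5)
print(round(rate, 2))  # 2.74
```

Under this form, short segments with low traffic volume yield very large rates from a single crash, which is one plausible reason for a strongly right-skewed MCR distribution.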
The ANN framework used in the study is the multilayer feedforward neural network consisting of three types of neuron layers: input, hidden, and output. The approach involves partitioning the fatal crash data for training and testing and evaluating model performance using the R-squared value, the sum of squares error (SSE), and the achieved accuracy of the model. The predicted values are plotted against the dependent variable (MCR) to determine the R-squared value for every model formulated. Based on a critical review of over 170 articles published between 2010 and 2020, primarily on solid-waste and waste-management research, Xu et al. [2] reported that a single hidden layer with 4 - 20 hidden layer nodes is sufficient. Over 50% of the reviewed articles used datasets of between 100 and 150 records, which the review suggested is sufficient to optimize an ANN model, although larger datasets improve the effectiveness of the models developed. Hence, the dataset in the current study, which contains nearly 3000 fatal crash records, is believed to be sufficient to apply the ANN approach to analyze the data, identify the critical factors, and formulate a forecasting model. Each variable adds a layer of complexity to the analysis of traffic accidents. Hence, the study optimized the model using the variable importance factor. The structure of the MLPANN, showing the input, hidden, and output layers, has been presented in several textbooks and research articles published over the last several years [2]-[8].
The fatal crash dataset used for this study was collected from 2873 roadway segments. The data are partitioned into training and testing sets for neural network analysis and model building. About 70% of the original dataset (2011 records) was used for training, and the remaining 30% (862 records) was used for testing and validating the model. The training dataset is used to adjust the weights and associated biases to optimize the model fit, and the testing dataset is used to evaluate the performance of the fatal traffic crash MLPANN model. The optimization includes identifying predictors based on the importance factor of the variables, varying the number of hidden layers (one or two), and testing both hyperbolic tangent and sigmoid activation functions for the hidden layer.
The 12 variables used for developing the ANN model are segment length, annual average daily traffic (AADT), day of the week, lighting condition, contour of the roadway (characteristics of the roadway), weather, pavement surface, drivers’ age, drivers’ gender, median type, number of lanes, and roadway classification or designation. The details of the variables and their categories are summarized in Table 1 and Table 2. The crash dataset was randomized before being divided into the training (about 70%) and testing (about 30%) datasets. To capture the nonlinear relationship between the input and output layers and to select the best-performing model architecture and network parameters for training and testing, the study begins with 12 variables and one hidden layer. During the training process, the sum of squares error (SSE) was used as the stopping criterion of the ANN analysis. Both hyperbolic tangent and sigmoid activation functions were tested. The performance of each ANN model was evaluated with the coefficient of determination (R2), which measures how well the regression line fits the dataset, and the SSE. An R2 value closer to 1 and a lower SSE indicate a better prediction model.
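The randomized 70/30 partition can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name and the seed are assumptions, and the study's own software and randomization procedure are not specified here.

```python
import random


def split_indices(n_records, train_fraction=0.7, seed=0):
    """Shuffle record indices and split them into training and testing sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # reproducible randomization (assumed seed)
    n_train = round(n_records * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(2873)
print(len(train_idx), len(test_idx))  # 2011 862
```

With 2873 records, rounding 70% gives 2011 training records and leaves 862 for testing, which matches the counts reported above.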
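The two evaluation measures can be illustrated with a short Python sketch. It assumes the common definitions SSE = Σ(y − ŷ)² and R² = 1 − SSE/SST; the exact formulas used by the study's software are not stated, and the sample values below are hypothetical.

```python
def sum_of_squares_error(observed, predicted):
    """SSE: sum of squared differences between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SSE / SST."""
    mean_y = sum(observed) / len(observed)
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - sum_of_squares_error(observed, predicted) / sst


# Hypothetical observed and predicted crash rates.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(sum_of_squares_error(observed, predicted))  # 1.0
print(round(r_squared(observed, predicted), 2))   # 0.95
```

Note that SSE depends on the scale of the output variable, so SSE values are only comparable between models whose outputs are on the same scale.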
Table 1. Descriptive statistics of the dependent variable (mean crash rate, MCR) and the continuous independent variables (segment length and Ln(AADT)) used in the MLANN study of fatal road traffic crashes on four- and six-lane interstate urban and rural highways.
| Variables | Min. | Max. | Mean | St. Dev. | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| Fatal crashes, N* = 2873 | | | | | | |
| MCR | 1 | 1915 | 21.34 | 89.15 | 14.98 | 253.33 |
| SL | 0.02 | 3.9 | 1.3 | 1.28 | 1.49 | 2.65 |
| AADT | 4770 | 172000 | 60231.78 | 57030.00 | 0.326 | 0.091 |
| Ln (AADT) | 8.47 | 12.06 | 10.87 | 0.55 | −0.55 | 0.56 |

N* = Number of segments.
Table 2. Categorical variables used to analyze fatal road traffic crashes on four- and six-lane interstate urban and rural highways.
| Variables | Category | Fatal Crashes (N) | Fatal Crashes (Percent) |
|---|---|---|---|
| Days of the week | Weekday | 2221 | 77.3% |
| | Weekend | 652 | 22.7% |
| Lighting condition | Light | 1943 | 67.60% |
| | Dark | 930 | 32.40% |
| Contour of roadway | Straight-Level | 1706 | 59.40% |
| | Straight-Grade | 498 | 17.30% |
| | Curve-Level | 269 | 9.40% |
| | Curve-Grade | 400 | 13.90% |
| Weather condition | Normal | 2371 | 82.50% |
| | Rain | 319 | 11.10% |
| | Snow | 183 | 6.40% |
| Pavement surface condition | Dry | 2166 | 75.40% |
| | Wet | 428 | 5.30% |
| | Snow | 1667 | 19.30% |
| Drivers’ age | Younger (16 - 24) | 638 | 22.20% |
| | Middle age (25 - 64) | 2045 | 71.20% |
| | Older (65 & older) | 190 | 6.60% |
| Drivers’ gender | Female | 995 | 34.60% |
| | Male | 1878 | 65.40% |
| Median type | Protected | 1443 | 49.90% |
| | Unprotected | 1440 | 50.10% |
| No. of lanes | Four lane | 1374 | 47.80% |
| | Six lane | 1499 | 52.20% |
| Roadway Designation | Urban | 2209 | 76.90% |
| | Rural | 664 | 23.10% |
The performance of an ANN model relies primarily on the characteristics of the activation function used [8]. To optimize the model and select the appropriate activation function, both hyperbolic tangent and sigmoid activation functions were tested with one and two hidden layers. Based on the initial outputs of the models with 12 predictors, further optimization was done by altering the parameters and variables, including omitting variables with a lower importance factor and selecting the number of hidden layers and the activation functions. The descriptive statistics of the continuous and dependent variables are summarized in Table 1. Table 2 summarizes the categorical variables used in the ANN study of fatal road traffic crashes on interstate highways and the percentage of fatal crashes observed in each category of the independent variables.
The original HSIS dataset consists of five different sub-files (accident, vehicle, occupant, curve and grade, and roadway data, by crash year), each containing various related variables. The accident sub-file includes the specific location of the incident, the classification of the roadway, the day and time of the accident, weather, lighting condition, pavement surface condition (dry, wet, and snow), the accident type (head-on, rear-end, etc.), and the crash severity. The curve and grade sub-file primarily contains the grade and the directions of the curves. The occupant sub-file includes the driver’s age, gender, seating position, and the occupant’s injury. The roadway sub-file includes the number of lanes, traffic volume (AADT), pavement width, median type, and access control. The vehicle sub-file contains variables related to the vehicle condition and the damage scale. The variables in these sub-files are linked and merged using the accident case number and vehicle number, both of which are unique to a particular recorded crash. Once the sub-files were properly merged, the fatal crash records on four- and six-lane urban and rural interstate highways were filtered and used for the study.
The twelve predictive variables used in this study were selected because earlier studies explored, at various levels of significance, the contribution of these factors to the likelihood of traffic accidents [29] [31]. Traffic volume and segment length have been widely used as explanatory variables in traffic crash modeling and analyses [29]-[31]. For instance, focusing on the effects of highway cross-section design elements (median width, lane and shoulder width, type of median and pavement friction course, speed limit, AADT, and segment length) on traffic crashes on highways in the State of Florida, Hadi et al. [30] developed models to predict injury and fatal crashes. The study suggested that AADT and segment length were the primary predictors of crash frequencies. Caliendo et al. [31] reported that segment length, AADT, and wet pavement surface were the main factors that contributed to severe crashes on four-lane rural roads in Italy. In a study of seven urban and rural interstate highways in the state of Washington, including crashes recorded on interchange and non-interchange segments, Venkataraman et al. [32] found that the significant factors linked with traffic crashes were median lighting, roadway curvature, and traffic volume (AADT). The study by Anastasopoulos et al. [33] used five years of traffic crash data collected on interstate highways in the state of Indiana and over 15 variables, including traffic, geometric elements, and pavement condition. The findings showed that median type and width, vertical and horizontal curves, pavement conditions, and traffic characteristics were critical factors causing the crashes. Drivers’ age and gender [29] [34] and other factors such as time of the day [35] have also been found to be significant variables associated with injury and fatal traffic crashes.
Hidden layers and activation functions are critical in determining the performance and accuracy of an artificial neural network architecture. Activation functions are transfer functions, that is, mathematical expressions used to introduce nonlinearity into the network. Prior studies suggested that, although it varies across fields of study, an optimum number of hidden layers is adequate for training the model and improves the accuracy of the model developed [8]. A poorly chosen number of hidden layers may lead to overfitting (where the number of hidden layers exceeds what the complexity of the problem requires, resulting in overtraining) or underfitting (where the number of hidden layers falls short of what the complexity of the problem requires, resulting in undertraining of the network and inaccurate results).
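The linking step can be sketched as a keyed join. The Python example below is a simplified illustration with three tiny, invented sub-files; the field names (`case_no`, `veh_no`, and so on) and values are hypothetical and do not reproduce the actual HSIS schema, and it assumes one driver record per vehicle.

```python
# Hypothetical miniature sub-files; real HSIS field names and values may differ.
accidents = [
    {"case_no": "A1", "weather": "Rain", "severity": "Fatal"},
    {"case_no": "A2", "weather": "Normal", "severity": "Injury"},
]
vehicles = [
    {"case_no": "A1", "veh_no": 1, "damage": "Severe"},
    {"case_no": "A2", "veh_no": 1, "damage": "Minor"},
]
occupants = [
    {"case_no": "A1", "veh_no": 1, "driver_age": 34, "driver_gender": "Male"},
    {"case_no": "A2", "veh_no": 1, "driver_age": 52, "driver_gender": "Female"},
]


def merge_subfiles(accidents, vehicles, occupants):
    """Inner-join crash, vehicle, and driver records on their shared keys."""
    accident_by_case = {row["case_no"]: row for row in accidents}
    occupant_by_key = {(row["case_no"], row["veh_no"]): row for row in occupants}
    merged = []
    for veh in vehicles:
        key = (veh["case_no"], veh["veh_no"])
        if veh["case_no"] in accident_by_case and key in occupant_by_key:
            merged.append({**accident_by_case[veh["case_no"]], **veh, **occupant_by_key[key]})
    return merged


merged = merge_subfiles(accidents, vehicles, occupants)
# Keep only fatal crashes, mirroring the filtering step described above.
fatal = [row for row in merged if row["severity"] == "Fatal"]
print(len(merged), len(fatal))  # 2 1
```

Crash-level files join on the case number alone, while vehicle- and occupant-level files need the case number and vehicle number together.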
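To make the role of the activation functions concrete, the following Python sketch runs one forward pass through a perceptron with a single hidden layer. It is an illustrative sketch only: the weights, biases, and input values are hypothetical, and the default choice of hyperbolic tangent for the hidden layer and sigmoid for the output layer is just one of the combinations considered in this study.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def identity(x):
    """Identity (linear) output function."""
    return x


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias,
            hidden_activation=math.tanh, output_activation=sigmoid):
    """One forward pass through a single-hidden-layer perceptron."""
    hidden = [
        hidden_activation(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    total = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return output_activation(total)


# Hypothetical weights for two inputs and two hidden neurons.
y = forward([0.5, -1.0], [[0.2, 0.4], [-0.3, 0.1]], [0.0, 0.1], [0.7, -0.5], 0.05)
print(0.0 < y < 1.0)  # True: a sigmoid output always lies in (0, 1)
```

Swapping `output_activation=identity` leaves the output unbounded, whereas tanh and sigmoid outputs are confined to (−1, 1) and (0, 1), respectively.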
4. Results and Discussions
ANN is a self-learning mathematical method (algorithm), inspired by the human brain, that analyzes complex nonlinear relationships between variables and extracts valuable information from the dataset. As with any other statistical modeling technique, there is a need to refine the ANN model by modifying the parameters and predictors considered. The present study employed the twelve potential predictive variables listed in Table 1 and Table 2. The models were constructed on the dataset of 2873 fatal road traffic crash records from four- and six-lane highways in the State of Ohio, using a multilayer perceptron feedforward artificial neural network architecture. Both hyperbolic tangent and sigmoid activation functions were used with single and double hidden layers. The achieved accuracy, the R-squared value, and the sum of squares error (SSE) were used to compare the models and identify the best-fit ANN model. To improve and test the performance of the ANN model, seven predictors were identified as important independent variables, and further analyses were conducted with these factors.
As discussed in Section 3, the MLANN model formulation began with the initial 12 variables, a single hidden layer, and hyperbolic tangent and identity activation functions for the hidden layer and output layer, respectively. Keeping the 12 variables and a single hidden layer, the hidden-layer and output-layer activation functions were systematically changed, which resulted in six different MLANN models. The analysis and model formulation were then repeated with two hidden layers. The performance of the 12 models developed with the initial 12 predictors and single and double hidden layers was evaluated with the achieved accuracy, SSE, and R-squared values (Table 3). As can be seen, among the models formulated with a single hidden layer, the combination of hyperbolic tangent and sigmoid activation functions for the hidden and output layers, respectively, is the better-performing model, with higher achieved accuracy, lower SSE, and a higher R-squared value (0.8939). In the case of the double hidden layers, the model developed using the sigmoid activation function for both the hidden and output layers appears to be the best-performing MLANN model for predicting fatal crashes, with higher overall achieved accuracy, lower SSE, and a higher R-squared value (0.9387). One of the benefits of using ANN for traffic safety studies is its capability to identify the level of importance of each of the predictors used as input variables. All the models, with single and double hidden layers, had twenty-seven input units, excluding the bias unit. Excluding the bias units, the single hidden layer had nine neurons, and the double hidden layers had nine and seven neurons in the first and second layers, respectively.
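The model grid described above can be enumerated programmatically. The short Python sketch below lists the combinations of hidden-layer count, hidden-layer activation, and output-layer activation; the string labels are illustrative names, not identifiers from any particular software package.

```python
from itertools import product

hidden_layer_counts = (1, 2)
hidden_activations = ("tanh", "sigmoid")
output_activations = ("identity", "tanh", "sigmoid")

# Every combination of layer count, hidden activation, and output activation.
configurations = list(product(hidden_layer_counts, hidden_activations, output_activations))
for layers, hidden_fn, output_fn in configurations:
    print(layers, hidden_fn, output_fn)
print(len(configurations))  # 12
```

Two hidden-layer activations times three output-layer activations give six models per hidden-layer count, and therefore twelve models per predictor set.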
Table 3. The results of multilayer perceptron ANN models formulated with one and two hidden layers for both hyperbolic tangent and the sigmoid activation functions using 12 predictors.
| Hidden Layers | Activation Function (Input) | Activation Function (Output) | Achieved Accuracy (Training) | Achieved Accuracy (Testing) | Sum of Squares Error (Training) | Sum of Squares Error (Testing) | R-Squared |
|---|---|---|---|---|---|---|---|
| 1 | Tanh | Identity | 84.1 | 77.3 | 157.7 | 188.9 | 0.8105 |
| | Tanh | Tanh | 86.3 | 64.2 | 1.312 | 1.028 | 0.817 |
| | Tanh | Sigmoid | 93.7 | 79.1 | 0.186 | 0.286 | 0.8939 |
| | Sigmoid | Identity | 18.8 | 20.2 | 803.2 | 197.7 | 0.1957 |
| | Sigmoid | Tanh | 64.6 | 67 | 3.16 | 1.17 | 0.656 |
| | Sigmoid | Sigmoid | 67.5 | 50.7 | 0.723 | 0.44 | 0.6272 |
| 2 | Tanh | Identity | 72.1 | 91.4 | 282 | 26.6 | 0.8005 |
| | Tanh | Tanh | 78.3 | 81.1 | 2.745 | 0.89 | 0.803 |
| | Tanh | Sigmoid | 85.3 | 35.6 | 0.35 | 0.263 | 0.8032 |
| | Sigmoid | Identity | 82.6 | 81.6 | 172.4 | 114.06 | 0.8228 |
| | Sigmoid | Tanh | 84.9 | 56 | 1.71 | 2.674 | 0.7598 |
| | Sigmoid | Sigmoid | 93.9 | 87.5 | 0.158 | 0.064 | 0.9387 |
Based on the independent variable importance results, the top seven predictors, considered more important than the other five, were identified and used for further analysis. The selected variables are segment length, annual average daily traffic (AADT), contour of the roadway, weather, pavement surface, drivers’ age, and roadway designation. Using these seven predictors and the procedures described above, an additional 12 MLANN models were developed; the results are summarized in Table 4. Among the models formulated with a single hidden layer, the combination of hyperbolic tangent and sigmoid activation functions for the hidden and output layers, respectively, is identified as the better-performing model, with the highest training accuracy (94.4), the lowest training SSE (0.160), and an R-squared value of 0.8488, which is similar to the result achieved with the 12 predictors. In the case of the double hidden layers, the model developed using the hyperbolic tangent activation function for the hidden layers and the sigmoid activation function for the output layer is the best-performing MLANN model for predicting fatal crashes, with higher overall achieved accuracy, lower SSE, and a higher R-squared value (0.8511). The MLANN models developed with the selected seven variables had seventeen input units for both single and double hidden layers, excluding the bias unit. Excluding the bias units, the single hidden layer had eight neurons, and the double hidden layers had eight and six neurons in the first and second layers, respectively.
Table 4. The results of multilayer perceptron ANN models formulated with one and two hidden layers for both hyperbolic tangent and sigmoid activation functions using seven predictors selected based on their level of importance.
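The unit counts reported for the two predictor sets can be checked with a small Python sketch. It assumes, as a working hypothesis rather than a documented detail of the study's software, that each categorical predictor contributes one input unit per category (using the category counts in Table 2) and each continuous predictor contributes one unit; the dictionary keys are illustrative names.

```python
# Number of categories per categorical predictor, as listed in Table 2.
categorical_levels = {
    "day_of_week": 2, "lighting": 2, "contour": 4, "weather": 3,
    "pavement_surface": 3, "driver_age": 3, "driver_gender": 2,
    "median_type": 2, "number_of_lanes": 2, "roadway_designation": 2,
}
continuous = ["segment_length", "aadt"]


def input_units(selected_categorical, selected_continuous):
    """One unit per category level plus one per continuous predictor (assumed coding)."""
    return sum(categorical_levels[name] for name in selected_categorical) + len(selected_continuous)


all_twelve = input_units(categorical_levels, continuous)
top_seven = input_units(
    ["contour", "weather", "pavement_surface", "driver_age", "roadway_designation"],
    continuous,
)
print(all_twelve, top_seven)  # 27 17
```

Under this assumption, the twelve-predictor models have 27 units and the seven-predictor models have 17, consistent with the counts reported in the text.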
| Hidden Layers | Activation Function (Input) | Activation Function (Output) | Achieved Accuracy (Training) | Achieved Accuracy (Testing) | Sum of Squares Error (Training) | Sum of Squares Error (Testing) | R-Squared |
|---|---|---|---|---|---|---|---|
| 1 | Tanh | Identity | 85.4 | 89.0 | 145.8 | 34.00 | 0.8733 |
| | Tanh | Tanh | 83.7 | 80.1 | 1.247 | 0.856 | 0.8233 |
| | Tanh | Sigmoid | 94.4 | 66.6 | 0.160 | 0.507 | 0.8488 |
| | Sigmoid | Identity | 13.3 | 15.7 | 870.24 | 164.84 | 0.1425 |
| | Sigmoid | Tanh | 78.9 | 76.7 | 1.878 | 0.832 | 0.7930 |
| | Sigmoid | Sigmoid | 75.8 | 77.1 | 0.737 | 0.682 | 0.7980 |
| 2 | Tanh | Identity | 80 | 88.4 | 197 | 44.20 | 0.8240 |
| | Tanh | Tanh | 74 | 61.5 | 2.623 | 0.913 | 0.7354 |
| | Tanh | Sigmoid | 83.9 | 86.9 | 0.445 | 0.208 | 0.8511 |
| | Sigmoid | Identity | 28.2 | 32.1 | 728.8 | 96.2 | 0.4035 |
| | Sigmoid | Tanh | 77.8 | 78 | 1.361 | 1.399 | 0.7829 |
| | Sigmoid | Sigmoid | 80.2 | 78.5 | 0.488 | 0.141 | 0.8033 |
5. Conclusions
The study used the MLANN approach to develop optimized models for predicting fatal crashes on four- and six-lane interstate highways. The analysis began with two continuous and ten categorical predictive factors as input variables and a single hidden layer, with hyperbolic tangent and identity activation functions for the hidden and output layers, respectively. The hidden and output activation functions and the number of hidden layers were methodically altered to optimize the models. A total of twelve MLANN models were developed using the initial twelve predictors. Based on the variable importance results, the seven independent variables with the highest importance factors were selected and used for further analysis and model formulation. Table 3 and Table 4 show the results of the MLANN study: the achieved accuracy, the R-squared values, and the SSE values used to evaluate the performance of the twenty-four models developed. The results presented in Table 3 and Table 4 suggest that the combination of hyperbolic tangent and sigmoid activation functions for the hidden and output layers, respectively, gives the better-performing model in three modeling scenarios: twelve variables with a single hidden layer (R-squared = 0.8939), seven variables with a single hidden layer (R-squared = 0.8488), and seven variables with two hidden layers (R-squared = 0.8511). In the remaining scenario, twelve variables with two hidden layers, the best model, with the highest R-squared value (0.9387) and the lowest SSE, used the sigmoid activation function for both the hidden and output layers.
Acknowledgements
This work is partly supported by the NASA-MUREP program through the Federal Award No. NASA-80NSSC21M0307.
Author Contributions
L.T. and O.A. are undergraduate students at Alabama State University who participated in a summer research internship program at Alabama A&M University, supported by the NASA-MUREP program. A.K. originated the idea, and E.P. and A.K. analyzed the results and wrote the manuscript.
Disclaimer
The views and conclusions in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of NASA.