TITLE:
Application of Surface Water Quality Classification Models Using Principal Components Analysis and Cluster Analysis
AUTHORS:
Mohamed Ahmed Reda Hamed
KEYWORDS:
Surface Water, Principal Component Analysis, Cluster Analysis
JOURNAL NAME:
Journal of Geoscience and Environment Protection, Vol.7 No.6, June 21, 2019
ABSTRACT:
Water quality monitoring is one of the highest
priorities in surface water protection policy. A variety of approaches are
used to interpret and analyze the latent variables that determine the
variance of observed water quality at various source points. A considerable
proportion of these approaches are based on
statistical methods, multivariate statistical techniques in particular. In the
present study, multivariate techniques are used to reduce the
large number of water quality variables for the Nile River upstream of the
Cairo Drinking Water Plants (CDWPs) and to determine the
relationships among them for easy and robust evaluation. By means of
principal components analysis (PCA) and two clustering algorithms, Fuzzy C-Means (FCM) and K-means,
this study attempted to determine the dominant factors
responsible for the variations in Nile River water quality upstream of the
CDWPs. Furthermore, cluster analysis classified 21 sampling stations into three clusters based on similarities of water
quality features. The PCA results show that 6 principal components contain
the key variables and account for 75.82% of the total variance of the study area's
surface water quality; the dominant water quality parameters were
Conductivity, Iron, Biological Oxygen Demand (BOD), Total Coliform (TC),
Ammonia (NH3), and pH. The FCM and K-means clustering results,
based on the concentrations of the dominant parameters, both determined 3 cluster groups
and produced cluster centers (prototypes). Based on the clustering classification,
water quality was noted to deteriorate as the cluster number increased from 1 to 3.
The cluster grouping can also be used to identify the physical, chemical and
biological processes creating the variations in the water quality parameters.
This study revealed that multivariate analysis techniques, through the extracted
dominant water quality parameters and clustered information, can be used to
reduce the number of sampling parameters on the Nile River in a
cost-effective and efficient way, without losing much information, instead of
monitoring a large set of parameters. These techniques can help decision makers
obtain a global view of water quality in any surface water or other
water body when analyzing large data sets, especially without a priori
knowledge of the relationships among the variables.
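
The workflow summarized above (PCA to reduce the variables, then clustering of the stations) can be illustrated with a minimal sketch. It is not the study's implementation. The data are synthetic random numbers, not Nile River measurements, and the 12 parameters, the 0.75 variance threshold, the function names `pca` and `kmeans`, and the random seeds are all assumptions chosen for illustration. Only K-means is shown; FCM is omitted.

```python
import numpy as np

def pca(X, var_target=0.75):
    """Standardize columns, then keep the fewest components whose
    cumulative explained variance reaches var_target."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the component loadings.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_target)) + 1
    scores = Z @ Vt[:k].T          # station coordinates on the kept components
    return scores, Vt[:k], explained[:k]

def kmeans(X, k=3, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with randomly chosen initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, then nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in data: 21 stations x 12 hypothetical parameters.
rng = np.random.default_rng(42)
X = rng.normal(size=(21, 12))

scores, loadings, explained = pca(X, var_target=0.75)
labels, centers = kmeans(scores, k=3)

print("components kept:", scores.shape[1])
print("variance explained:", round(float(explained.sum()), 4))
print("cluster label per station:", labels.tolist())
```

On real monitoring data, the largest absolute loadings of each retained component would indicate candidate dominant parameters, and the cluster centers would play the role of the prototypes mentioned in the abstract.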