TITLE:
Improve Data Quality by Processing Null Values and Semantic Dependencies
AUTHORS:
Houda Zaidi, Faouzi Boufarès, Yann Pollet
KEYWORDS:
Data Quality, Big Data, Contextual Semantics, Semantic Dependencies, Functional Dependencies, Null Values, Data Cleaning
JOURNAL NAME:
Journal of Computer and Communications, Vol.4 No.5, May 26, 2016
ABSTRACT:
Today, the quantity of data continues to increase. Furthermore, the data are
heterogeneous, come from multiple sources (structured, semi-structured and
unstructured) and have different levels of quality. It is therefore very likely
that data will be manipulated without knowledge of their structure and their
semantics. In fact, the metadata may be insufficient or totally absent. Data
anomalies may be due to poor semantic descriptions of the data, or even to the
absence of any description. In this paper, we propose an approach to better
understand the semantics and the structure of the data. Our approach helps to
automatically correct both intra-column and inter-column anomalies. We aim to
improve the quality of data by processing null values and the semantic
dependencies between columns.
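As a rough illustration of the inter-column case, the sketch below fills null values in one column by exploiting a functional dependency from another column (for example, city determines country). It is a generic technique, not the authors' algorithm: the function name `fill_nulls_from_fd`, the column names `city` and `country`, and the majority-vote rule for choosing a value are all hypothetical assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional

Row = dict[str, Optional[str]]


def fill_nulls_from_fd(rows: list[Row], lhs: str, rhs: str) -> list[Row]:
    """Hypothetical helper: fill null `rhs` values using the dependency lhs -> rhs.

    Assumes the dependency holds approximately in the data. For each lhs value,
    the most frequent non-null rhs value observed with it is used as the fill.
    The input rows are not modified; a new list of copied rows is returned.
    """
    # Count observed rhs values per lhs value, skipping rows where either is null.
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        if row.get(lhs) is not None and row.get(rhs) is not None:
            counts[row[lhs]][row[rhs]] += 1

    cleaned: list[Row] = []
    for row in rows:
        new_row = dict(row)
        key = row.get(lhs)
        # Only fill when rhs is null and there is evidence for this lhs value.
        if new_row.get(rhs) is None and key is not None and counts.get(key):
            new_row[rhs] = counts[key].most_common(1)[0][0]
        cleaned.append(new_row)
    return cleaned


if __name__ == "__main__":
    rows = [
        {"city": "Paris", "country": "France"},
        {"city": "Paris", "country": None},
        {"city": "Rouen", "country": "France"},
        {"city": "Lyon", "country": None},
    ]
    for r in fill_nulls_from_fd(rows, "city", "country"):
        print(r)
```

In this toy run, the second row receives "France" because another row pairs "Paris" with "France", while the "Lyon" row stays null since no row provides evidence for it. Leaving unsupported nulls untouched, rather than guessing, is a deliberate conservative choice in this sketch.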