Data Visualization and Analysis Part I
Data visualization is an important part of data science because it communicates insights effectively. A visualization is often far easier to understand than a complex explanation, especially when the dataset is large. It is the creation of graphical representations of information or data, commonly called plots or charts. Selecting the most suitable representation is a crucial part of data visualization, since different visualizations serve different analytical needs.
Why Data Visualization is Important…
Data visualization plays a crucial role in data science for several reasons. It serves as a means to effectively convey results and discoveries, monitor model performance during evaluation, fine-tune hyperparameters, detect outliers during data cleaning, and validate assumptions made by the model. Additionally, it facilitates the identification of trends, patterns, and correlations among features within the dataset.
How to make Visualization Effective?
Visualizations of 1D/2D (Univariate/Bivariate Data)
Univariate Data
Univariate data is one-dimensional (1D) data consisting of a single variable.
Histograms
Histograms are a popular visualization for 1D Data Analysis because:
However, there are some issues with the Histogram:
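One commonly cited limitation, offered here as an illustration rather than as the author's own list, is that the apparent shape of a histogram depends on the number of bins chosen. The sketch below counts the same made-up bimodal sample into different numbers of equal-width bins. The function names (`bin_counts`, `make_sample`, `plot_histograms`) and the data are assumptions introduced for this example; only the final plotting step relies on matplotlib.

```python
import random


def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: everything lands in the first bin.
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs to the last bin, not one past the end.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def make_sample(n=500, seed=0):
    """Made-up bimodal sample (two normal clusters), for illustration only."""
    rng = random.Random(seed)
    half = n // 2
    return ([rng.gauss(0, 1) for _ in range(half)]
            + [rng.gauss(5, 1) for _ in range(n - half)])


def plot_histograms(values, bin_options=(5, 20, 80)):
    """Draw the same data with several bin counts, side by side."""
    # Imported here so the helpers above work without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(bin_options), figsize=(12, 3), squeeze=False)
    for ax, n_bins in zip(axes[0], bin_options):
        ax.hist(values, bins=n_bins)
        ax.set_title(f"{n_bins} bins")
    plt.show()
```

Calling `plot_histograms(make_sample())` should show how few bins can hide the two clusters while many bins make the plot noisy, which is why the bin count is worth trying at several values before drawing conclusions.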
Boxplots
Boxplots are another very popular visualization for 1D data analysis because they show more information, including the median, interquartile range, outliers, range, and skewness. They have drawbacks as well: they can sometimes suffer from overplotting, the shape of the distribution is difficult to tell from them, and there is no standard implementation across software.
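To make the quantities behind a boxplot concrete, here is a minimal sketch that computes them with Python's standard `statistics` module. It assumes the widely used 1.5 x IQR rule for flagging outliers, which is one convention among several. The function name `box_stats` and the sample values are made up for illustration.

```python
import statistics


def box_stats(values, method="exclusive"):
    """Return the numbers a boxplot draws, assuming the 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method=method)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme values still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Made-up sample with one obvious outlier.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
exclusive = box_stats(sample, method="exclusive")
inclusive = box_stats(sample, method="inclusive")
print(exclusive["q3"], inclusive["q3"])  # the two conventions disagree on Q3
print(exclusive["outliers"])
```

The two `method` values give different third quartiles for the same data, which is one concrete way the lack of a standard implementation shows up: the box edges depend on the quartile convention a tool uses.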
Bivariate Data
Bivariate data is 2D data with two variables, each of which may be nominal or quantitative. The possible combinations are Nominal x Nominal, Quantitative x Quantitative, and Nominal x Quantitative. For this type of data, scatterplots of various kinds (heteroscedastic scatterplots, for example) are most often used. For larger datasets, other representations such as contour plots can be used instead.
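The switch from a scatterplot to a contour plot for larger datasets can be sketched as follows. The idea shown is to bin the points into a grid of counts and draw contours of that grid, which is one simple way to build a contour plot and not the only one. The data is randomly generated for illustration, and the function names are assumptions of this example.

```python
import random


def make_points(n=20000, seed=1):
    """Made-up correlated (x, y) pairs, for illustration only."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 0.6 * x + rng.gauss(0, 0.8)
        points.append((x, y))
    return points


def density_grid(points, n_bins=30):
    """Count points per cell of an n_bins x n_bins grid (rows = y, cols = x).

    Assumes both x and y take more than one distinct value.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        # Clamp so the maximum value falls in the last cell.
        col = min(int((x - x_lo) / (x_hi - x_lo) * n_bins), n_bins - 1)
        row = min(int((y - y_lo) / (y_hi - y_lo) * n_bins), n_bins - 1)
        grid[row][col] += 1
    return grid


def plot_scatter_and_contour(points, n_bins=30):
    """Show the raw scatterplot next to a contour plot of the binned counts."""
    # Imported here so the helpers above work without matplotlib installed.
    import matplotlib.pyplot as plt

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(xs, ys, s=2, alpha=0.2)
    ax1.set_title("Scatterplot")
    # Axes of the contour panel are in bin-index units, not data units.
    ax2.contourf(density_grid(points, n_bins))
    ax2.set_title("Contours of binned counts")
    plt.show()
```

With tens of thousands of points the scatterplot tends to become a solid blob, while the contour panel still shows where the points concentrate.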
There are other ways to represent 2D data as well, such as line graphs, bar charts, and stacked bar charts. These can make the data they represent easier to understand.
Multi-dimensional Data Visualization
Earlier, we discussed simple 1D and 2D visualization techniques and how they help with analysis. When the data is multi-dimensional, however, the visualizations have to become more complex, for example by extending scatterplots and bar charts to 3D or 4D. These extended plots can be difficult to read, so there are further techniques designed specifically for presenting higher-dimensional data.
Glyphs
Glyphs are one way to represent such data. They show the data in the form of symbols or other visual patterns. Glyphs can make visualizations more engaging and easier to understand, and they can add an extra layer of information. A few examples include Chernoff faces, stars, and arrow directions.
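As an illustration of how a glyph encodes several variables at once, the sketch below builds a star glyph: each variable gets its own evenly spaced spoke, and the value sets how far the vertex lies from the centre. The function names, the scaling by a per-variable maximum, and the sample record are assumptions made for this example, not a standard API.

```python
import math


def star_glyph_vertices(record, maxima):
    """Map one record to the polygon vertices of a star glyph.

    Each variable is scaled to the range 0..1 by its maximum and placed
    on its own spoke; spokes are evenly spaced around the centre.
    """
    n = len(record)
    vertices = []
    for i, (value, max_value) in enumerate(zip(record, maxima)):
        angle = 2 * math.pi * i / n
        radius = value / max_value
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


def plot_star_glyphs(records, maxima):
    """Draw one star glyph per record, side by side."""
    # Imported here so the helper above works without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(records), squeeze=False)
    for ax, record in zip(axes[0], records):
        pts = star_glyph_vertices(record, maxima)
        # Repeat the first vertex so the outline closes.
        xs, ys = zip(*(pts + pts[:1]))
        ax.plot(xs, ys)
        ax.set_xlim(-1.1, 1.1)
        ax.set_ylim(-1.1, 1.1)
        ax.set_aspect("equal")
        ax.axis("off")
    plt.show()


# Made-up record with four variables, each with a maximum of 10.
vertices = star_glyph_vertices((10, 5, 10, 5), (10, 10, 10, 10))
```

Records with similar values produce similarly shaped stars, so a row of glyphs lets the eye compare many variables per record at a glance.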
Trellis Plots
A trellis plot subdivides space to enable comparison across multiple plots. Typically, nominal and ordinal variables are used as the dimensions for the subdivision. The example trellis plot shows data about the two major political parties of the USA, Democrats and Republicans: the distribution of male and female voters by region and by age. Because multiple dimensions are involved, the trellis plot gives a better view of the voting trends across the different regions of the USA.
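The subdivision step of a trellis plot can be sketched in code as grouping records by two nominal variables and drawing one panel per group on shared axes. The records below are made up for illustration and are not real voting data. The field names (`region`, `sex`, `age`) and function names are hypothetical.

```python
from collections import defaultdict


def facet(records, row_key, col_key):
    """Group records into panels keyed by (row value, column value)."""
    panels = defaultdict(list)
    for rec in records:
        panels[(rec[row_key], rec[col_key])].append(rec)
    return dict(panels)


# Made-up records for illustration only; not real voting data.
RECORDS = [
    {"region": "North", "sex": "F", "age": 34},
    {"region": "North", "sex": "M", "age": 51},
    {"region": "South", "sex": "F", "age": 27},
    {"region": "South", "sex": "F", "age": 62},
    {"region": "South", "sex": "M", "age": 45},
]


def plot_trellis(records, row_key, col_key, value_key):
    """Draw one histogram of value_key per (row, column) panel."""
    # Imported here so the helper above works without matplotlib installed.
    import matplotlib.pyplot as plt

    panels = facet(records, row_key, col_key)
    rows = sorted({r for r, _ in panels})
    cols = sorted({c for _, c in panels})
    # Shared axes keep every panel on the same scale, so panels are comparable.
    fig, axes = plt.subplots(len(rows), len(cols),
                             sharex=True, sharey=True, squeeze=False)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            values = [rec[value_key] for rec in panels.get((r, c), [])]
            if values:
                axes[i][j].hist(values, bins=5)
            axes[i][j].set_title(f"{row_key}={r}, {col_key}={c}")
    plt.show()
```

Calling `plot_trellis(RECORDS, "region", "sex", "age")` would lay out a 2 x 2 grid, one panel per region and sex combination. The shared axes are the important design choice: without a common scale, comparing panels by eye becomes misleading.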
Small Multiples
Small multiples are another way of visualizing data with multiple dimensions. If the same voter data is analyzed, it can be plotted as multiple small plots, each showing the data for one dimension. An example is shown below.
Difference between Trellis Plots and Small Multiples
Both are the same:
Concluding
Above, we discussed some visualizations for univariate, bivariate, and multivariate data, and how we can draw conclusions from them. These visualizations provide a better understanding of the data. Data visualization will be discussed further in future articles and tutorials. We hope this gives you an ample understanding of basic Exploratory Data Analysis and visualization. We'll dive deeper into it later, along with tree visualizations, graphs and networks, and more.
I hope you enjoyed the article. For feedback, email us at immadshahid@gmail.com or write in the comments below.