Data Visualization and Analysis Part I

Data visualization is an important part of data science because it communicates insights effectively. A visual is usually far easier to understand than a long explanation, especially when the dataset is large. Visualization means creating graphical representations of information or data, commonly known as plots or charts. Selecting the most suitable representation is a crucial step, since different visualizations serve different analytical needs.

Why Data Visualization is Important…

Data visualization plays a crucial role in data science for several reasons. It serves as a means to effectively convey results and discoveries, monitor model performance during evaluation, fine-tune hyperparameters, detect outliers during data cleaning, and validate assumptions made by the model. Additionally, it facilitates the identification of trends, patterns, and correlations among features within the dataset.

How to make Visualization Effective?

  • The visualization goal should be clear
  • Data should be easily understandable
  • The visualizations should match the business problem
  • The plots should be interactive and user-friendly
  • It should highlight important information
  • The graphs should be visually aesthetic
  • It should summarize all the information precisely

Visualizations of 1D/2D (Univariate/Bivariate Data)

Univariate Data

Univariate data is 1D data: each observation records a single variable.

Histograms

Histograms are a popular visualization for 1D data analysis because they:

  • Show the center, variability, skewness and modality of a distribution
  • Help identify outliers
  • Show the bin width used to group the values

However, histograms have some limitations to keep in mind:

  • For small datasets, histograms can be misleading
  • They are most effective for large datasets, where they illustrate the general properties of the distribution well
  • Histograms effectively work with only one variable at a time
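To make the point about small datasets and bin width concrete, here is a minimal sketch of the counting behind a histogram, rendered as text so it needs no plotting library. The `ages` sample, the `histogram` and `render` helpers and the two bin widths are assumptions made up for illustration, not data from this article.

```python
import math
from collections import Counter

def histogram(values, bin_width):
    """Count values into equal-width bins; returns {bin_left_edge: count}."""
    # Align the first bin to a multiple of the bin width.
    start = math.floor(min(values) / bin_width) * bin_width
    counts = Counter()
    for v in values:
        index = math.floor((v - start) / bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))

def render(counts, bin_width):
    """Draw one text bar per non-empty bin."""
    return "\n".join(
        f"[{left:>5.1f}, {left + bin_width:>5.1f}) {'#' * n}"
        for left, n in counts.items()
    )

# Hypothetical small sample (illustrative values only).
ages = [21, 22, 22, 23, 25, 25, 26, 27, 29, 31, 33, 34, 35, 38, 41, 45, 52, 67]

for width in (5, 20):
    print(f"bin width = {width}")
    print(render(histogram(ages, width), width))
```

With only 18 values, the narrow and the wide bins give quite different impressions of the same data, which is one way a histogram of a small dataset can mislead.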

Boxplots

Boxplots are another very popular visualization for 1D data analysis because they show several summaries at once: the median, the interquartile range, outliers, the overall range and skewness. They have some drawbacks as well: they can suffer from overplotting, it is difficult to tell the shape of the distribution from them, and there is no single standard implementation across software, so the same data can produce slightly different boxplots in different tools.
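The numbers a boxplot draws can be sketched with Python's standard `statistics` module. The 1.5 x IQR rule for whiskers and outliers used here is a common convention, assumed for this sketch; the `ages` sample and the `box_stats` helper are hypothetical. Switching the quantile `method` is one concrete way in which two tools can disagree on the same data.

```python
import statistics

def box_stats(values, method="exclusive"):
    """Summaries behind a boxplot, using the common 1.5 * IQR outlier rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method=method)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers extend to the most extreme values still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical sample (illustrative values only).
ages = [21, 22, 22, 23, 25, 25, 26, 27, 29, 31, 33, 34, 35, 38, 41, 45, 52, 67]

for method in ("exclusive", "inclusive"):
    print(method, box_stats(ages, method))
```

Both quantile methods are valid; they simply interpolate differently, so the box edges shift slightly while the median stays the same for this sample.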

Bivariate Data

Bivariate data is 2D data with two variables. Each variable can be nominal or quantitative, so the combination can be Nominal x Nominal, Quantitative x Quantitative or Nominal x Quantitative. For this type of data, scatterplots are most often used, including scatterplots that reveal patterns such as heteroscedasticity. For larger datasets, other representations such as contour plots can be used.
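One reason contour-style plots help with larger datasets is that they summarize point density instead of drawing every point. A rough sketch of that idea is to count points per grid cell; a contour plot would then draw level lines over such a density grid. The `points` list, the `bin2d` helper and the cell size are assumptions chosen for illustration.

```python
from collections import Counter

def bin2d(points, x_width, y_width):
    """Count (x, y) points per rectangular cell, keyed by the cell's lower-left corner."""
    counts = Counter()
    for x, y in points:
        cell = (x // x_width * x_width, y // y_width * y_width)
        counts[cell] += 1
    return counts

# Hypothetical Quantitative x Quantitative sample (illustrative values only).
points = [(1, 2), (1.5, 2.5), (2, 2), (8, 9), (8.5, 9.5), (4, 5)]

density = bin2d(points, x_width=5, y_width=5)
for cell, count in sorted(density.items()):
    print(cell, count)
```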

There are other ways to represent 2D data as well, such as line graphs, bar charts and stacked bar charts. These can make the data they represent easier to understand.
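For Nominal x Nominal data, a stacked bar chart is essentially a drawing of a cross-tabulation: one bar per category of the first variable, split into segments by the second. Here is a minimal sketch of building that table; the `responses` pairs and the `crosstab` helper are made up for illustration.

```python
from collections import Counter

def crosstab(pairs):
    """Build {row_category: {column_category: count}} from (row, column) pairs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical (region, answer) pairs (illustrative values only).
responses = [
    ("North", "Yes"), ("North", "No"), ("North", "Yes"),
    ("South", "No"), ("South", "No"), ("South", "Yes"),
    ("East", "Yes"),
]

table = crosstab(responses)
for region, segments in table.items():
    print(region, segments)  # each row is one bar, each entry one stacked segment
```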

Multi-dimensional Data Visualization

Earlier, we discussed simple 1D and 2D visualization techniques and how they support analysis. For multi-dimensional data, however, the visualizations have to become more complex, for example by extending them to 3D or 4D. This can be done for scatterplots and bar charts, but the result is sometimes difficult to read, so there are further techniques that help explain data with higher dimensions.

Glyphs

Glyphs are one way to represent such data. They encode data values as symbols or other visual patterns. Glyphs can make visualizations more engaging and easier to understand, and can add an extra layer of information. A few examples include Chernoff faces, stars and arrow directions.
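As a rough illustration of how a star glyph encodes several variables at once, the sketch below turns one record into the corner points of a star: each variable gets its own evenly spaced ray, and the ray length is the value rescaled to the range 0 to 1. The `star_glyph` helper, the variable ranges and the sample record are assumptions for illustration only.

```python
import math

def star_glyph(values, minimums, maximums, radius=1.0):
    """Return (x, y) vertices of a star glyph, one ray per variable.

    Assumes every maximum is strictly greater than its minimum.
    """
    n = len(values)
    vertices = []
    for i, (v, lo, hi) in enumerate(zip(values, minimums, maximums)):
        length = radius * (v - lo) / (hi - lo)  # rescale the value to [0, radius]
        angle = 2 * math.pi * i / n             # spread rays evenly around the circle
        vertices.append((length * math.cos(angle), length * math.sin(angle)))
    return vertices

# Hypothetical record with four variables, each ranging from 0 to 10.
record = [10, 5, 0, 5]
vertices = star_glyph(record, minimums=[0] * 4, maximums=[10] * 4)
for x, y in vertices:
    print(f"({x:.2f}, {y:.2f})")
```

Connecting the vertices in order gives the star outline; drawing one such star per record lets the eye compare records by shape.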

Trellis Plots

A trellis plot subdivides space to enable comparison across multiple plots. Typically, nominal and ordinal variables are used as the dimensions for the subdivision. Below is an example trellis plot showing data about the two major political parties of the USA, Democrats and Republicans. The plot shows the distribution of male and female voters by region and by age. Since multiple dimensions are involved, a trellis plot gives a better view of the voting trends across the different regions of the USA.

Small Multiples

Small multiples are another way of visualizing data with multiple dimensions. Suppose the same voter data is analyzed: it can be plotted as multiple small plots, each showing the data for one dimension. An example is shown below.

Difference between Trellis Plots and Small Multiples

Both are essentially the same:

  • A series of similar graphs or charts
  • Easy comparisons
  • Multiple views that show different partitions of a dataset using the same scale and axes
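The shared idea can be sketched without any plotting library: partition the records by one or two nominal variables, then compute a single axis range that every panel reuses. The `voters` records, the field names and the `facet` and `shared_range` helpers below are entirely hypothetical, made up for illustration.

```python
from collections import defaultdict

def facet(records, row_key, col_key):
    """Partition records into panels keyed by (row category, column category)."""
    panels = defaultdict(list)
    for record in records:
        panels[(record[row_key], record[col_key])].append(record)
    return dict(panels)

def shared_range(records, field):
    """One (min, max) axis range computed over all records, reused by every panel."""
    values = [record[field] for record in records]
    return min(values), max(values)

# Hypothetical voter records (illustrative values only).
voters = [
    {"region": "North", "gender": "F", "age": 34},
    {"region": "North", "gender": "M", "age": 51},
    {"region": "South", "gender": "F", "age": 27},
    {"region": "South", "gender": "F", "age": 63},
    {"region": "South", "gender": "M", "age": 45},
]

panels = facet(voters, row_key="region", col_key="gender")
age_axis = shared_range(voters, "age")
for key, members in sorted(panels.items()):
    print(key, [v["age"] for v in members], "axis:", age_axis)
```

Because every panel uses the same axis range, differences between panels reflect the data rather than differences in scale.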

Concluding

Above, we discussed some visualizations for univariate, bivariate and multivariate data, and how results of an analysis can be drawn from them. These provide a better understanding of the data. Data visualization will be discussed further in future articles/tutorials as well. We hope this gives you an ample understanding of basic Exploratory Data Analysis and visualization. We'll dive further into it, along with tree visualizations, graphs and networks, etc.


I hope you enjoyed the article. For feedback, email us at immadshahid@gmail.com or write in the comments below.
