
What is Univariate, Bivariate & Multivariate Analysis in Data Visualisation?

Last Updated : 07 Nov, 2022

Data visualisation is the graphical representation of information and data. By using visual elements such as charts, graphs, and maps, data visualisation tools give us an accessible way to find and understand trends and patterns in data.

In this article, we look at univariate, bivariate and multivariate analysis in data visualisation using Python.

Univariate Analysis

Univariate analysis examines a single variable at a time. It helps us understand the distribution of that variable before we carry out further analysis. You can find the link to the dataset here.

Python3




import pandas as pd
import seaborn as sns
data = pd.read_csv('Employee_dataset.csv')
print(data.head())


Output:

 

Histogram

Here we perform univariate analysis on a numerical variable using seaborn's histplot function.
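To make concrete what a histogram computes, the sketch below counts values into fixed-width bins using only the standard library. The ages, the bin width and the helper name `histogram_counts` are assumptions for illustration and do not come from the employee dataset. Seaborn picks its bin edges automatically, so its bars will generally differ from these.

```python
# Hypothetical ages; not taken from the article's employee dataset.
ages = [22, 25, 27, 31, 34, 35, 38, 41, 45, 52, 58, 60]


def histogram_counts(values, bin_width, start):
    """Count how many values fall into each [left, left + bin_width) bin."""
    counts = {}
    for v in values:
        # Left edge of the bin that contains v.
        left = start + ((v - start) // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


counts = histogram_counts(ages, bin_width=10, start=20)

# Text rendering of the histogram: one '#' per value in the bin.
for left, n in counts.items():
    print(f"{left}-{left + 9}: {'#' * n} ({n})")
```

Each bar in a histogram is just one of these counts drawn as a rectangle over its bin.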

Python3




sns.histplot(data['age'])


Output:

 

Bar Chart

For univariate analysis of categorical data, we use the countplot function from the seaborn library.

Python3




sns.countplot(x=data['gender_full'])


Output:

 

The bars in the chart represent the count of each category present in the gender_full column.

Pie Chart

A pie chart helps us visualize the percentage of the data belonging to each category.
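The percentage behind each slice is simply the category's count divided by the total. A minimal sketch of that arithmetic, using made-up values standing in for a column such as STATUS_YEAR (the list `years` is an assumption, not data from the article):

```python
from collections import Counter

# Hypothetical category values standing in for a column such as STATUS_YEAR.
years = [2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015]

counts = Counter(years)
total = sum(counts.values())

# Share of each category as a percentage of all rows.
shares = {year: 100 * n / total for year, n in counts.items()}

for year, pct in sorted(shares.items()):
    print(f"{year}: {pct:.1f}%")
```

These are the numbers that a format string such as '%1.1f%%' prints on the slices.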

Python3




import matplotlib.pyplot as plt

# Count the rows in each STATUS_YEAR category
x = data['STATUS_YEAR'].value_counts()
plt.pie(x.values,
        labels=x.index,
        autopct='%1.1f%%')
plt.show()


Output:

 

Bivariate analysis

Bivariate analysis is the simultaneous analysis of two variables. It explores the relationship between them: whether an association exists and how strong it is, or whether the two variables differ and how significant those differences are.

The three main types we will see here are:

  1. Categorical v/s Numerical
  2. Numerical v/s Numerical
  3. Categorical v/s Categorical

Categorical v/s Numerical

Python3




import matplotlib.pyplot as plt
plt.figure(figsize=(15, 5))
sns.barplot(x=data['department_name'], y=data['length_of_service'])
plt.xticks(rotation=90)


Output:

 

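Here the height of each bar is the mean length of service in that department, and the black line on each bar is an error bar (by default a 95% confidence interval for that mean). Clearly different bar heights indicate that the typical length of service differs among departments.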
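By default, sns.barplot aggregates the numerical variable within each category using the mean. The sketch below reproduces that aggregation with the standard library; the department names and values in `records` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (department, length_of_service) pairs; not the article's data.
records = [
    ("Bakery", 4), ("Bakery", 6), ("Bakery", 8),
    ("Dairy", 10), ("Dairy", 14),
    ("Meats", 2), ("Meats", 3), ("Meats", 4),
]

# Group the numerical values by category.
by_department = defaultdict(list)
for department, years in records:
    by_department[department].append(years)

# One mean per category: this is the height of each bar.
mean_service = {dept: mean(vals) for dept, vals in by_department.items()}
print(mean_service)
```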

Numerical v/s Numerical

Python3




sns.scatterplot(x=data['length_of_service'],
                y=data['age'])


Output:

 

The scatter plot shows age against length of service for each employee. Younger employees tend to have shorter lengths of service.
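The strength of a linear relationship like this one is usually summarised by the Pearson correlation coefficient. The following is a minimal sketch of that formula; the helper `pearson_r` and the two short lists are assumptions for illustration, not values from the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values: longer service goes with higher age.
length_of_service = [1, 3, 5, 8, 12, 20]
age = [22, 26, 30, 35, 41, 55]

r = pearson_r(length_of_service, age)
print(round(r, 3))
```

A value close to +1 means the points lie near an upward-sloping line, and a value close to -1 means a downward-sloping line.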

Categorical v/s Categorical

Python3




sns.countplot(x=data['STATUS_YEAR'],
              hue=data['STATUS'])


Output:

 

Multivariate Analysis

Multivariate analysis extends bivariate analysis to three or more variables at once in order to find relationships among them. It covers a set of statistical techniques that examine patterns in multidimensional data by considering several variables simultaneously.

PCA
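PCA (principal component analysis) projects many numerical columns onto a few directions that capture most of the variance, so the data can be drawn in two dimensions. The sketch below shows the idea with NumPy alone, by centering the data and taking a singular value decomposition. The random data and variable names are assumptions for illustration, and this is not meant as a replacement for a library implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples with 4 correlated features.
base = rng.normal(size=(100, 2))
mixed = base @ np.array([[1.0, 0.5], [0.2, 1.0]])
features = np.hstack([base, mixed + 0.1 * rng.normal(size=(100, 2))])

# Step 1: center each feature at zero.
centered = features - features.mean(axis=0)

# Step 2: SVD; the rows of Vt are the principal directions,
# ordered from most to least variance.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Step 3: project onto the first two directions.
n_components = 2
projected = centered @ Vt[:n_components].T

# Fraction of the total variance captured by each direction.
explained_ratio = (S ** 2) / np.sum(S ** 2)

print(projected.shape)
print(explained_ratio[:n_components])
```

The two columns of `projected` are what would be plotted on the x and y axes of a 2D scatter plot.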

Python3




from sklearn import datasets, decomposition
iris = datasets.load_iris()
X = iris.data
y = iris.target
pca = decomposition.PCA(n_components=2)
X = pca.fit_transform(X)
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y)


Output:

 

HeatMap

Here we use a heat map to check the correlation between the numerical columns in the dataset. A heat map is a data visualisation technique that shows the magnitude of a quantity as colour in two dimensions. Correlation values range from -1 to 1, where -1 means a perfect negative correlation, +1 means a perfect positive correlation, and 0 means no linear correlation.
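The table that a correlation heat map colours is a square matrix with one row and one column per numerical variable. A small sketch with a made-up frame is shown below. The values and the `years_to_retirement` column are hypothetical, and the `numeric_only` argument assumes pandas 1.5 or later.

```python
import pandas as pd

# Hypothetical frame: column names echo the article's dataset,
# but the values (and years_to_retirement) are made up.
df = pd.DataFrame({
    "age": [25, 32, 41, 50, 58],
    "length_of_service": [1, 5, 10, 18, 25],
    "years_to_retirement": [40, 33, 24, 15, 7],  # 65 - age
    "gender_full": ["Female", "Male", "Female", "Male", "Female"],
})

# numeric_only=True leaves the text column out of the matrix.
corr = df.corr(numeric_only=True)
print(corr.round(2))
```

The diagonal is always 1 because every variable is perfectly correlated with itself, and the matrix is symmetric, so the upper and lower triangles of the heat map mirror each other.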

Python3




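# numeric_only=True skips text columns such as department_name (required on pandas 2.0+)
sns.heatmap(data.corr(numeric_only=True), annot=True)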


Output:

 


