What is Univariate, Bivariate & Multivariate Analysis in Data Visualisation?
Last Updated: 07 Nov, 2022
Data visualisation is the graphical representation of information and data. By using visual elements such as charts, graphs, and maps, data visualisation tools give us an accessible way to find and understand hidden trends and patterns in data.
In this article, we will look at univariate, bivariate, and multivariate analysis in data visualisation using Python.
Univariate Analysis
Univariate analysis examines a single variable at a time. It shows how the values of that variable are distributed, which guides the analysis that follows. You can find the link to the dataset here.
Python3
import pandas as pd
import seaborn as sns
data = pd.read_csv('Employee_dataset.csv')
print(data.head())
Output:
Histogram
Here we’ll perform univariate analysis on a numerical variable using seaborn’s histplot function.
Python3
sns.histplot(data['age'])
Output:
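A histogram groups the values into bins and counts how many fall into each one. The following is a minimal sketch of that counting step using NumPy; the ages are made-up values standing in for data['age'], and the bin range of 20 to 70 is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ages standing in for data['age']; not taken from the real dataset.
ages = pd.Series([23, 25, 31, 35, 35, 41, 44, 52, 58, 63], name="age")

# Split the range 20-70 into five bins of width 10 and count the values in each.
counts, edges = np.histogram(ages, bins=5, range=(20, 70))

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {n}")
```

The number of bins changes how the distribution looks, so it is worth trying a few values, for example through the bins argument of sns.histplot.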
Bar Chart
For univariate analysis of categorical data, we’ll use the countplot function from the seaborn library.
Python3
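# Pass the column as the x keyword; recent seaborn versions treat the first positional argument as the dataset
sns.countplot(x=data['gender_full'])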
Output:
The bars in the chart represent the count of each category in the gender_full column.
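The counts behind such a bar chart can also be read as numbers with pandas. This is a small sketch using value_counts; the category labels below are hypothetical stand-ins for the contents of data['gender_full'].

```python
import pandas as pd

# Hypothetical values standing in for data['gender_full'].
gender = pd.Series(["Female", "Male", "Female", "Female", "Male"],
                   name="gender_full")

# One entry per category with its frequency, sorted from most to least common.
counts = gender.value_counts()
print(counts)
```

Checking the table alongside the plot helps when categories have very similar counts that are hard to tell apart by bar height.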
Pie Chart
A pie chart helps us visualize the percentage of the data belonging to each category.
Python3
import matplotlib.pyplot as plt

# Count the rows in each category, then draw one slice per category
x = data['STATUS_YEAR'].value_counts()
plt.pie(x.values,
        labels=x.index,
        autopct='%1.1f%%')
plt.show()
Output:
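The percentages printed on the slices are each category's share of the total row count. A minimal sketch of the same calculation with pandas follows; the years are invented values standing in for data['STATUS_YEAR'].

```python
import pandas as pd

# Hypothetical values standing in for data['STATUS_YEAR'].
status_year = pd.Series([2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015],
                        name="STATUS_YEAR")

# normalize=True returns fractions of the total; scale to percent and round
# to one decimal place, similar to the '%1.1f%%' label format.
shares = status_year.value_counts(normalize=True).mul(100).round(1)
print(shares)
```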
Bivariate Analysis
Bivariate analysis is the simultaneous analysis of two variables. It explores the relationship between them: whether an association exists and how strong it is, or whether the two variables differ and how significant those differences are.
The three main types covered here are:
- Categorical v/s Numerical
- Numerical v/s Numerical
- Categorical v/s Categorical
Categorical v/s Numerical
Python3
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 5))
sns.barplot(x=data['department_name'], y=data['length_of_service'])
# Pass the rotation as a number; the documented values are a float, 'vertical' or 'horizontal'
plt.xticks(rotation=90)
plt.show()
Output:
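Each bar shows the mean length of service for one department. The black line on top of each bar is the error bar (by default, a 95% confidence interval for the mean), so a long line means the estimate for that department is less certain. The bar heights differ considerably between departments.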
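By default, sns.barplot draws the mean of the numerical variable for each category. The same per-group figures can be computed directly with a pandas groupby, as in this sketch; the department names and service lengths are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for the employee dataset.
df = pd.DataFrame({
    "department_name": ["Dairy", "Dairy", "Meats", "Meats", "Meats", "Produce"],
    "length_of_service": [4, 8, 10, 12, 14, 3],
})

# Mean, group size and spread of the numerical column for each category.
summary = (df.groupby("department_name")["length_of_service"]
             .agg(["mean", "count", "std"]))
print(summary)
```

The count column is worth checking: a group with very few rows (such as the single hypothetical Produce row here) gives an unreliable mean.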
Numerical v/s Numerical
Python3
sns.scatterplot(x=data['length_of_service'],
                y=data['age'])
Output:
The scatter plot shows the age and length of service of each employee in the organization. Younger employees tend to have shorter lengths of service.
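The strength of a linear relationship like this can be summarised with the Pearson correlation coefficient. Here is a minimal sketch using pandas' Series.corr, which computes Pearson correlation by default; the values are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical values standing in for the two numerical columns.
df = pd.DataFrame({
    "age": [25, 30, 35, 40, 45, 50],
    "length_of_service": [2, 4, 7, 9, 13, 15],
})

# Pearson correlation: close to +1 when the two columns rise together.
r = df["age"].corr(df["length_of_service"])
print(round(r, 3))
```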
Categorical v/s Categorical
Python3
# x sets the categories on the axis; hue splits each bar by the second categorical variable
sns.countplot(x=data['STATUS_YEAR'],
              hue=data['STATUS'])
Output:
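The counts behind a grouped count plot form a contingency table, which pandas can build with pd.crosstab. In this sketch both the years and the status labels are hypothetical stand-ins for the STATUS_YEAR and STATUS columns.

```python
import pandas as pd

# Hypothetical rows; the status labels are assumptions for illustration.
df = pd.DataFrame({
    "STATUS_YEAR": [2014, 2014, 2014, 2015, 2015, 2015],
    "STATUS": ["ACTIVE", "ACTIVE", "TERMINATED",
               "ACTIVE", "TERMINATED", "TERMINATED"],
})

# Rows are years, columns are statuses, cells are row counts.
table = pd.crosstab(df["STATUS_YEAR"], df["STATUS"])
print(table)
```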
Multivariate Analysis
Multivariate analysis extends bivariate analysis: it considers more than two variables at the same time to find relationships among them. It is a set of statistical techniques that examine patterns in multidimensional data by considering several variables at once.
PCA
Python3
from sklearn import datasets, decomposition

iris = datasets.load_iris()
X = iris.data
y = iris.target
# Project the four iris features onto the first two principal components
pca = decomposition.PCA(n_components=2)
X = pca.fit_transform(X)
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y)
Output:
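Projecting four features onto two components discards some of the variation in the data. A short sketch, assuming the same iris data and scikit-learn's explained_variance_ratio_ attribute, of how to check how much variance the two components retain:

```python
from sklearn import datasets, decomposition

iris = datasets.load_iris()

# Fit a two-component PCA on the four iris features.
pca = decomposition.PCA(n_components=2)
pca.fit(iris.data)

# Fraction of the total variance captured by each component, largest first.
ratios = pca.explained_variance_ratio_
print(ratios, ratios.sum())
```

If the sum is close to 1, the two-dimensional scatter plot is a faithful summary of the original data; if it is low, more components may be needed.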
Heatmap
Here we use a heatmap to check the correlation between the numeric columns in the dataset. A heatmap is a data visualisation technique that shows the magnitude of a quantity as colour in two dimensions. Correlation values range from -1 to 1, where -1 indicates a perfect negative correlation, +1 a perfect positive correlation, and 0 no linear correlation.
Python3
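# numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.0 and later
sns.heatmap(data.corr(numeric_only=True), annot=True)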
Output:
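The matrix that the heatmap colours can be inspected on its own. The following sketch builds it from a small hypothetical frame, keeping only the numeric columns with select_dtypes before calling corr; the column values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; gender_full is text and cannot be correlated directly.
df = pd.DataFrame({
    "age": [25, 30, 35, 40, 45],
    "length_of_service": [1, 3, 6, 8, 12],
    "gender_full": ["Female", "Male", "Female", "Male", "Female"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr)
```

The matrix is symmetric with ones on the diagonal, since every column correlates perfectly with itself, so only one triangle of the heatmap carries new information.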