Introduction to Seaborn! (Part 2)
Hi guys! Welcome to the next part of the Seaborn series.
If you missed Part 1, please read it first: this article continues directly from it and will be hard to follow without it.
Here's a small reminder: Seaborn is a statistical visualization library with many convenient features. Every feature (independent variable) in a dataset needs to be visualized properly to get useful insights from the data, and Seaborn's flexible plotting functions help with exactly that.
In the last article, we took a deeper look at the reg and cat plots. Today we will look at the histogram and the violin plot.
Before going into detail about Seaborn, let me introduce a useful tool that makes the plots easier to read. IPython can render notebook figures in SVG (Scalable Vector Graphics) format. Because SVG is a vector format, the plots stay sharp at any zoom level, which makes them easier to interpret. Here is the same plot before and after switching to SVG in a notebook.
If you look carefully at the "before" image, the plot is blurred and hard to read. After switching to SVG, the plot is crisp and clear, which makes the data easier to understand.
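Code to activate SVG in a notebook:
# Install the module (run in a terminal, or in a notebook cell)
# pip install ipython
# Import the module
from IPython import display
# Render figures as SVG. Newer IPython versions deprecate this call in favor of
# matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
display.set_matplotlib_formats('svg')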
Histogram:
A histogram is a statistical summary of a feature. It is also called a frequency diagram.
- A histogram is a graph that shows the distribution, dispersion, and shape of the data.
- Looking at the graph tells us at a glance where the values concentrate and how widely they spread.
- It is a statistical method for interpreting data.
Data types in hist plots?
- Categorical (nominal and ordinal)
- Numerical (Discrete)
- Continuous data also works, but it must first be grouped into bins.
It is similar to a bar plot, with some differences:
- A histogram uses bins on the x-axis, whereas a bar plot uses categories.
- In a bar plot, we can reorder the categories on the x-axis as required.
- In a histogram, the order is fixed because the bins follow the numeric scale.
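To make the binning idea concrete, here is a minimal sketch that groups a handful of continuous values into class intervals with NumPy. The prices and bin edges are hypothetical values chosen only for illustration; they are not the dataset used in this article.

```python
import numpy as np

# Hypothetical pen prices (illustrative values, not the article's dataset)
prices = np.array([3.5, 7.0, 9.5, 12.5, 14.0, 18.0, 22.5, 27.0])

# Bin edges define the class intervals: [0, 10), [10, 20), [20, 30]
edges = np.array([0, 10, 20, 30])

# np.histogram counts how many values fall into each interval
counts, edges = np.histogram(prices, bins=edges)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left} to {right}: {count} pens")
```

The resulting counts are the bar heights of the histogram, and the edges fix the order of the bars on the x-axis.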
Example:
Consider this dataset:
Class interval (price ranges of pens): independent feature
Frequency (number of pens sold in each range): dependent feature
We are going to make a histogram from this data.
Let's plot these values:
Simple, right? But for continuous numbers, choosing the bins is a much harder task.
To choose the bins, we can use the Freedman-Diaconis rule. In simple words, it calculates the bin width, and therefore the number of bins, from the data itself.
Using this simple formula, we can calculate the bins for our histogram.
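For reference, the Freedman-Diaconis rule is usually written as follows. The symbols are my own choice for this sketch: h is the bin width, IQR(x) is the interquartile range of the data, n is the number of data points, and k is the resulting number of bins.

```latex
h = \frac{2\,\mathrm{IQR}(x)}{\sqrt[3]{n}},
\qquad
k = \left\lceil \frac{\max(x) - \min(x)}{h} \right\rceil
```

In the Matplotlib code later in this article, h corresponds to the variable r and k corresponds to b.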
Let's try it in code (Matplotlib):
# import libraries
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
## create some data
# number of data points
n = 1000
# number of histogram bins
k = 40
# generate log-normal distribution
data = np.exp( np.random.randn(n)/2 )
# one way to show a histogram
plt.hist(data,k)
plt.xlabel('Value')
plt.ylabel('Count')
plt.show()
output:
Using Freedman-Diaconis rule:
## try the Freedman-Diaconis rule
# bin width from the F-D rule: 2 * IQR * n^(-1/3)
r = 2*stats.iqr(data)*n**(-1/3)
# number of bins needed to cover the data range with that width
b = np.ceil( (max(data)-min(data) )/r )
plt.hist(data,int(b))
# or directly from the hist function
#plt.hist(data,bins='fd')
plt.xlabel('Value')
plt.ylabel('Count')
plt.title('F-D "rule" using %g bins'%b)
plt.grid()
plt.tight_layout()
plt.show()
output:
Seaborn code:
- Seaborn can calculate the bins with the Freedman-Diaconis rule internally, so there is no need to hard-code them here!
import seaborn as sns
# distplot (which used the FD rule by default) is deprecated since seaborn 0.11;
# histplot accepts the rule by name
sns.histplot(data, bins='fd')
# that's all
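If you want to convince yourself that the manual Freedman-Diaconis calculation and a built-in rule agree, the sketch below compares the two using NumPy only. It assumes a seeded random generator (my addition, so the run is repeatable) and the same kind of log-normal data as above; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)                 # seeded so the run is repeatable
data = np.exp(rng.standard_normal(1000) / 2)   # log-normal sample

# Manual Freedman-Diaconis: bin width from the IQR, then bin count from the range
q25, q75 = np.percentile(data, [25, 75])
width = 2 * (q75 - q25) * data.size ** (-1 / 3)
k_manual = int(np.ceil((data.max() - data.min()) / width))

# NumPy's built-in 'fd' rule returns the bin edges directly
edges = np.histogram_bin_edges(data, bins="fd")
k_numpy = len(edges) - 1

print(k_manual, k_numpy)
```

Both numbers should match (or differ by at most one bin because of rounding), which is why passing the rule by name is usually enough in practice.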
Those are the basics of the histogram. Now let's look at the violin plot.
Violin Plot:
- This plot needs little introduction: it is essentially a histogram, smoothed into a density curve and drawn as two mirrored copies back to back.
- The applications and features are the same, so why do we need it?
- If you want to see the histogram as a symmetric 2D shape, you can use this plot. The shape is often easier to interpret.
That is the beauty of the histogram. Building a violin plot takes two steps:
- Rotate the histogram by 90°.
- Mirror a copy of it onto the other side.
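The two steps above can be sketched with NumPy alone. This is a simplified illustration, not how plotting libraries do it internally: real violin plots (for example Matplotlib's violinplot) use a smoothed kernel density estimate rather than raw histogram bars. The function name violin_outline is hypothetical.

```python
import numpy as np

def violin_outline(data, bins=30):
    """Return bin centers and half-widths of a mirrored, violin-like histogram."""
    density, edges = np.histogram(data, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2      # step 1: these go on the vertical axis
    half_width = density / density.max() / 2    # widest point spans -0.5 to 0.5
    return centers, half_width

rng = np.random.default_rng(0)                  # seeded so the run is repeatable
data = np.exp(rng.standard_normal(1000))

centers, half_width = violin_outline(data)
left, right = -half_width, half_width           # step 2: mirror to the other side
```

To draw the shape, you could pass these arrays to plt.fill_betweenx(centers, left, right) in Matplotlib.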
Visual interpretation:
Real-World Data Interpretation:
- A violin plot can show the data more efficiently than a histogram.
- A violin plot belongs to descriptive statistics:
- It describes the IQR, mean, median, and distribution.
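As a quick illustration of the summary values listed above, here is a minimal sketch that computes them with NumPy. The numbers are hypothetical and include one deliberate outlier to show how the mean and median can disagree.

```python
import numpy as np

# Hypothetical values with one outlier (illustrative only)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

mean = data.mean()
median = np.median(data)
q25, q75 = np.percentile(data, [25, 75])   # first and third quartiles
iqr = q75 - q25                            # interquartile range

print(f"mean={mean:.2f}, median={median}, IQR={iqr}")
```

The outlier pulls the mean well above the median, while the median and IQR barely react to it. A violin plot lets you see this kind of skew directly in the shape.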
Let's try it in code (Matplotlib):
# import libraries
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
# Install the module
#pip install IPython
# Import the module
from IPython import display
display.set_matplotlib_formats('svg')
## create the data
n = 1000
thresh = 5 # threshold for cropping data
# log-normal data
data = np.exp( np.random.randn(n) )
# replace values above the threshold with small noise around the threshold
data[data>thresh] = thresh + np.random.randn(sum(data>thresh))*.1
# show histogram
plt.hist(data,30)
plt.title('Histogram')
plt.grid()
plt.tight_layout()
plt.show()
# show violin plot
plt.violinplot(data)
plt.title('Violin')
plt.grid()
plt.tight_layout()
plt.show()
Output:
Seaborn Code:
import seaborn as sns
sns.violinplot(y=data)  # passing the data as y draws a vertical violin
plt.tight_layout()
plt.show()
Output:
That covers the histogram and the violin plot.
Name: R.Aravindan
Company: Artificial Neurons.AI
Position: Content Writer