Introduction to seaborn! (Part 2)

Hi Guys! Welcome to the next part of the seaborn series.

If you missed Part 1, please read it first: this article continues directly from it and will be hard to follow on its own. Click to open the article 🎯

Here's a small reminder: seaborn helps you visualize data effectively and comes with many useful features. Every feature (independent variable) needs to be visualized properly to get good insights from the data. Seaborn is built for statistical visualization, and its flexible functions make that easier!

In the last article, we took a deeper look at the reg and cat plots. Today we will look at the hist and violin plots.

Before going into detail about seaborn, let me introduce a useful tool that helps you see the visualizations clearly. IPython can render notebook plots in SVG (Scalable Vector Graphics) format. SVG output stays sharp at any zoom level, which makes the plot easier to read. Here is the same plot in a notebook before and after switching to SVG.

[Image: the same plot before and after enabling SVG output]

If you look carefully at the "before" image, the plot is blurred and hard to read. After enabling SVG output, the plot looks clear and sharp, which makes the data easier to understand.

Code to activate SVG in a notebook:

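# Install the module (run in a terminal, or prefix with ! in a notebook cell)
# pip install ipython

# Import the module
from IPython import display

# Render inline Matplotlib figures as SVG.
# Note: newer IPython versions deprecate this function in favor of
# matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
display.set_matplotlib_formats('svg')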

Histogram:

A histogram is a statistical summary of a feature. It is also called a frequency diagram.

  • A histogram is a graph that shows the distribution, dispersion, and shape of the data.
  • Looking at it tells us a lot at a glance: where the values cluster, how spread out they are, and whether they are skewed.
  • It is a statistical way to interpret the data.

Data types in hist plots:

  1. Categorical (nominal and ordinal)
  2. Numerical (discrete)
  3. Continuous data can also be used, but it must first be grouped into bins.
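As a minimal sketch of item 3, NumPy's `np.histogram` can group continuous values into equal-width bins and count how many fall into each one. The sample values and the choice of five bins are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical continuous measurements (illustration only)
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=200)

# Split the value range into 5 equal-width bins and count the points in each
counts, edges = np.histogram(values, bins=5)

# Print one line per bin: its left edge, right edge, and count
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```

Every value lands in exactly one bin, so the counts add up to the number of data points.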

How it compares with a bar plot:

  1. A histogram uses bins on the x-axis, while a bar plot uses categories.
  2. We can change the order of the categories in a bar plot, but in a histogram the order of the bins is fixed and carries meaning.
  3. In other words, the x-axis of a bar plot can be rearranged to suit our needs, while the x-axis of a histogram cannot.
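The difference in ordering can be sketched in a few lines. The category counts and the values below are hypothetical. The example reorders bar-plot categories freely, and then shows that NumPy refuses bin edges that are not in increasing order.

```python
import numpy as np

# Hypothetical category counts: a bar plot may show them in any order,
# for example sorted from the largest count to the smallest.
sales = {"blue": 12, "red": 30, "black": 21}
by_count = dict(sorted(sales.items(), key=lambda kv: kv[1], reverse=True))
print(list(by_count))

# Histogram bins are numeric ranges, so their edges must keep increasing.
values = np.array([1.2, 2.5, 2.7, 3.1, 4.8])
counts, edges = np.histogram(values, bins=[1, 2, 3, 4, 5])
print(counts)

# Shuffled bin edges are rejected by NumPy with a ValueError.
try:
    np.histogram(values, bins=[3, 1, 2, 5, 4])
    shuffled_ok = True
except ValueError:
    shuffled_ok = False
print("shuffled edges accepted:", shuffled_ok)
```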

Ex:

Consider this as our dataset:

[Image: table of class intervals and frequencies]

Class interval (price range of the pens): independent feature

Frequency (number of pens sold in each range): dependent feature

We are going to make a histogram from this data.

[Image]

Let's plot using these values:

[Image: histogram of the pen data]
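For readers who want to reproduce this kind of plot, here is a minimal sketch that draws a histogram from an already-binned frequency table. The price ranges and frequencies are hypothetical stand-ins, not the values from the table above.

```python
import matplotlib.pyplot as plt

# Hypothetical class intervals (pen price ranges) and frequencies (pens sold)
bin_edges = [0, 10, 20, 30, 40, 50]
frequencies = [5, 12, 20, 9, 4]

# One bar per interval: it starts at the left edge and spans the interval width
widths = [right - left for left, right in zip(bin_edges[:-1], bin_edges[1:])]

fig, ax = plt.subplots()
bars = ax.bar(bin_edges[:-1], frequencies, width=widths,
              align="edge", edgecolor="black")
ax.set_xlabel("Price range")
ax.set_ylabel("Pens sold")
ax.set_title("Histogram from a frequency table")
# plt.show()  # needed when running as a script rather than in a notebook
```

Using `align="edge"` keeps each bar exactly on its interval, so neighbouring bars touch the way histogram bars should.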


So simple, right? But for continuous numbers, choosing the bins is a much harder task.

To choose the bins, we use the Freedman-Diaconis rule. In simple words, it calculates the bin width, and from that the number of bins, based on the data.

Bin width: h = 2 × IQR / n^(1/3), where IQR is the interquartile range and n is the number of data points. Number of bins: k = ceil((max − min) / h).

By using this simple formula, we can calculate the bins for our histogram.

Let's try it in code (Matplotlib):

# import libraries
import matplotlib.pyplot as plt
import numpy as np 
import scipy.stats as stats

## create some data


# number of data points
n = 1000


# number of histogram bins
k = 40


# generate log-normal distribution
data = np.exp( np.random.randn(n)/2 )



# one way to show a histogram
plt.hist(data,k)
plt.xlabel('Value')
plt.ylabel('Count')
plt.show()        

Output:

[Image: histogram of the data with 40 bins]

Using Freedman-Diaconis rule:

## try the Freedman-Diaconis rule

# bin width: h = 2 * IQR * n^(-1/3)
r = 2*stats.iqr(data)*n**(-1/3)

# number of bins needed to cover the data range with that width
b = np.ceil( (max(data)-min(data) )/r )

plt.hist(data,int(b))


# or directly from the hist function
#plt.hist(data,bins='fd')


plt.xlabel('Value')
plt.ylabel('Count')
plt.title('F-D "rule" using %g bins'%b)
plt.grid()
plt.tight_layout()
plt.show()        

Output:

[Image: histogram using the Freedman-Diaconis number of bins]
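As a quick cross-check, the hand-computed bin count can be compared with NumPy's built-in `'fd'` estimator, which is what `bins='fd'` uses. This is a sketch under assumptions: the helper name `fd_bin_count` and the random seed are made up for illustration, and the data only mimics the log-normal sample used above.

```python
import numpy as np

def fd_bin_count(x):
    """Number of bins suggested by the Freedman-Diaconis rule."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    width = 2 * (q75 - q25) * x.size ** (-1 / 3)  # bin width h
    return int(np.ceil((x.max() - x.min()) / width))

# Log-normal sample, similar in spirit to the data used in the article
rng = np.random.default_rng(42)
sample = np.exp(rng.standard_normal(1000) / 2)

manual = fd_bin_count(sample)
builtin = len(np.histogram_bin_edges(sample, bins="fd")) - 1
print(manual, builtin)
```

The two numbers should agree, because both apply the same bin-width formula to the same data range.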

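Seaborn code:

  • Seaborn can pick the bins for us, so there is no need to hard-code them. The older sns.distplot used the Freedman-Diaconis rule by default, but it is deprecated in recent seaborn releases. With sns.histplot, the rule can be requested by name.

# Let seaborn choose the bins with the Freedman-Diaconis rule
import seaborn as sns
sns.histplot(data, bins='fd')  # 'fd' is passed on to NumPy's bin-edge calculation

# that's all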

Those are the basics of the histogram. Now let's look at the violin plot.

Violin Plot:

[Image]

  • This plot needs little introduction: it is essentially a histogram turned on its side and mirrored, so the same shape appears on both sides of a central axis.
  • Its applications and features are the same as the histogram's, so why do we need it?
  • Use it when you want a symmetric, two-sided view of the histogram, which is often easier to interpret.

It is the histogram made beautiful. It takes 2 steps:

  1. Rotate the histogram by 90°.
  2. Copy the same shape onto the other side.
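The two steps can be sketched directly with Matplotlib's horizontal bars. The sample data and the choice of 30 bins are assumptions for illustration. Note that library violin plots draw a smoothed density curve rather than raw bin counts, so this is only an approximation of the idea.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical skewed sample (illustration only)
rng = np.random.default_rng(1)
sample = np.exp(rng.standard_normal(500))

# Ordinary histogram counts, computed without plotting
counts, edges = np.histogram(sample, bins=30)
centers = (edges[:-1] + edges[1:]) / 2
heights = np.diff(edges)

fig, ax = plt.subplots()
# Step 1: rotate by 90 degrees, so the values run along the y-axis
right = ax.barh(centers, counts, height=heights, color="tab:blue")
# Step 2: mirror the same bars on the other side of the central axis
left = ax.barh(centers, -counts, height=heights, color="tab:blue")
ax.set_ylabel("Value")
ax.set_xticks([])
ax.set_title("Mirrored histogram")
# plt.show()  # needed when running as a script rather than in a notebook
```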

Visual interpretation:

[Image]

Real-World Data Interpretation:

[Image]

  • A violin plot can visualize the data more efficiently than a histogram.
  • A violin plot belongs to descriptive statistics.
  • It describes the IQR, mean, median, and distribution.
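The summary numbers that a violin plot conveys can also be computed directly, which helps when reading the plot. This is a minimal sketch on hypothetical data. In Matplotlib, `plt.violinplot` has `showmeans` and `showmedians` options that mark two of these values on the plot.

```python
import numpy as np

# Hypothetical skewed sample (illustration only)
rng = np.random.default_rng(7)
sample = np.exp(rng.standard_normal(1000))

# Quartiles and median from percentiles; the IQR is the gap between Q1 and Q3
q1, median, q3 = np.percentile(sample, [25, 50, 75])
summary = {
    "mean": sample.mean(),
    "median": median,
    "q1": q1,
    "q3": q3,
    "iqr": q3 - q1,
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")
```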

Let's try it in code (Matplotlib):

# import libraries
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

# Install the module
# pip install ipython

# Render notebook figures as SVG
from IPython import display
display.set_matplotlib_formats('svg')

## create the data

n = 1000
thresh = 5 # threshold for cropping data


data = np.exp( np.random.randn(n) )
data[data>thresh] = thresh + np.random.randn(sum(data>thresh))*.1


# show histogram
plt.hist(data,30)
plt.title('Histogram')
plt.grid()
plt.tight_layout()
plt.show()


# show violin plot
plt.violinplot(data)
plt.title('Violin')
plt.grid()
plt.tight_layout()
plt.show()        

Output:

[Image: histogram and violin plot of the same data]

Seaborn Code:

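import seaborn as sns  # the alias must match the sns.violinplot call below
sns.violinplot(y=data)  # passing the values as y draws a vertical violin
plt.tight_layout()
plt.show()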

Output:

[Image: seaborn violin plot]

That is all about the hist and violin plots.

Name: R.Aravindan

Company: Artificial Neurons.AI

Position: Content Writer
