Introduction to seaborn! (Part-1)


Hi Guys! Welcome to the next article on EDA (Exploratory Data Analysis) using Python. EDA is the first step in our model-building journey: by visualizing the data properly, we can extract insights and valuable information from it. The more you visualize the data, the more insight you gain.

For data visualization, Python offers many tools, such as Matplotlib, Plotly, pandas (through its plotting methods) and seaborn. Here we are going to look at the seaborn library. Seaborn is built on top of Matplotlib and is closely integrated with pandas data structures. It is used mainly for making statistical graphics. Visualization is at the core of seaborn, and its many features help us explore and understand the data better.

Let's look at seaborn plots for both numerical and categorical data. They help us understand the data more clearly!

Because seaborn is built on top of Matplotlib, you can combine Matplotlib and seaborn syntax to get a good visualization.

Before going into detail about seaborn, let me introduce a useful IPython setting that makes plots look sharper. It renders inline plots in SVG (Scalable Vector Graphics) format. SVG is a vector format, so plots stay crisp at any zoom level. Here is the same plot before and after enabling SVG output in a notebook.

[Image: the same plot before and after enabling SVG output]


Look carefully at the image. Before, the plot looks blurred and is hard to read. After enabling SVG output, it looks sharp and clear, which makes the data easier to understand.

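Code to enable SVG output in a notebook:

# Install IPython if needed (run in a terminal; inside a notebook use %pip install ipython)
pip install ipython

# Render inline Matplotlib/seaborn plots as SVG
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('svg')

# The older call still works but is deprecated in recent IPython versions:
# from IPython import display
# display.set_matplotlib_formats('svg')

# Alternative notebook magic:
# %config InlineBackend.figure_format = 'svg'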
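The SVG setting above only affects inline output in a notebook. Outside a notebook, for example in a script, you can get the same kind of vector output by saving the figure in SVG format with Matplotlib's `savefig`. The sketch below writes to an in-memory buffer. The plotted numbers and the choice of the `Agg` backend are illustrative assumptions only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Draw a simple line plot (illustrative data)
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])

# Save the figure as SVG into an in-memory buffer instead of a file
buf = io.BytesIO()
fig.savefig(buf, format="svg")
svg_text = buf.getvalue().decode("utf-8")
plt.close(fig)

print(svg_text[:80])  # the start of the SVG/XML document
```

To write a file instead, replace the buffer with a path such as `"plot.svg"`.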

Regression Plot:

Let's start with the basic regression plot. I hope you already know the concept of regression: the goal is to fit the line that best describes the relationship between two variables. The usual best-fit line (ordinary least squares) is the one that minimizes the mean squared error (MSE). Other criteria, such as the mean absolute error (MAE), are also possible. Now imagine seeing that best-fit line drawn together with the data. It gives us much more information! That is exactly what we are going to do now. Seaborn has a dedicated plot for regression that lets us inspect the best-fit line visually. Let's look at some of the basic features of seaborn's regression plot.
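To connect the MSE idea with the plot, here is a minimal sketch on synthetic data (the data-generating line `y ≈ 3x + 5` is an assumption made for illustration). It fits the least-squares line with `numpy.polyfit` and checks that this line has a lower MSE than a slightly tilted line. It then checks that the line `sns.regplot` draws lies on the same least-squares line. `ci=None` skips the confidence band so that only the line is drawn.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data (assumption): y is roughly 3x + 5 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3 * x + 5 + rng.normal(0, 2, 100)

# Least-squares line: the slope/intercept that minimize the mean squared error
slope, intercept = np.polyfit(x, y, deg=1)
mse = np.mean((y - (slope * x + intercept)) ** 2)

# Any other line, e.g. a slightly steeper one, has a larger MSE
mse_perturbed = np.mean((y - ((slope + 0.1) * x + intercept)) ** 2)

# Draw the fit with seaborn (ci=None skips the bootstrap band)
ax = sns.regplot(x=x, y=y, ci=None)
line_x = ax.lines[0].get_xdata()
line_y = ax.lines[0].get_ydata()
plt.close(ax.figure)

# The line seaborn draws lies on the least-squares line
print(np.allclose(line_y, slope * line_x + intercept), mse < mse_perturbed)
```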

The plot is called regplot (sns.regplot). It is a bivariate plot, because it fits a line relating two features.

[Image: regplot showing the data points, the fitted regression line and its confidence band]

In the image, the cyan line is the best-fit line and the green dots are the actual data points. The shaded band around the line is the confidence interval for the regression estimate (95% by default). Seaborn computes it by bootstrapping. It repeatedly resamples the data, refits the line and takes percentiles of the resulting fits. All of this happens automatically, so you don't need to do anything extra.
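To make the bootstrapping idea concrete, here is a simplified sketch of a percentile bootstrap for the fitted value at a single point `x0`. This is the general technique, not seaborn's exact code. The synthetic data, `x0` and the variable names are assumptions made for illustration. `n_boot = 1000` matches regplot's default number of resamples.

```python
import numpy as np

# Synthetic data (assumption): y is roughly 2x + 1 plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 80)
y = 2 * x + 1 + rng.normal(0, 3, 80)

x0 = 5.0        # point at which we want the interval
n_boot = 1000   # number of bootstrap resamples

preds = np.empty(n_boot)
for i in range(n_boot):
    # Resample (x, y) pairs with replacement and refit the line
    idx = rng.integers(0, len(x), len(x))
    b_slope, b_intercept = np.polyfit(x[idx], y[idx], deg=1)
    preds[i] = b_slope * x0 + b_intercept

# 95% percentile interval of the fitted value at x0
low, high = np.percentile(preds, [2.5, 97.5])

# Fitted value from the full data, for comparison
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x0 + intercept
print(f"fit at x={x0}: {fitted:.2f}, 95% CI: [{low:.2f}, {high:.2f}]")
```

Seaborn repeats this kind of computation across the whole x grid, which is why the band is wider where the data is sparse.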

regplot has several parameters for customizing the visualization to your needs.

The main parameters are:

  • fit_reg=False removes the best-fit line.
  • marker='*' changes the marker used for the data points.
  • scatter=False removes the scatter data points.
  • ci=None removes the confidence interval band from the plot.
  • line_kws={'color': 'red', 'alpha': 0.3} passes keyword arguments to the line, for example to control its color and transparency.
  • scatter_kws={'color': 'green', 'alpha': 0.2} works like line_kws but controls the color and transparency of the data points.

See the code:

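import seaborn as sns

# Load the diamonds dataset and drop missing values
diamonds = sns.load_dataset('diamonds').dropna()

# Take a reproducible random sample of 200 rows
diamonds = diamonds.sample(n=200, random_state=44)

# Price vs carat as scatter only (fit_reg=False hides the line), with red '+' markers
sns.regplot(data=diamonds, x='carat', y='price', fit_reg=False, marker='+', scatter_kws={'color': 'red'});

# Change the parameters and see how the plot changes!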
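Here is a variant that combines the other parameters from the list (`ci`, `marker`, `line_kws`, `scatter_kws`). It uses synthetic carat/price values instead of the diamonds dataset, so it runs without downloading anything. The data is an assumption made for illustration. With `ci=None`, the axes should hold exactly one line (the fit) and one collection (the scatter points).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for the diamonds sample (assumption)
rng = np.random.default_rng(1)
carat = rng.uniform(0.2, 2.5, 200)
price = 4000 * carat - 1500 + rng.normal(0, 800, 200)

ax = sns.regplot(
    x=carat, y=price,
    ci=None,                                       # no confidence band
    marker="*",                                    # star-shaped points
    line_kws={"color": "red", "alpha": 0.3},       # faint red line
    scatter_kws={"color": "green", "alpha": 0.2},  # faint green points
)
ax.set(xlabel="carat", ylabel="price")

# Record what was drawn before closing the figure
n_lines = len(ax.lines)
n_collections = len(ax.collections)
line_color = ax.lines[0].get_color()
plt.close(ax.figure)
```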

What about polynomials?

If the relationship in your data is curved (polynomial), a straight regplot line won't fit it well. In that case, pass the order parameter with the degree of the polynomial, and regplot will fit and draw a polynomial curve instead.

Code:

# Fit and draw a second-order (quadratic) polynomial
sns.regplot(data=diamonds, x='carat', y='price', order=2, line_kws={'color': 'cyan'}, scatter_kws={'color': 'pink'});
[Image: second-order polynomial fit of price against carat]

The fitted curve changes with the polynomial order you specify. regplot helps us understand many aspects of our data, especially the shape of the best-fit line.
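One way to judge which order to pass is to compare the MSE of least-squares polynomials of different degrees. The sketch below does this with NumPy on synthetic, clearly quadratic data. The data-generating formula is an assumption made for illustration. Because the models are nested, a higher order can never increase the training MSE. So the useful signal is a large drop from order 1 to 2 followed by only a small change after that.

```python
import numpy as np

# Synthetic, clearly curved data (assumption): quadratic trend plus noise
rng = np.random.default_rng(7)
x = rng.uniform(-3, 3, 150)
y = 1.5 * x**2 - x + 2 + rng.normal(0, 1, 150)

# Mean squared error of the least-squares polynomial for each order
mse_by_order = {}
for order in (1, 2, 3):
    coefs = np.polyfit(x, y, deg=order)
    mse_by_order[order] = np.mean((y - np.polyval(coefs, x)) ** 2)
    print(f"order={order}: MSE={mse_by_order[order]:.2f}")
```

Picking the lowest order that captures the shape avoids fitting noise with extra terms.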

Beyond straight lines and polynomials, regplot can also visualize a logistic regression for a binary target via logistic=True. This requires the statsmodels package to be installed.
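A minimal sketch of `logistic=True` on a synthetic 0/1 target. The data and the coefficient `1.5` are assumptions made for illustration, and statsmodels must be installed. `ci=None` skips the bootstrap band, which is slow for logistic fits. `y_jitter` only spreads the plotted 0/1 points and does not affect the fit. The drawn curve is a predicted probability, so it should stay between 0 and 1 and rise with `x`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns  # logistic=True additionally requires statsmodels

# Synthetic binary outcome (assumption): P(y=1) rises with x
rng = np.random.default_rng(3)
x = rng.uniform(-4, 4, 300)
p = 1 / (1 + np.exp(-1.5 * x))
y = rng.binomial(1, p)

# Fit and draw a logistic regression curve instead of a straight line
ax = sns.regplot(x=x, y=y, logistic=True, ci=None, y_jitter=0.03)
curve = ax.lines[0].get_ydata()
plt.close(ax.figure)
```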

Cat Plot:


  • Despite the name, cat plot has nothing to do with cats: it is short for categorical plot (sns.catplot).
  • A categorical plot shows the relationship between a numerical variable and one or more categorical variables, or the counts within categories. It does not run a statistical test such as chi-square; it is purely a visualization.

catplot gives access to seaborn's categorical plots, such as box plots, violin plots, bar plots and count plots, through its kind parameter. If you don't know some of them, don't worry; future articles will cover everything mentioned here. For now, just understand that catplot is not a single plot type. It is one interface to several plot types, so you can pick whichever shows your data best.
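A small sketch of switching plot types through `kind`, using a synthetic DataFrame. The column names `group` and `value` are hypothetical and chosen for illustration. Each call returns a FacetGrid. `kind="count"` needs only the categorical column, because it counts rows per category.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns

# Small synthetic dataset with hypothetical columns 'group' and 'value'
rng = np.random.default_rng(5)
df = pd.DataFrame({
    "group": rng.choice(["a", "b", "c"], size=90),
    "value": rng.normal(50, 10, size=90),
})

# The same catplot call switches plot type through `kind`
grids = {}
for kind in ["strip", "swarm", "box", "violin", "bar", "point"]:
    grids[kind] = sns.catplot(data=df, x="group", y="value", kind=kind)

# kind="count" only needs the categorical column
g_count = sns.catplot(data=df, x="group", kind="count")

plt.close("all")  # free the figures created above
```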

catplot supports both univariate and bivariate analysis.

[Image: univariate catplot of horsepower]

This is what a univariate catplot looks like. Each point is one car, placed along the horsepower axis, so you can see how the horsepower values are spread out.

As with regplot, catplot has many parameters, for example kind, hue, col and order.

If your dataset contains categorical features, you can explore them visually before passing them to a statistical test for categorical data, such as the chi-square test.
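As a sketch of that workflow, the example below draws a count plot of one categorical column split by another. It also builds the contingency table with `pandas.crosstab`. That table is the kind of input a chi-square test of independence takes; you could pass it on to, for example, `scipy.stats.chi2_contingency`. The DataFrame and its `region`/`choice` columns are hypothetical, generated only for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data with two categorical columns
rng = np.random.default_rng(11)
df = pd.DataFrame({
    "region": rng.choice(["north", "south"], size=200),
    "choice": rng.choice(["yes", "no"], size=200, p=[0.6, 0.4]),
})

# Visual check: counts of each `choice` within each `region`
g = sns.catplot(data=df, x="region", hue="choice", kind="count")
plt.close("all")

# Contingency table (rows: region, columns: choice)
table = pd.crosstab(df["region"], df["choice"])
print(table)
```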

Here is what the plot looks like when we use catplot with another plot type:

[Image: catplot of horsepower drawn as a swarm plot]

This is a swarm plot. If you don't know it yet, don't worry; a future article will cover it. Just understand that catplot can draw several kinds of plots. A swarm plot adjusts the points so they don't overlap, which lets you see each data point clearly.

We can use other plot types in the same way to analyze categorical data more clearly.


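Code:

import seaborn as sns

# Load the mpg dataset and drop missing values
cars = sns.load_dataset('mpg').dropna()
# Keep only 4-, 6- and 8-cylinder cars (.copy() avoids a SettingWithCopyWarning)
cars = cars[cars.cylinders.isin([4, 6, 8])].copy()
# Label cars from model years up to 76 as 'old', later ones as 'new'
cars['type'] = ['old' if x <= 76 else 'new' for x in cars.model_year]

# Univariate strip plot (the default kind)
sns.catplot(x='horsepower', data=cars, color='green', marker='o');

# kind selects a different plot type
sns.catplot(x='horsepower', data=cars, kind='swarm', color='pink');

# Multivariate plot: box plots of horsepower by origin, split by cylinders
sns.catplot(y='horsepower', x='origin', data=cars, kind='box', hue='cylinders');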

output:

[Image: box plots of horsepower by origin, split by number of cylinders]


Explore all the parameters to get a solid grasp of catplot.

That's all about the regression plot and the categorical plot. We will look at more plots in future articles!


Name: R.Aravindan

Company: Artificial Neurons.AI

Position: Content Writer

Thank you!










