Exploratory Data Analysis using pandas visual analysis library


Pandas Visual Analysis is an open-source Python library for visually analyzing data with just a single line of code.

It creates a user interface for building different plots and graphs from the attributes you choose.

It supports a wide variety of graphs and plots, all created with Plotly, so they are highly interactive, visually appealing, and easy to download.

Pandas Visual Analysis is a third-party Python package for interactive visual analysis in Jupyter notebooks.

It generates an interactive visual analysis widget for exploring a pandas DataFrame.

This keeps data exploration and understanding simple, even with complex multivariate datasets.

There is no need to create and style plots by hand; the widget automates the data exploration part.

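# import pandas and the library
import pandas as pd
from pandas_visual_analysis import VisualAnalysis

# load the data to explore ("data.csv" is a placeholder path, not from the original post)
dataset = pd.read_csv("data.csv")

# build the interactive widget (run this in a Jupyter notebook cell)
VisualAnalysis(dataset)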

The widget offers three selection types.

  •  Standard: the summary panel describes our dataset. It shows the same statistics that dataset.describe() returns.
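For comparison, the statistics that dataset.describe() returns can be checked directly in pandas. The sketch below uses a small made-up DataFrame; the column names and values are illustrative assumptions, not the data behind the screenshots.

```python
import pandas as pd

# Hypothetical toy data standing in for `dataset`.
dataset = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column.
summary = dataset.describe()
print(summary)
```

Each row of the printed table is one statistic and each column is one numeric feature.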


  •  Subtractive: we can choose particular features and draw a scatter plot of them. In subtractive mode, we can select some of the data points in the scatter plot and remove them, which helps us analyze the impact of those particular points on our dataset. The points are not permanently removed; they are set aside only for exploration.

In the snapshot below, we first remove the data points in the red highlighted area. In the next snapshot, the removed points are shown in grey (grey means a point has been removed), and the left-hand side of both images shows the change caused by removing that small set of points.


From the above snapshot, we can easily see the impact that the red highlighted part has on our dataset once it is removed.

There are two more graphs. The first shows how much data is gone after removing the red highlighted area from the dataset.


Here the grey part is the data that was removed along with the red highlighted area in the plot above.

This helps us understand the impact of removing specific data points, for example points that lie far from the mean of the dataset.
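The same kind of "what if these points were gone" check can be approximated outside the widget. The sketch below is only an illustration in plain pandas, with made-up values and a hypothetical threshold standing in for the brushed region; it is not how the library itself computes anything.

```python
import pandas as pd

# Hypothetical data: five typical values and one point far from the rest.
dataset = pd.DataFrame({"value": [10.0, 11.0, 12.0, 13.0, 14.0, 100.0]})

# Boolean mask for the points we "brush away" (assumed region: value > 50).
removed = dataset["value"] > 50

# Filtering returns a new frame; the original dataset is left untouched.
kept = dataset[~removed]

# Side-by-side summary of the full data and the data without the removed points.
comparison = pd.DataFrame({
    "all_points": dataset["value"].describe(),
    "without_removed": kept["value"].describe(),
})
print(comparison)
```

Comparing the two columns shows how strongly the mean and standard deviation depend on the removed points.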

  •  Additive: once we have removed elements, the additive selection type lets us add them back.

Now add all the removed data back and look at the changes in our plot.
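One way to think about the subtractive and additive modes is as set operations on the row indices that are currently selected. The sketch below is a mental model only; the function name apply_selection and the mode strings are hypothetical and are not taken from the library's API.

```python
def apply_selection(current, brushed, mode):
    """Return the new set of selected row indices (illustrative model only)."""
    if mode == "subtractive":
        # Brushed points are taken out of the current selection.
        return current - brushed
    if mode == "additive":
        # Brushed points are put back into the current selection.
        return current | brushed
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical 10-row dataset with every row selected at the start.
all_rows = set(range(10))
brushed = {7, 8, 9}

after_removal = apply_selection(all_rows, brushed, "subtractive")
after_restore = apply_selection(after_removal, brushed, "additive")
print(sorted(after_removal))
print(sorted(after_restore))
```

Because only the selection changes, the underlying rows are never deleted, which matches the point above that removal is for exploration only.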


It also provides an option to normalize the features.
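The post does not say which normalization the widget applies. As an illustration of the general idea, the sketch below uses min-max scaling, one common choice, on made-up columns; treat the method and the data as assumptions.

```python
import pandas as pd

# Hypothetical features on very different scales.
dataset = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "income": [20000.0, 40000.0, 60000.0, 100000.0],
})

# Min-max scaling: map each column to the range [0, 1] so features are comparable.
normalized = (dataset - dataset.min()) / (dataset.max() - dataset.min())
print(normalized)
```

After scaling, both columns span 0 to 1, so neither dominates a shared axis just because of its units.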

 You can check my GitHub profile for code.

