Unlocking Insights: How Everyday Charts Boost Business Understanding and Decision-Making

Data analysis often begins with exploration, a journey into the unknown terrain of information. Effective exploration relies on visualization, which turns raw numbers into visible patterns and trends. Here, we will explore 10 charts commonly used in Exploratory Data Analysis (EDA), along with their purposes and use cases, each with a real-life example. Tools such as Matplotlib (Python), seaborn (Python), or ggplot2 (R) can produce all of these plots. For this article, I will mostly use Python with Matplotlib and seaborn to generate the charts.

1. Scatter Plot:

Purpose: Reveals relationships between two numerical variables.
Use Case: Analyzing the correlation between temperature and ice cream sales over a period.
Example: Temperature (in degrees Celsius) vs. Ice Cream Sales (in units) over 30 days.

import matplotlib.pyplot as plt

# Illustrative data only: 30 daily readings (replace with your own dataset)
temperature = [20 + 0.5 * day for day in range(30)]      # 20.0 to 34.5 °C
ice_cream_sales = [100 + 10 * day for day in range(30)]  # 100 to 390 units

# Create the scatter plot
plt.scatter(temperature, ice_cream_sales)

# Add a title and labels
plt.title('Temperature vs. Ice Cream Sales')
plt.xlabel('Temperature (°C)')
plt.ylabel('Ice Cream Sales (units)')

# Display the plot
plt.show()
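
To put a number on the relationship a scatter plot suggests, one option is the Pearson correlation coefficient. The sketch below uses NumPy's np.corrcoef on a handful of hypothetical readings; the values are illustrative and not taken from a real dataset.

```python
import numpy as np

# Hypothetical paired readings, for illustration only
temperature = np.array([20, 22, 24, 26, 28, 30])            # °C
ice_cream_sales = np.array([100, 125, 135, 165, 175, 205])  # units

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation between the two variables
r = np.corrcoef(temperature, ice_cream_sales)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates a strong positive linear relationship, which is what the upward-sloping cloud of points shows visually.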

2. Histogram:

Purpose: Displays the distribution of a single numerical variable.
Use Case: Understanding the distribution of student scores in a class.
Example: Student scores (out of 100) in a class of 50 students.

import numpy as np
import matplotlib.pyplot as plt

# Sample data (replace this with your dataset)
# The upper bound of randint is exclusive, so 101 allows scores from 0 to 100
student_scores = np.random.randint(0, 101, 50)

# Generate histogram
plt.figure(figsize=(8, 6))
plt.hist(student_scores, bins=10, color='skyblue', edgecolor='black')
plt.title('Distribution of Student Scores')
plt.xlabel('Scores')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
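
A histogram is just a count of values per bin, and it can help to look at those counts directly. The sketch below uses np.histogram on a small, hypothetical list of scores, with the range fixed to 0-100 as an assumption so that each bin is 10 points wide.

```python
import numpy as np

# Hypothetical scores, for illustration only
scores = np.array([35, 48, 52, 55, 61, 67, 70, 72, 78, 85, 91, 100])

# Ten equal-width bins over 0-100; the last bin includes its right edge
counts, edges = np.histogram(scores, bins=10, range=(0, 100))

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.0f} to {right:5.0f}: {count}")
```

These are the bar heights a histogram would draw for the same data, bins, and range.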

3. Box Plot (Box-and-Whisker Plot):

Purpose: Summarizes the distribution of a numerical variable and identifies outliers.
Use Case: Comparing the distribution of salaries across different departments in a company.
Example: Salaries (in USD) across different departments: Sales, Marketing, HR, and IT.

import matplotlib.pyplot as plt
import numpy as np

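# Data for demonstration purposes only (one row per department)
departments = ['Sales', 'Marketing', 'HR', 'IT']
salaries = np.array([
    [50000, 60000, 70000, 80000],  # Sales
    [40000, 50000, 60000, 70000],  # Marketing
    [30000, 40000, 50000, 60000],  # HR
    [60000, 70000, 80000, 90000]   # IT
])

# Create a box plot. boxplot draws one box per column of a 2D array,
# so transpose the array to get one box per department.
# Points beyond the whiskers (1.5 x IQR) are drawn as outliers by default.
plt.boxplot(salaries.T, whis=1.5)
plt.xticks(range(1, len(departments) + 1), departments)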

# Add labels and titles
plt.xlabel('Department')
plt.ylabel('Salary (USD)')
plt.title('Distribution of Salaries Across Departments')

# Show the plot
plt.show()
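
The whis=1.5 setting corresponds to the common 1.5 x IQR rule for flagging outliers. As a rough sketch of that rule, the snippet below computes the quartiles and fences by hand for one hypothetical department; the salary values are made up for illustration.

```python
import numpy as np

# Hypothetical salaries for one department, with one unusually high value
salaries = np.array([50000, 60000, 70000, 80000, 150000])

# Quartiles and interquartile range
q1, q3 = np.percentile(salaries, [25, 75])
iqr = q3 - q1

# Values outside these fences are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print(f"Q1={q1:.0f}, Q3={q3:.0f}, IQR={iqr:.0f}")
print("Outliers:", outliers)
```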

4. Bar Chart:

Purpose: Compares categories or groups using rectangular bars.
Use Case: Visualizing sales performance of different products in a retail store.
Example: Sales performance of different products (e.g., smartphones, laptops, tablets) in a retail store over a month.

import matplotlib.pyplot as plt
import pandas as pd

# Data for demonstration purposes only
data = pd.DataFrame({
    'Product': ['Smartphones', 'Laptops', 'Tablets'],
    'Sales': [1000, 800, 1200],
    'Month': ['January', 'January', 'January']
})

# Create a bar chart
plt.bar(data['Product'], data['Sales'])

# Add labels and titles
plt.xlabel('Product')
plt.ylabel('Sales')
plt.title('Sales Performance of Different Products in January')

# Show the plot
plt.show()

5. Line Chart:

Purpose: Displays trends over time or sequential categories.
Use Case: Tracking the monthly website traffic of an online business.
Example: Monthly website traffic (number of visits) over the past year.

import numpy as np
import matplotlib.pyplot as plt

# Create numpy arrays from the data (traffic is available for 11 months only)
months = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
website_traffic = np.array([1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800, 3000, 3200, 3500])

# Pad website_traffic with np.nan to match the length of months;
# the missing December value is simply left off the line
website_traffic = np.concatenate((website_traffic, [np.nan]))

# Plot the data
plt.plot(months, website_traffic)

# Add labels and titles
plt.xlabel('Month')
plt.ylabel('Website Traffic (Number of Visits)')
plt.title('Monthly Website Traffic Over the Past Year')

# Show the plot
plt.show()
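
Alongside the line itself, the month-over-month growth rate is a quick way to describe the trend. The sketch below computes it with np.diff on four hypothetical monthly totals; the numbers are illustrative only.

```python
import numpy as np

# Hypothetical visits for four consecutive months
website_traffic = np.array([1000, 1200, 1500, 1800])

# Change from each month to the next, as a percentage of the earlier month
growth_pct = np.diff(website_traffic) / website_traffic[:-1] * 100

for i, pct in enumerate(growth_pct, start=2):
    print(f"Month {i} vs month {i - 1}: {pct:.1f}%")
```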

6. Pie Chart:

Purpose: Illustrates the proportion of each category in a dataset.
Use Case: Showing the composition of expenses in a monthly budget.
Example: Monthly budget allocation for expenses such as rent, groceries, utilities, entertainment, and savings.

import matplotlib.pyplot as plt

# Data for the pie chart
values = [3000, 2000, 1500, 1000, 500]
labels = ['Rent', 'Groceries', 'Utilities', 'Entertainment', 'Savings']

# Create the pie chart
plt.pie(values, labels=labels, autopct='%1.1f%%')

# Add a title and keep the pie circular (axis labels do not apply to a pie chart)
plt.title('Monthly Budget Allocation')
plt.axis('equal')

# Show the plot
plt.show()
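
The percentages on a pie chart are each category's value divided by the total. A minimal sketch of that arithmetic, using the same budget figures as the chart above:

```python
# Budget figures from the pie chart example
values = [3000, 2000, 1500, 1000, 500]
labels = ['Rent', 'Groceries', 'Utilities', 'Entertainment', 'Savings']

# Each share is the category's value as a percentage of the total
total = sum(values)
shares = {label: value / total * 100 for label, value in zip(labels, values)}

for label, share in shares.items():
    print(f"{label}: {share:.2f}%")
```

Checking the shares this way is a simple sanity test that the slices add up to 100%.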

7. Heatmap:

Purpose: Visualizes relationships in a matrix format.
Use Case: Analyzing correlation between various features in a dataset.
Example: Correlation matrix of various features in a dataset, such as age, income, education level, and spending habits.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data (replace this with your dataset)
data = {
    'Age': np.random.randint(20, 60, 100),
    'Income': np.random.randint(20000, 100000, 100),
    'Education': np.random.randint(10, 20, 100),
    'Spending': np.random.randint(100, 500, 100)
}
df = pd.DataFrame(data)

# Compute correlation matrix
corr = df.corr()

# Generate heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

8. Violin Plot:

Purpose: Combines aspects of box plot and kernel density plot to visualize distribution.
Use Case: Comparing the distribution of exam scores between different schools.
Example: Exam scores (out of 100) from three different schools: School A, School B, and School C.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data (replace this with your dataset)
data = {
    'School': np.random.choice(['School A', 'School B', 'School C'], 100),
    'Exam Score': np.random.randint(40, 100, 100)
}
df = pd.DataFrame(data)

# Generate violin plot
plt.figure(figsize=(8, 6))
sns.violinplot(x='School', y='Exam Score', data=df, palette='muted')
plt.title('Exam Scores Distribution Across Schools')
plt.xlabel('School')
plt.ylabel('Exam Score')
plt.show()

9. Area Chart:

Purpose: Depicts the magnitude of change over time for multiple variables.
Use Case: Tracking the market share of competing companies over several years.
Example: Market share (%) of three competing companies (Company X, Company Y, and Company Z) over the past five years.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data (replace this with your dataset)
years = [2019, 2020, 2021, 2022, 2023]
company_x = np.random.uniform(10, 30, 5)
company_y = np.random.uniform(20, 40, 5)
company_z = np.random.uniform(5, 25, 5)

# Create DataFrame
data = {
    'Year': years,
    'Company X': company_x,
    'Company Y': company_y,
    'Company Z': company_z
}
df = pd.DataFrame(data)

# Plot area chart
plt.figure(figsize=(10, 6))
plt.stackplot(df['Year'], df['Company X'], df['Company Y'], df['Company Z'], labels=['Company X', 'Company Y', 'Company Z'])
plt.title('Market Share Over Time')
plt.xlabel('Year')
plt.ylabel('Market Share (%)')
plt.legend(loc='upper left')
plt.show()
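
Randomly generated values like these will not add up to 100% in any given year, so they are not true market shares. One way to fix that, sketched below under the assumption that the raw numbers represent each company's sales, is to divide every value by its year's total. The seed and array names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible

# Hypothetical raw sales: one row per company, one column per year
raw_sales = rng.uniform(5, 40, size=(3, 5))

# Divide by each year's total so the three shares sum to 100% per year
share_pct = raw_sales / raw_sales.sum(axis=0) * 100

print(share_pct.round(1))
print(share_pct.sum(axis=0))
```

Each row of share_pct can then be passed to plt.stackplot in place of the raw values, which makes the stacked areas fill the full 0-100% range.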

10. Scatter Matrix (Pair Plot):

Purpose: Shows pairwise relationships between multiple variables.
Use Case: Exploring correlations between different features in a dataset.
Example: Multiple features (e.g., age, income, education level, spending) from a dataset.

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Sample data (replace this with your dataset)
data = {
    'Age': np.random.randint(20, 60, 100),
    'Income': np.random.randint(20000, 100000, 100),
    'Education': np.random.randint(10, 20, 100),
    'Spending': np.random.randint(100, 500, 100)
}
df = pd.DataFrame(data)

# Generate pair plot
sns.pairplot(df)
plt.show()

In conclusion, these ten chart types, each paired with a real-life example, show how visualization supports practical analysis. Whether analyzing correlations between features, comparing distributions across categories, tracking changes over time, or exploring relationships between variables, each chart serves a distinct purpose. With the Python code snippets provided, readers can apply these visualizations to their own datasets to support data exploration and analysis.

Khondaker Sajid Alam
