Unlocking Insights: How Everyday Charts Boost Business Understanding and Decision-Making
Data analysis often begins with exploration, a journey into the unknown terrain of information. Effective exploration relies on visualization, which turns raw numbers into visible patterns and trends. Here, we will explore the top 10 charts commonly used in Exploratory Data Analysis (EDA), along with their purposes and use cases, each illustrated with a real-life example. Tools such as Matplotlib (Python), seaborn (Python), or ggplot2 (R) can produce all of these plots. For this article, I will mostly use Python's Matplotlib and seaborn to generate the charts.
1. Scatter Plot:
import matplotlib.pyplot as plt
# Assuming you have two lists of data, one for temperature and one for ice cream sales
temperature = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70]
ice_cream_sales = [100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560, 580, 600]
# Create the scatter plot
plt.scatter(temperature, ice_cream_sales)
# Add a title and labels
plt.title('Temperature vs. Ice Cream Sales')
plt.xlabel('Temperature (°C)')
plt.ylabel('Ice Cream Sales (units)')
# Display the plot
plt.show()
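A scatter plot shows the direction of a relationship, and a correlation coefficient puts a number on it. As a minimal sketch, the snippet below rebuilds the same sample lists with range() for brevity and computes Pearson's r with np.corrcoef. The variable name r is only illustrative.

```python
import numpy as np

# Same sample data as the scatter plot above, rebuilt with range() for brevity
temperature = list(range(20, 71, 2))
ice_cream_sales = list(range(100, 601, 20))

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is Pearson's r
r = np.corrcoef(temperature, ice_cream_sales)[0, 1]
print(f"Pearson r = {r:.2f}")
```

Because the sample sales rise by exactly 20 units for every 2-degree step, the points lie on a straight line. Real data will rarely be this clean.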
2. Histogram:
import numpy as np
import matplotlib.pyplot as plt
# Sample data (replace this with your dataset): 50 random scores from 0 to 100
student_scores = np.random.randint(0, 101, 50)  # the upper bound is exclusive
# Generate histogram
plt.figure(figsize=(8, 6))
plt.hist(student_scores, bins=10, color='skyblue', edgecolor='black')
plt.title('Distribution of Student Scores')
plt.xlabel('Scores')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
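To read exact numbers off a histogram, np.histogram returns the same bin counts that plt.hist draws. The sketch below is one way to print them. The seeded generator and the 0 to 100 score range are my assumptions, added so the output is reproducible.

```python
import numpy as np

# Assumption: a seeded generator so the same scores come out on every run
rng = np.random.default_rng(0)
student_scores = rng.integers(0, 101, 50)  # 50 scores from 0 to 100

# np.histogram returns the count in each bin and the bin edges (one more edge than bins)
counts, bin_edges = np.histogram(student_scores, bins=10)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count}")
```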
3. Box Plot (Box-and-Whisker Plot):
import matplotlib.pyplot as plt
import numpy as np
# Data for demonstration purposes only
salaries = np.array([
[50000, 60000, 70000, 80000], # Sales
[40000, 50000, 60000, 70000], # Marketing
[30000, 40000, 50000, 60000], # HR
[60000, 70000, 80000, 90000] # IT
])
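# Create a box plot; boxplot draws one box per column, so transpose to get one box per department
plt.boxplot(salaries.T, sym='', whis=1.5)
# Add department names, labels and titles
plt.xticks([1, 2, 3, 4], ['Sales', 'Marketing', 'HR', 'IT'])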
plt.xlabel('Department')
plt.ylabel('Salary (USD)')
plt.title('Distribution of Salaries Across Departments')
# Show the plot
plt.show()
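Each box in a box plot is drawn from the quartiles of its group. As a rough sketch, the snippet below computes the first quartile, median, and third quartile per department with np.percentile, using the same demonstration salaries. The dictionary layout and the name summary are my own choices for illustration.

```python
import numpy as np

# Same demonstration salaries as above, keyed by department
salaries = {
    'Sales': [50000, 60000, 70000, 80000],
    'Marketing': [40000, 50000, 60000, 70000],
    'HR': [30000, 40000, 50000, 60000],
    'IT': [60000, 70000, 80000, 90000],
}

# First quartile, median, and third quartile for each department
summary = {}
for dept, values in salaries.items():
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    summary[dept] = (q1, median, q3)
    print(f"{dept}: Q1={q1:.0f}, median={median:.0f}, Q3={q3:.0f}")
```

np.percentile uses linear interpolation by default, the same convention Matplotlib applies when it computes box plot statistics.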
4. Bar Chart:
import matplotlib.pyplot as plt
import pandas as pd
# Data for demonstration purposes only
data = pd.DataFrame({
'Product': ['Smartphones', 'Laptops', 'Tablets'],
'Sales': [1000, 800, 1200],
'Month': ['January', 'January', 'January']
})
# Create a bar chart
plt.bar(data['Product'], data['Sales'])
# Add labels and titles
plt.xlabel('Product')
plt.ylabel('Sales')
plt.title('Sales Performance of Different Products in January')
# Show the plot
plt.show()
5. Line Chart:
import numpy as np
import matplotlib.pyplot as plt
# Create numpy arrays from the data
months = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec'])
website_traffic = np.array([1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800, 3000, 3200, 3500])
# Pad website_traffic with np.nan to match the length of months
website_traffic = np.concatenate((website_traffic, [np.nan]))
# Plot the data
plt.plot(months, website_traffic)
# Add labels and titles
plt.xlabel('Month')
plt.ylabel('Website Traffic (Number of Visits)')
plt.title('Monthly Website Traffic Over the Past Year')
# Show the plot
plt.show()
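A line chart shows the trend, and the month-over-month change tells you how fast it is moving. Here is a small sketch using the same 11 monthly values. The name growth is only illustrative.

```python
import numpy as np

# Same 11 monthly visit counts as in the line chart above (Jan to Nov)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept', 'Oct', 'Nov']
website_traffic = np.array([1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800, 3000, 3200, 3500])

# Percent change of each month relative to the previous month
growth = np.diff(website_traffic) / website_traffic[:-1] * 100

for month, pct in zip(months[1:], growth):
    print(f"{month}: {pct:+.1f}%")
```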
6. Pie Chart:
import matplotlib.pyplot as plt
# Data for the pie chart
values = [3000, 2000, 1500, 1000, 500]
labels = ['Rent', 'Groceries', 'Utilities', 'Entertainment', 'Savings']
# Create the pie chart
plt.pie(values, labels=labels, autopct='%1.1f%%')
# Add a title (a pie chart has no x or y axis to label)
plt.title('Monthly Budget Allocation')
# Show the plot
plt.show()
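The percentages that autopct prints on the slices are each value's share of the total. The sketch below reproduces that arithmetic for the same budget figures. The name shares is only illustrative.

```python
# Same budget figures as in the pie chart above
values = [3000, 2000, 1500, 1000, 500]
labels = ['Rent', 'Groceries', 'Utilities', 'Entertainment', 'Savings']

# Each slice is its value divided by the total, expressed as a percentage
total = sum(values)
shares = {label: 100 * value / total for label, value in zip(labels, values)}

for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```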
7. Heatmap:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data (replace this with your dataset)
data = {
'Age': np.random.randint(20, 60, 100),
'Income': np.random.randint(20000, 100000, 100),
'Education': np.random.randint(10, 20, 100),
'Spending': np.random.randint(100, 500, 100)
}
df = pd.DataFrame(data)
# Compute correlation matrix
corr = df.corr()
# Generate heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()
8. Violin Plot:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data (replace this with your dataset)
data = {
'School': np.random.choice(['School A', 'School B', 'School C'], 100),
'Exam Score': np.random.randint(40, 100, 100)
}
df = pd.DataFrame(data)
# Generate violin plot
plt.figure(figsize=(8, 6))
sns.violinplot(x='School', y='Exam Score', data=df, palette='muted')
plt.title('Exam Scores Distribution Across Schools')
plt.xlabel('School')
plt.ylabel('Exam Score')
plt.show()
9. Area Chart:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Sample data (replace this with your dataset)
years = [2019, 2020, 2021, 2022, 2023]
company_x = np.random.uniform(10, 30, 5)
company_y = np.random.uniform(20, 40, 5)
company_z = np.random.uniform(5, 25, 5)
# Create DataFrame
data = {
'Year': years,
'Company X': company_x,
'Company Y': company_y,
'Company Z': company_z
}
df = pd.DataFrame(data)
# Plot area chart
plt.figure(figsize=(10, 6))
plt.stackplot(df['Year'], df['Company X'], df['Company Y'], df['Company Z'], labels=['Company X', 'Company Y', 'Company Z'])
plt.title('Market Share Over Time')
plt.xlabel('Year')
plt.ylabel('Market Share (%)')
plt.legend(loc='upper left')
plt.show()
10. Scatter Matrix (Pair Plot):
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Sample data (replace this with your dataset)
data = {
'Age': np.random.randint(20, 60, 100),
'Income': np.random.randint(20000, 100000, 100),
'Education': np.random.randint(10, 20, 100),
'Spending': np.random.randint(100, 500, 100)
}
df = pd.DataFrame(data)
# Generate pair plot
sns.pairplot(df)
plt.show()
In conclusion, each of these charts serves a distinct purpose, whether that is analyzing correlations between features, comparing distributions across categories, tracking changes over time, or exploring relationships between variables. The real-life examples show where each chart applies, and the Python code snippets can be adapted to your own datasets to support data exploration and analysis.