Using ChatGPT for Sales Data Analysis - Risks & Benefits
New feature from ChatGPT

Using ChatGPT for Sales Data Analysis - Risks & Benefits

In this article, I share my experience with using ChatGPT's Code Interpreter for data analysis. This covers prompts, ChatGPT's responses, spotting errors, how to correct errors, and key learnings from this process.

To maintain a narrative structure in the article, I've shared my prompts exactly as entered and ChatGPT's responses exactly as received.


TL;DR / Key takeaways:

  • Overall, it's an excellent tool for anyone that doesn't want to spend hours analyzing data and creating visualizations.
  • Take the time to ask exploratory questions to understand the dataset.
  • Develop the prompts step-by-step in a logical flow after having understood the dataset.
  • Take the time to review ChatGPT's responses. This allows you to catch errors in ChatGPT's reasoning.
  • After ChatGPT rectifies any error you've found, stress-test the solution by asking it the same question in a different form.


Seeing all the hype surrounding ChatGPT's CodeInterpreter, I decided to test it out. The first step was finding out a list of uses for Code Interpreter, so I entered the charming prompt below into ChatGPT.

Prompt:

"shortened bullet points - what are uses of chatgpt's codeinterpreter"

ChatGPT's response:

"Uses of ChatGPT's CodeInterpreter:

  1. Execute Python code
  2. Provide real-time code solutions
  3. Offer interactive coding assistance
  4. Teach programming concepts
  5. Solve math problems
  6. Generate plots and visualizations
  7. Analyze datasets

Remember, this is within the context of our current session. Always exercise caution when running or sharing code. "End of response.


Prompt:

"In analyze datasets and generate plots and visualizations - can one upload an excel sheet?"

ChatGPT's response:

"Yes, in the context of our current session with the CodeInterpreter tool:

  • Users can upload Excel sheets (both .xls and .xlsx formats).
  • The data can be read, processed, and analyzed.
  • Plots and visualizations can be generated based on the data.

If you have an Excel sheet you'd like to analyze or visualize, you can upload it, and I can assist you further!" End of response.


As an MBA student, I'm very interested in the last two use cases - data analysis and visualization. I've held client-facing and strategic roles at a major Pakistani bank and went on to Product roles at two startups.

One thing that's true for all impactful roles today - professionals need to analyze data, extract valuable insights, and share learnings across the organization.

I wanted to see how fast I could do it with Code Interpreter. So I went to Kaggle - that great repository of datasets - and downloaded this (Sales Pipeline Conversion at a SaaS Startup | Kaggle) dataset uploaded 3 years ago by Kaggle user Gaurav Ditta. A complete description of the dataset here would disturb the narrative and so I've added it at the end of the article. To keep it brief, it's 78,025 rows and 13 columns about the company's sales pipeline.


The first step is uploading the dataset, which is fairly straightforward.

GPT-4 dropdown - visible whenever you start a new chat

After enabling ChatGPT in the GPT-4 dropdown, you simply press the + button next to the text bar to attach your file.

ChatGPT text bar

Once your file is uploaded, ChatGPT takes a few seconds to process it and respond with something like

ChatGPT's response:

"It appears that you've re-uploaded the file "Sales Dataset.xlsx". Would you like to continue exploring this dataset, or do you have a specific task or question in mind?" End of response.


My first prompt after this was

Prompt:

"Give me a chart that details the pipeline for each B2b sales medium"

ChatGPT's response:

"To visualize the pipeline for each B2B sales medium, we can consider the number of opportunities in each sales medium. I'll plot a bar chart showing the count of opportunities for each B2B sales medium. Let's create that visualization.

I apologize for the inconvenience, but it seems there was an issue generating the visualization. Let me attempt to recreate the chart for you.

I apologize for the oversight. It seems I need to reload the dataset and set up the required libraries again. Let me quickly do that and generate the chart for you.

Pipeline for Each B2B Sales Medium

Here's the chart detailing the pipeline for each B2B sales medium. The chart displays the number of opportunities associated with each sales medium. As we can see, "Enterprise Sellers" and "Marketing" are the dominant mediums in terms of the number of opportunities.

If you have any further questions or need additional visualizations, please let me know!" End of response.

That's when I thought "Very cool, let's check if it can handle multiple questions at a time."

Prompt:

"Okay, now give me charts that answer the following questions

  1. Rank the pipelines for each "technology primary". Create stacked columns so we can judge the quality of each pipeline
  2. Rank the b2b sales medium for each city - I want to see which sales medium is most effective in which city, which is second most effective and so on
  3. Rank the b2b sales medium for "technology primary". - I want to see which sales medium is most effective in which "technology primary". which is second most effective and so on"

ChatGPT's response:

"Alright, let's tackle these questions step by step:

  1. To rank the pipelines for each "Technology Primary", we'll create a stacked bar chart. For the quality of each pipeline, I'll use "Opportunity Status" (Won or Loss) to indicate the quality of leads. A higher proportion of "Won" would indicate better quality.
  2. For the ranking of B2B sales mediums in each city, a stacked bar chart would be appropriate. The height of each section in the bar will represent the number of opportunities for each sales medium.
  3. Similar to the second question, for ranking the B2B sales mediums for each "Technology Primary", I'll use a stacked bar chart.

Let's begin by creating the first chart.

Chart - Pipelines for Each Technology Primary (Stacked by Opportunity Status)

The chart above displays the pipelines for each "Technology Primary", stacked by "Opportunity Status" (Won or Loss). This helps in judging the quality of the pipeline for each technology. A larger proportion of "Won" status indicates better quality.

Next, let's move to the second chart: Ranking the B2B sales medium for each city.

I apologize for the oversight. There seems to be an error while trying to reorder the cities based on the sum of opportunities.

Let me correct that and regenerate the chart for ranking the B2B sales medium for each city.

Chart - Pipeline for Each B2B Sales Medium

Here's the stacked bar chart showing the ranking of B2B sales mediums for each city. The height of each segment in the bar represents the number of opportunities associated with each sales medium. This visualization gives an idea of which sales medium is most effective in each city.

Next, let's move on to the third chart, which will rank the B2B sales medium for each "Technology Primary".


Chart - B2B Sales Medium Ranking for Each Technology Primary

Here's the third chart that shows the ranking of B2B sales mediums for each "Technology Primary". Similar to the previous visualizations, the height of each segment in the bar represents the number of opportunities associated with each sales medium. This provides a clear picture of which sales medium is most effective for each technology primary.

If you have any further questions or need more insights, please let me know!" End of response.


It's interesting to note that although I inadvertently asked for "charts" in the next prompt, ChatGPT took it upon itself to "derive actionable insights"


Armed with a basic understanding of the company's sales efforts, I decided to go all out. To test its limits, I gave it a very comprehensive problem in the next prompt.

Prompt:

Analyze the whole dataset for relationships between variables and give me charts for anything that has decision-making implications for marketing and sales managers

ChatGPT's response:

"Certainly! To analyze relationships between variables and derive actionable insights, we'll follow this approach:

  1. Correlation Analysis: Understand the linear relationship between numeric variables.
  2. Categorical Relationships: Explore relationships between categorical variables and outcomes.
  3. Sales Outcome Analysis: Analyze factors affecting sales outcomes (e.g., 'Opportunity Status').

Let's start with the correlation analysis of numeric variables to identify any strong relationships.

Heatmap - Sales Velocity, Sales Stage Interations, Opportunity Size

The heatmap above showcases the correlation between numeric variables in the dataset. A few points to note:

  1. There's a moderate negative correlation between "Sales Velocity" and "Sales Stage Iterations". This suggests that opportunities with a higher number of stage iterations tend to have a lower sales velocity. It's an indication that deals that bounce between stages take longer to close.
  2. Other numeric variables do not show strong correlations with one another.

Next, let's explore relationships between categorical variables and outcomes. We'll focus on the "Opportunity Status" (Won or Loss) as the primary outcome and examine how different variables influence this outcome. This can provide insights into what factors are most influential in winning a sale.

We'll begin by analyzing the influence of "Technology Primary" on the outcome.

Chart - Opportunity Outcome by Technology Primary

The above chart showcases the proportion of "Won" and "Loss" opportunities for each "Technology Primary". Key insights from this chart:

  1. "Analytics" appears to have a higher win rate compared to other technologies. This might suggest that marketing and sales campaigns focusing on "Analytics" solutions could have better outcomes.
  2. "ERP Implementation" and "Technical Business Solutions" show roughly similar win rates, whereas "Legacy Modernization" has a slightly lower proportion of wins.

Next, let's analyze how different B2B sales mediums influence the opportunity outcome. This will provide insights into which sales mediums are most effective.

Opportunity Outcome by B2B Sales Medium

The chart above displays the proportion of "Won" and "Loss" opportunities for each B2B sales medium. Key takeaways:

  1. "Enterprise Sellers" appears to have the highest win rate among the sales mediums, suggesting it might be the most effective approach for converting opportunities.
  2. "Tele Sales" and "Partners" also have a commendable proportion of wins, indicating they are effective sales channels as well.
  3. "Online Leads" and "Marketing" have a relatively lower win rate. This could imply that while these channels might generate a high number of leads, the conversion rate is not as high.

Such insights can guide marketing and sales managers to allocate resources effectively and prioritize channels that yield higher conversion rates.

Finally, let's examine how cities influence the sales outcome, as geographic targeting can be an important aspect of sales and marketing strategies.

Chart - Opportunity Outcome by City

The chart above details the proportion of "Won" and "Loss" opportunities across different cities. Here are some insights:

  1. Mumbai, Delhi, and Bengaluru have a notably higher win rate compared to other cities. This could indicate the effectiveness of sales strategies in these cities or the presence of a more receptive audience.
  2. Hyderabad and Chennai have a relatively lower win rate. This might suggest room for improvement in sales strategies in these regions or the need for more targeted marketing campaigns.

For marketing and sales managers, these insights can be valuable in the following ways:

  • They can allocate more resources to cities with higher win rates to maximize returns.
  • They can investigate and refine sales and marketing strategies in cities with lower win rates to improve outcomes.
  • The effectiveness of different sales mediums can guide their decision on where to invest more for lead generation and conversion.

Would you like any further analysis or details on specific aspects of the dataset?"

End of response.

Cracks finally appeared in the next prompt. While the last prompt asked for data analysis and visualizations, the next asked for a strategy.

Prompt:

"Based on your analysis, act as a Chief Marketing & Sales Officer with 20 years of experience and generate a marketing and sales strategy for me - if I have $ 1 million in budget to allocate across all marketing channels, technologies and cities, how should I break it up? Share all assumptions in the start"

ChatGPT's response:

"Alright, let's craft a marketing and sales strategy for the given scenario. I'll begin by outlining a few assumptions based on the dataset analysis and common industry practices.

Assumptions:

  1. ROI Correlation: Higher win rates indicate a higher return on investment (ROI) for that particular channel, city, or technology. Thus, investing more in areas with higher win rates will yield better results.
  2. Market Saturation: Doubling the investment in a particular channel, technology, or city doesn't necessarily double the returns. There's a point of diminishing returns, so diversification is essential.
  3. Budget Allocation: The budget will be divided among three primary categories - channels (B2B Sales Medium), technologies (Technology Primary), and cities.
  4. Future Potential: Cities or technologies with lower win rates are not necessarily bad investments. They might represent untapped potential or areas requiring different strategies. A portion of the budget will be reserved for experimenting in these areas.
  5. Operational Costs: Some cities or channels might have higher operational costs than others. For simplicity, we're assuming uniform costs across all channels and cities.
  6. Channel Effectiveness: Not all channels might be suitable for all technologies or cities. We'll consider this based on the data we have.
  7. Budget Flexibility: The strategy will provide an initial allocation, but reallocation is possible based on real-time performance feedback.

Strategy:

Given our assumptions, I'll provide a budget split across channels, technologies, and cities. Let's break down the budget allocation:

Based on the analysis and assumptions, here's the proposed budget allocation:

B2B Sales Medium (Channels) Allocation (40% of budget = $400,000):

  • Marketing: $191,025.95
  • Enterprise Sellers: $178,189.04
  • Partners: $14,641.46
  • Tele Sales: $12,970.20
  • Online Leads: $3,173.34

Technology Primary Allocation (35% of budget = $350,000):

  • ERP Implementation: $230,904.29
  • Technical Business Solutions: $116,157.03
  • Analytics: $1,469.34
  • Legacy Modernization: $1,469.34


City Allocation (25% of budget = $250,000):

  • Mumbai: $75,636.81
  • Delhi: $48,845.52
  • Bengaluru: $29,415.10
  • Hyderabad: $28,663.41
  • Chennai: $23,415.78
  • Kolkata: $22,196.06
  • Pune: $21,827.31

Recommendations:

  1. Channel Strategy: Invest heavily in Marketing and Enterprise Sellers as they generate the most opportunities. However, keep an eye on the conversion rates of "Online Leads" and "Tele Sales" to identify potential growth or changes in trends.
  2. Technology Focus: ERP Implementation and Technical Business Solutions should be the primary focus due to their higher win rates. Allocate a small budget to "Analytics" and "Legacy Modernization" to test and potentially identify areas for improvement.
  3. Geographic Focus: Mumbai, Delhi, and Bengaluru are the top cities in terms of win rates. While they should receive a significant portion of the budget, it's also important to maintain a presence in other cities like Hyderabad and Chennai to tap into potential markets.

Overall, the strategy aims to capitalize on existing strengths while also allocating resources to test and explore areas with potential growth. Regular reviews and real-time feedback should guide any budget reallocations or strategy shifts.

Would you like any further breakdowns or insights?"

End of response.

Unable to understand the last prompt, I asked for a clarification

Prompt

"This part of your answer was not clear. Let me know if you need me to reupload the file so you can recheck how you got to the answer. Explain it to me in a way that allows me to answer the following question - how much of my budget should go towards enterprise sellers selling technical business solutions in hyderabad?

"Based on the analysis and assumptions, here's the proposed budget allocation..."

ChatGPT's response

"Certainly, let's break down the budget allocation for "Enterprise Sellers" selling "Technical Business Solutions" in "Hyderabad".

1. B2B Sales Medium Allocation for Enterprise Sellers:

From the proposed budget allocation, $178,189.04 is allocated to Enterprise Sellers out of the total budget of $1,000,000. This means 17.8189% of the total budget is allocated to Enterprise Sellers.

2. Technology Primary Allocation for Technical Business Solutions:

From the proposed budget allocation, $116,157.03 is allocated to Technical Business Solutions out of the total budget of $1,000,000. This means 11.6157% of the total budget is allocated to Technical Business Solutions.

3. City Allocation for Hyderabad:

From the proposed budget allocation, $28,663.41 is allocated to Hyderabad out of the total budget of $1,000,000. This means 2.8663% of the total budget is allocated to Hyderabad.

To determine the budget allocation for "Enterprise Sellers" selling "Technical Business Solutions" in "Hyderabad", we need to take the product of these three percentages and then apply it to the total budget.

Let's calculate it:

I apologize for the inconvenience. It seems there was a technical glitch. Let me recalculate the budget allocation for you.

Given:

  • Total budget = $1,000,000
  • Percentage allocated to Enterprise Sellers = 17.8189% of total budget
  • Percentage allocated to Technical Business Solutions = 11.6157% of total budget
  • Percentage allocated to Hyderabad = 2.8663% of total budget

To determine the budget allocation for "Enterprise Sellers" selling "Technical Business Solutions" in "Hyderabad", we need to take the product of these three percentages and then apply it to the total budget.

Let's calculate the combined budget allocation:

I apologize for the oversight. Let me define the necessary variables and calculate the combined budget allocation again.

Based on the budget allocations provided and the calculations, you should allocate approximately $593.26 (rounded to the nearest cent) of your $1,000,000 budget towards "Enterprise Sellers" selling "Technical Business Solutions" in "Hyderabad".

This value represents the combined influence of the channel (Enterprise Sellers), the technology (Technical Business Solutions), and the city (Hyderabad) on your total budget.

Would you like any further explanations or insights?" End of response.

Here's where I pointed out the mistake, and ChatGPT was quick to correct it.

Prompt:

Are you sure about that? Seems too small. I do think we should be adding the three values 178189.04, 116157.03 and 28663.41"

ChatGPT's response

"I apologize for the confusion. You're right; my approach was incorrect in taking the product of the percentages. Instead, we should be considering the intersection of these budgets.

Given that you have $178,189.04 allocated for "Enterprise Sellers", $116,157.03 for "Technical Business Solutions", and $28,663.41 for "Hyderabad", the budget for "Enterprise Sellers" selling "Technical Business Solutions" in "Hyderabad" would be the smallest of these three values, as that would be the limiting factor.

Let's determine the correct budget allocation based on this approach.

I apologize for the oversight in my previous calculations.

Given the allocations:

  • $178,189.04 for "Enterprise Sellers"
  • $116,157.03 for "Technical Business Solutions"
  • $28,663.41 for "Hyderabad"

The budget for "Enterprise Sellers" selling "Technical Business Solutions" in "Hyderabad" would be $28,663.41. This is because the budget for Hyderabad is the smallest of the three values and acts as the limiting factor.

Again, I apologize for the confusion, and thank you for pointing it out."

End of response.

Kudos for making it this far. In the next article, I'll cover the advanced analysis and recommendations.

I've restated the key takeaways below.

TL;DR / Key takeaways:

  • Overall, it's an excellent tool for anyone that doesn't want to spend hours analyzing data and creating visualizations.
  • Take the time to ask exploratory questions to understand the dataset.
  • Develop the prompts step-by-step in a logical flow after having understood the dataset.
  • Take the time to review ChatGPT's responses. This allows you to catch errors in ChatGPT's reasoning.
  • After ChatGPT rectifies any error you've found, stress-test the solution by asking it the same question in a different form.

End of Part 1 of my Code Interpreter article




Description of the dataset - 78,025 rows and 13 columns. Column names are in bold.

  1. Opportunity ID: simple serial number.
  2. Technology Primary: The vertical within technology; there are 4 in total. These are ERP Implementation, Technical Business Solutions, Legacy Modernization and Analytics).
  3. City: 7 major Indian cities.
  4. B2B Sales Medium: The sales channel or how they're selling; there are 5 in total. These are Marketing, Enterprise Sellers, Partners, Tele Sales, Online Leads.
  5. Sales Velocity: A discrete, numerical figure. According to Google, "Sales velocity is the measurement of how quickly a sales team closes deals and generates revenue - the speed at which leads in the pipeline are converted into customers."
  6. Opportunity Status: Whether the company won or lost the deal. This allows the company to track the impact of its marketing efforts.
  7. Sales Stage Iterations: Another discrete, numerical figure - that refers to how many times a deal moved to the previous stage in the pipeline.
  8. Opportunity Sizing (USD): There are 7 categories - The smallest one is 0-10k and the largest one is More than 60k.
  9. Client Revenue Sizing: This one's self-explanatory; has 5 categories.
  10. Client Employee Sizing: Number of employees.
  11. Business from Client Last Year: Whether or not the client transacted in the previous year. 5 categories - the smallest one is 0 (No business)
  12. Compete Intel: How much we know about the client - Unknown, Known and None
  13. Opportunity Sizing: 10K or less, 10K to 20K, 20K to 30K, 30K to 40K, 40K to 50K, 50K to 60K, More than 60K



I am amazed with CHAT GPT advance. Learnt some new insights.

Like
Reply
Oluwole Ayokunle PMP®

Project Manager | Engineer | Operations Management | MBA

1y

It’s great to see how ChatGPT can make work so much easier. The future belongs to those who can use AI to perform repetitive tasks, thus freeing up time to focus on more high value and complex tasks.

Like
Reply

That was quite insightful, Suwaid. Thanks for sharing 🙌🏻

To view or add a comment, sign in

Insights from the community

Others also viewed

Explore topics