Thinking fast and distributions.
"Thinking, Fast and Slow," written by Nobel Prize winner Daniel Kahneman, is the gold standard for knowing how we make fast decisions.
And the skill of knowing how to make quick decisions is important for any role.
As data analysts, we usually look for different distributions in our datasets.
We love to see when datasets (at least mostly) fit or reflect a certain distribution.
This is because we can develop "heuristics" on how different KPIs or metrics behave.
This can help you make decisions faster as well. It also helps a lot in knowing what to expect. This is important for developing good intuition, and having good intuition can lead to savings, avoiding mistakes, and attracting more luck on your side when you make the next call.
Thus, let's get to know the distributions of two metrics (engagement rate and CPC), so you can also start "thinking fast and distributions".
The Power Law.
I was wondering what can be learned about the engagement that happens on LinkedIn.
And I asked myself: "What is the distribution that the engagement rate on LinkedIn follows?"
It might not seem like a big deal to some (unless you're a SoMe manager).
So later in this article, I'll also analyze one more, cost-oriented metric, in case that's what you are interested in.
Nonetheless, there are many benefits to knowing your engagement rate distribution, including:
To find the answer to my question, I "ChatGPT-ed it" using the following prompt:
Context: (You are a data analyst specializing in SoMe analytics.)
Details: (Write down an answer that is no longer than 300 characters.)
Targeted social media: (LinkedIn.)
If you were to choose, what distribution would the engagement rate on LinkedIn posts follow?
And ChatGPT-4 came up with this answer:
For LinkedIn posts, the engagement rate typically follows a right-skewed distribution, with most posts receiving low engagement and a few posts achieving exceptionally high engagement.
A highly right-skewed distribution like this is often loosely called a "Power Law". It has very few observations on the right side, where the high values of your variable of interest sit, while the majority of observations are concentrated on the left side, where the lowest values sit.
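For reference, a power law in its standard continuous form (as written, for example, by Clauset, Shalizi and Newman, 2009) has the density below, where x_min is a lower cutoff and α is the exponent. Not every right-skewed distribution has exactly this form (a log-normal is right-skewed too), which is why "Power Law" is used loosely in this article.

```latex
p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha},
\qquad x \ge x_{\min}, \quad \alpha > 1
```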
It looks something like this:
When I asked ChatGPT-4 to double-check whether it meant the so-called "Power Law", a highly right-skewed distribution, it confirmed that it did:
Initially, it seemed accurate.
You would expect to have A LOT of posts on LinkedIn that get a low engagement rate.
And then only a few would be sticking out with a higher engagement rate, creating a long tail.
But the more I thought about it, the more it got me thinking...
Wouldn't it actually make more sense for the distribution to be more... normal?
If a SoMe manager consistently creates "average" content, shouldn't there be a few posts at both extremes, a few "losers" with low and a few "winners" with high engagement rates, with most posts falling in the middle?
Did the small disclaimer "ChatGPT can make mistakes. Consider checking important information." just come true?
Did ChatGPT just make a mistake?
Is your engagement rate on LinkedIn following the "Power Law" distribution?
To confirm or deny what ChatGPT told me, I decided to put it to the test.
I got my hands on a dataset including a sample of engagement rates from 3 different anonymized LinkedIn pages.
I then drew randomly selected samples of different sizes and plotted their distributions:
...and I saw some surprising results.
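The resampling step can be sketched in Python. This is a minimal sketch, not the code behind the charts: it assumes the engagement rates sit in a NumPy array, draws one random subsample per size without replacement, and bins each subsample with np.histogram. The function name sample_histograms and the synthetic rates array are hypothetical stand-ins for the real dataset.

```python
import numpy as np

def sample_histograms(values, sizes, bins=10, seed=0):
    """For each sample size, draw one random subsample (without replacement)
    and return its histogram counts and bin edges."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    histograms = {}
    for n in sizes:
        sample = rng.choice(values, size=n, replace=False)
        counts, edges = np.histogram(sample, bins=bins)
        histograms[n] = (counts, edges)
    return histograms

# Hypothetical stand-in data: replace with your own 289 engagement rates.
rates = np.clip(np.random.default_rng(42).normal(0.05, 0.024, size=289), 0, None)
hists = sample_histograms(rates, sizes=(5, 13, 20, 289))
for n, (counts, _) in hists.items():
    print(n, counts)
```

The counts and edges can then be handed to a plotting library such as Matplotlib; changing the seed shows how much the small samples vary from draw to draw.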
Histogram (n=5 posts)
All right!
This is looking promising?
It looks like there could be a hint of the mentioned "Power Law".
Let's keep going and increase the sample size.
Histogram (n=13 posts)
Ok.
Even with as few as 13 posts, this is starting to look a little different.
Let's increase the sample size again.
Histogram (n=20 posts)
Not looking good for ChatGPT at this point...
Let's see what happens if we increase the sample size to 1 year's worth of posts on LinkedIn.
Histogram (n=289 posts)
In the end, the intuition was right.
The engagement rate on LinkedIn seems to be approximately normally distributed around a mean of 0.05112 (5.11%), with a fairly high standard deviation of 0.02387 (±2.39 percentage points).
This is in line with this analysis, which found that the average engagement rate on LinkedIn for an Italian property management company was 0.06 (6%).
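Perhaps it's a little asymmetric, as there are 151 values above the mean and 138 below it. Note, though, that in a right-skewed distribution the long right tail usually pulls the mean above the median, so more values typically fall below the mean than above it; 151 vs. 138 therefore hints, if anything, at a slight left skew.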
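Counting values on either side of the mean is only a rough signal of asymmetry. A minimal sketch of a slightly more systematic check, assuming SciPy is available, puts that count next to the mean-minus-median gap and the sample skewness coefficient. The function name asymmetry_summary and the toy data are hypothetical.

```python
import numpy as np
from scipy import stats

def asymmetry_summary(values):
    """Rough asymmetry indicators for a sample."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    return {
        "above_mean": int((x > mean).sum()),
        "below_mean": int((x < mean).sum()),
        # positive when the mean sits above the median (typical of right skew)
        "mean_minus_median": float(mean - np.median(x)),
        # Fisher-Pearson sample skewness: > 0 right skew, < 0 left skew
        "skewness": float(stats.skew(x)),
    }

print(asymmetry_summary([1, 1, 1, 2, 10]))  # hypothetical right-skewed toy data
```

In the toy data, the single large value drags the mean (3) above the median (1), so only one value lands above the mean: the opposite pattern to the 151 vs. 138 split.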
But, I think it's fair to say that this resembles a bell-shaped normal distribution (especially if you "smooth it out" with KDE) which you can see in the simulation I prepared below.
By placing all 4 distributions on one visual with the calculated Kernel Density Estimation (KDE), you can observe how the engagement rate distribution changes with a bigger sample size.
The Kernel Density Estimation (KDE) will help you understand the distribution of the engagement rate on LinkedIn by estimating the probability density function.
This function can provide insights into the shape, spread, and modality of the engagement rate on LinkedIn.
It simply allows for a smoother and more interpretable representation of the distribution compared to a histogram.
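A Gaussian KDE like the one described above can be sketched with SciPy's gaussian_kde, which uses Scott's rule for the bandwidth by default. This is an assumed setup, not necessarily the one used for the chart: the function name kde_curve and the synthetic data are hypothetical, and the "mode" is simply the grid point with the highest estimated density.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_curve(values, grid_points=200):
    """Fit a Gaussian KDE and evaluate it on an even grid over the data range."""
    x = np.asarray(values, dtype=float)
    kde = gaussian_kde(x)  # bandwidth chosen by Scott's rule (SciPy's default)
    grid = np.linspace(x.min(), x.max(), grid_points)
    density = kde(grid)
    mode = float(grid[np.argmax(density)])  # location of the highest estimated density
    return grid, density, mode

# Hypothetical stand-in data: replace with your engagement rates.
rates = np.random.default_rng(3).normal(0.05, 0.01, size=2000)
grid, density, mode = kde_curve(rates)
print(round(mode, 4))
```

With very small samples (such as n=5), the KDE mostly reflects the bandwidth choice, so its peak should be read with caution.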
The first and smallest sample (blue) hinted in favor of the "Power Law", with a low engagement rate (1.36%) having the highest estimated density.
But as the sample grows and becomes more representative of the population, the distribution flattens on the left side and morphs into what resembles a normal distribution.
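We can check this with the Shapiro-Wilk test, whose null hypothesis is that the data come from a normal distribution: a p-value above the chosen significance level (commonly 0.05) means we cannot reject normality, although it does not prove it.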
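A minimal sketch of the Shapiro-Wilk check using scipy.stats.shapiro, assuming a 0.05 significance level. The function name shapiro_check and the synthetic rates array are hypothetical placeholders for the real 289 engagement rates.

```python
import numpy as np
from scipy import stats

def shapiro_check(values, alpha=0.05):
    """Shapiro-Wilk test; the null hypothesis is that the data are normal."""
    statistic, p_value = stats.shapiro(values)
    return {
        "W": float(statistic),
        "p_value": float(p_value),
        # True means normality is rejected at level alpha
        "reject_normality": bool(p_value < alpha),
    }

# Hypothetical stand-in data: replace with your 289 engagement rates.
rates = np.random.default_rng(5).normal(0.051, 0.024, size=289)
print(shapiro_check(rates))
```

Keep in mind that with large samples the test flags even tiny departures from normality, and SciPy warns that its p-value may be inaccurate above 5,000 observations.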
What actually drives the engagement rate on LinkedIn?
When you find the distribution of your variable of interest, you can start using it as your heuristic.
However, you can also start building your case for your SoMe manager by forming different hypotheses (and then either accepting or rejecting them).
H1: Posts with videos on LinkedIn have higher engagement rates.
To investigate, we can first plot the two distributions on a single chart.
This allows us to see whether it makes sense to perform a t-test to find whether there is a difference between these two means.
But, as can be observed, the posts with videos form a similar, roughly normal distribution, so the chart offers no support for H1.
To make sure, we can still perform the t-test and check whether it confirms what we've observed.
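The article doesn't say which t-test variant was used. As a sketch under that assumption, here is a one-sided Welch's t-test (which doesn't assume equal variances) via scipy.stats.ttest_ind; the alternative argument needs SciPy 1.6 or newer. The function name compare_means and the simulated groups are hypothetical.

```python
import numpy as np
from scipy import stats

def compare_means(group_a, group_b, alpha=0.05):
    """One-sided Welch's t-test of H: mean(group_a) > mean(group_b)."""
    result = stats.ttest_ind(group_a, group_b, equal_var=False, alternative="greater")
    return {
        "t": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
    }

# Hypothetical simulated groups standing in for video / non-video posts.
rng = np.random.default_rng(7)
video = rng.normal(0.051, 0.024, size=60)
no_video = rng.normal(0.051, 0.024, size=229)
print(compare_means(video, no_video))
```

A p-value above 0.05 here would match the visual impression: no evidence that video posts have a higher mean engagement rate.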
Ok, so what else?
I hypothesized about "reposts".
H2: Posts with >=10 reposts on LinkedIn have higher engagement rates.
Bingo!
This is a very nice example of finding what is driving your engagement rate on LinkedIn using distributions.
To confirm this, we can once again perform a t-test to compare these two means.
And by doing so, we find a very low p-value.
The results from the t-test lead us to confirm what we have observed with our eyes on the chart.
The posts that received >= 10 reposts show a statistically significantly higher engagement rate than those that did not.
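If the posts live in a table, the two groups for H2 can be built directly from the repost count. A minimal sketch, assuming a pandas DataFrame with hypothetical columns "reposts" and "engagement_rate" and, again, a one-sided Welch's t-test; the toy data are made up for illustration.

```python
import pandas as pd
from scipy import stats

def repost_split_test(df, threshold=10):
    """Split posts by repost count and run a one-sided Welch's t-test.
    Assumes hypothetical columns 'reposts' and 'engagement_rate'."""
    high = df.loc[df["reposts"] >= threshold, "engagement_rate"]
    low = df.loc[df["reposts"] < threshold, "engagement_rate"]
    # H2 direction: posts with >= threshold reposts have the higher mean.
    result = stats.ttest_ind(high, low, equal_var=False, alternative="greater")
    return len(high), len(low), float(result.pvalue)

# Hypothetical toy data for illustration only.
posts = pd.DataFrame({
    "reposts": [0, 1, 2, 3, 4, 12, 15, 20, 11, 30],
    "engagement_rate": [0.040, 0.050, 0.045, 0.050, 0.042,
                        0.090, 0.100, 0.095, 0.085, 0.110],
})
n_high, n_low, p_value = repost_split_test(posts)
print(n_high, n_low, p_value)
```

Returning the group sizes alongside the p-value makes it easy to spot when the ">= 10 reposts" group is too small to trust.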
I can then go to our amazing Brand Lead & SoMe Manager and let her know about these results.
She can then get creative and focus on content that people want to repost on their own profiles, if she wants to increase her overall engagement rate.
Other than that, based on this dataset from 3 anonymized LinkedIn pages, I can also let her know about some additional conclusions, so she can have some benchmarks in mind:
From now on, she will be able to use these distributions as heuristics in the future and start "thinking fast and distributions".
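The benchmark list itself isn't reproduced here. As a minimal sketch of how such probability benchmarks could be derived, assuming the normal approximation holds, the mean and standard deviation reported for the 289 posts can be plugged into scipy.stats.norm. The 10% threshold and the function names are hypothetical.

```python
from scipy.stats import norm

# Figures reported above for the 289 posts; the normal model is an approximation.
MEAN_ER = 0.05112
SD_ER = 0.02387

def prob_above(threshold, mean=MEAN_ER, sd=SD_ER):
    """P(engagement rate > threshold) under a normal approximation."""
    return float(norm.sf(threshold, loc=mean, scale=sd))

def prob_between(low, high, mean=MEAN_ER, sd=SD_ER):
    """P(low < engagement rate < high) under a normal approximation."""
    return float(norm.cdf(high, loc=mean, scale=sd) - norm.cdf(low, loc=mean, scale=sd))

print(prob_above(0.10))                                # hypothetical 10% benchmark
print(prob_between(MEAN_ER - SD_ER, MEAN_ER + SD_ER))  # within one standard deviation
```

Under these assumptions, a post beating 10% should happen roughly 2% of the time, and about 68% of posts should land between roughly 2.7% and 7.5%. The normal model also assigns roughly 1.6% probability to negative rates, which can't occur, so treat the extreme tails as rough.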
Is your cost-per-click (CPC) on LinkedIn following the "Power Law" distribution?
We didn't find evidence that the engagement rate on LinkedIn follows a highly right-skewed distribution.
However, I wasn't ready to give up on the "Power Law" just yet.
And I asked myself: "Would cost-per-click (CPC) on LinkedIn follow the Power Law distribution?"
In a new conversation, I wrote down a similar prompt and gave ChatGPT-4 a second attempt to redeem itself:
As seen in the screenshot above, ChatGPT confirmed that it would:
On LinkedIn, the Cost Per Click (CPC) distribution typically follows a right-skewed distribution due to a range of bids and competition levels across different industries.
This time, the answer made sense.
Ideally, you would want your CPC to follow the "Power Law" if you or your advertising agency is doing a good job.
Most of your ads on LinkedIn should show low CPC values, and higher CPC values should become increasingly rare.
Thus, I decided to create a hypothesis:
H1: The CPC on LinkedIn follows the right-skewed "Power Law" distribution.
To test this, I got my hands on a dataset containing 1,144 CPC values (in $) in the IT sector.
Bingo!
It's fair to say that this time, the CPC on LinkedIn does indeed seem to follow the right-skewed "Power Law".
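A right skew on its own doesn't establish a power law. As a sketch of one common way to quantify a power-law tail (not the method used in this analysis), here is the maximum-likelihood estimator of the exponent α for a continuous power law above a cutoff x_min, from Clauset, Shalizi and Newman (2009). The function name power_law_alpha, the cutoff, and the synthetic data are assumptions; in practice x_min is usually chosen by minimizing the Kolmogorov-Smirnov distance, and the fit should be compared against alternatives such as a log-normal.

```python
import numpy as np

def power_law_alpha(values, x_min):
    """Continuous power-law MLE (Clauset, Shalizi & Newman, 2009):
    alpha_hat = 1 + n / sum(ln(x_i / x_min)) over observations x_i >= x_min."""
    x = np.asarray(values, dtype=float)
    tail = x[x >= x_min]
    log_ratios = np.log(tail / x_min)
    if tail.size == 0 or log_ratios.sum() == 0:
        raise ValueError("need observations strictly above x_min")
    n = tail.size
    alpha = 1.0 + n / log_ratios.sum()
    std_err = (alpha - 1.0) / np.sqrt(n)  # asymptotic standard error
    return float(alpha), float(std_err), n

# Hypothetical check on synthetic Pareto data with known alpha = 2.5.
# NumPy's pareto() draws Lomax samples; (sample + 1) * x_m gives a classical
# Pareto with density exponent alpha = shape + 1.
rng = np.random.default_rng(0)
cpcs = (rng.pareto(1.5, size=100_000) + 1.0) * 0.5
alpha, se, n_tail = power_law_alpha(cpcs, x_min=0.5)
print(round(alpha, 2), round(se, 4), n_tail)
```

Recovering the known exponent on synthetic data is a useful sanity check before pointing the estimator at real CPC values.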
To illustrate the "Power Law", randomized samples of 10, 50, 500, and 1,144 CPCs on LinkedIn were drawn and visualized below.
By placing all 4 randomized distributions on one visual with the calculated Kernel Density Estimation (KDE), you can observe how the CPC distribution changes with a bigger sample size.
As the sample grows and becomes more representative of the population, the distribution consistently develops a longer tail, rises on the left side, and morphs into what resembles the "Power Law".
I can then go to our great CMO and Director of Revenue Marketing, and let them know about these results.
They can then anticipate that most clicks on LinkedIn will come at a lower cost, while a few clicks will be significantly more expensive. This highly right-skewed "Power Law" suggests that CPC spikes can occur due to less frequent but highly competitive targeting and ad placements, represented in the long tail.
Other than that, based on this sample dataset containing 1,144 CPCs, I can also let them know about some probability conclusions, so they can have some benchmarks in mind:
From now on, they will also be able to use this "Power Law" in the CPC context as a heuristic in the future and start "thinking fast and distributions".
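The CPC benchmark list isn't reproduced here. Because the CPC is heavily right-skewed, its mean is pulled up by the long tail, so the median and empirical quantiles are usually safer benchmarks than a mean plus or minus a standard deviation. A minimal sketch, where the function name cpc_benchmarks and the dollar thresholds are hypothetical:

```python
import numpy as np

def cpc_benchmarks(cpcs, thresholds=(5.0, 10.0), quantiles=(0.5, 0.9, 0.99)):
    """Empirical benchmarks for a skewed metric such as CPC (in $)."""
    x = np.asarray(cpcs, dtype=float)
    return {
        "mean": float(x.mean()),
        "median": float(np.median(x)),
        # share of CPC values above each threshold
        "p_exceed": {t: float(np.mean(x > t)) for t in thresholds},
        # CPC value below which each fraction of observations falls
        "quantiles": {q: float(np.quantile(x, q)) for q in quantiles},
    }

print(cpc_benchmarks([1.0, 2.0, 3.0, 4.0, 100.0]))  # hypothetical toy CPCs
```

In the toy data, a single expensive click lifts the mean to $22 while the median stays at $3, which is exactly the gap a long tail creates.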
Here are 3 other quick examples of distributions observed in marketing.
Customer Acquisition Cost (CAC). In real life, with average everyday results, this metric often roughly follows a normal distribution. Strictly speaking, though, the central limit theorem applies to averages (e.g., average CAC per campaign or per month) in larger datasets, not to individual CAC values.
Customer Lifetime Value (CLV). CLV often follows a right-skewed distribution similar to the "Power Law". This is also where you can apply the "Pareto Principle", a rule of thumb rooted in the power-law Pareto distribution: roughly 80% of your revenue often comes from only about 20% of your customers.
Website Traffic. The number of visitors to a website is often modeled with a Poisson distribution (if you are counting visitors arriving in fixed intervals of time).
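Two of these heuristics are easy to check on your own data. A minimal sketch, with hypothetical function names and toy numbers: top_share measures how much of the total comes from the top 20% of customers (the "Pareto Principle" check for CLV), and poisson_pmf gives the Poisson probability of exactly k visitors in an interval when lam are expected on average.

```python
import math
import numpy as np

def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the top `top_fraction` of items."""
    x = np.sort(np.asarray(values, dtype=float))[::-1]  # largest first
    k = max(1, round(top_fraction * x.size))  # number of "top" items
    return float(x[:k].sum() / x.sum())

def poisson_pmf(k, lam):
    """P(exactly k arrivals in an interval) for a Poisson rate `lam`."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical CLV values: 8 small customers and 2 large ones.
clv = [1] * 8 + [16, 16]
print(top_share(clv))       # 0.8: the top 20% of customers bring 80% of CLV
print(poisson_pmf(0, 2.0))  # chance of zero visitors when 2 are expected per interval
```

If top_share on your real CLV data lands far from 0.8, the 80/20 rule is a poor heuristic for your customer base.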
What's in it for you?
I hope you now grasp the importance of recognizing distributions in real life...
...and how a) spotting, b) remembering, and c) recalling them as your heuristics can lead to faster decision-making, savings, and attracting more luck on your side when you make the next call.
And whatever your next call is, I hope the luck will be on your side :)