Pacing and Think Time in Load Test Design
Continuing my mission to clarify performance testing and engineering concepts that are commonly misunderstood, today I'm writing about think time and pacing, two concepts that are central to the definition of a load test scenario.
What is Pacing?
Pacing refers to the cadence, or frequency of occurrence, of one iteration of a test execution thread: for example, "1 bill payment every 10 seconds".
A test execution thread is an individual unit that will execute one or multiple operations in sequence to emulate a user or consumer transactional flow.
Pacing is what helps you achieve your target input rate during test execution. It is important to understand that pacing covers a flow, not a single transaction or request, since a user or consumer flow can include multiple requests in a sequence.
For example, a "Bill Payment" flow can include logging in, opening the bill section, opening the pay section, submitting payment information, displaying the payment confirmation, and logging out as transactions within the flow.
If you prefer to model individual operations or requests instead, that is your choice; just be mindful of what pacing really is so you can calculate it properly.
Within the workload model of queue-based service systems (most web applications and web services), you can identify pacing as "W" in Little's Law:
L=λW
where L = average customers in the system, λ = average arrival (input) rate, W = average time in system.
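For illustration, here is a minimal Python sketch of that relationship, using illustrative numbers that also anticipate the example further below:

```python
# Little's Law: L = lambda * W
arrival_rate = 2.0      # lambda: average arrival (input) rate, requests per second
time_in_system = 2.0    # W: average time in the system per user, seconds (service + wait)

avg_in_system = arrival_rate * time_in_system  # L: average customers in the system
print(avg_in_system)                           # 4.0
```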
Be mindful that you will need to establish your λ for each user flow, module, or transaction you set in each execution thread. Also remember that you could have multiple different execution thread groups running in parallel during your test execution (e.g., multiple Thread Groups in JMeter).
What is Think Time?
Think time refers to the waits or pauses within a test execution thread iteration.
If you look at W in Little's Law formula above, you'll realize that the average time in the system for a user is the sum of the time it takes for the user to be served and the time the user spends waiting in the system.
For example, if my system handles a single transaction with an average response time of 500ms, my observed input rate is 2 requests per second, and there are on average 4 users in the system (L = 4), then my W, or pacing, is L/λ = 2 seconds: the sum of 500ms of response time and 1.5 seconds of waiting. This waiting is my think time.
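Here is a minimal Python sketch of that arithmetic, under the same assumption of 4 concurrent users in the system:

```python
# Think time derived from Little's Law for the example above
arrival_rate = 2.0           # lambda: 2 requests per second
avg_users_in_system = 4.0    # L: assumed average concurrent users (see lead-in)
response_time = 0.5          # average response (service) time, seconds

pacing = avg_users_in_system / arrival_rate   # W = L / lambda = 2.0 seconds per iteration
think_time = pacing - response_time           # 1.5 seconds of waiting per iteration
print(pacing, think_time)                     # 2.0 1.5
```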
Think time serves two important purposes: it helps us achieve the desired pacing (and with it a uniform input rate), and it makes the execution thread emulate real user behavior more closely.
We should strive to achieve the most uniform input rate possible during our test executions, with consistent intervals between one transaction and another in order to obtain the best statistical data about system behavior. Introducing think time between transactions in a test execution thread helps us achieve this uniformity.
What I mean is that performing all transactions back to back and then waiting before the next iteration is not the same as waiting between transactions: the load on the system will be different, since in the first case we saturate it for a moment and then relieve it.
Although mathematically speaking this shouldn't affect the resulting metrics, it does affect the system's reaction to the load, and could raise false red flags due to artificial peaks that don't occur in real life.
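Here is a small Python sketch, with illustrative and simplified timings (four transactions of 500ms each), of how the same total think time and the same pacing produce very different request spacing depending on where the pauses are placed:

```python
# Two ways of spending the same 2 s of think time in a 4-transaction iteration
# (illustrative timings: each transaction takes 0.5 s)
service_times = [0.5, 0.5, 0.5, 0.5]

def request_start_times(service_times, think_between, think_at_end):
    """Return the offset at which each transaction fires within one iteration, plus the iteration length."""
    starts, t = [], 0.0
    for s in service_times:
        starts.append(t)
        t += s + think_between
    t += think_at_end          # total iteration length == pacing
    return starts, t

# All think time at the end: requests fire back to back (bursty load)
print(request_start_times(service_times, think_between=0.0, think_at_end=2.0))
# ([0.0, 0.5, 1.0, 1.5], 4.0)

# Think time spread between transactions: requests are evenly spaced (uniform load)
print(request_start_times(service_times, think_between=0.5, think_at_end=0.0))
# ([0.0, 1.0, 2.0, 3.0], 4.0)
```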
We don't have to spend too much time trying to perfectly emulate variations in load behavior (unless there is a noticeable spike), since from a probabilistic perspective the return on this effort is negligible; but we should be mindful of staying uniform and consistent to achieve predictable results.
Putting it all together.
Exercise: A bill payment system is designed as a SaaS pop-up window that receives information about a customer and a bill, processes the payment, and returns the result to the parent application.
The flow and average response times observed are as follows: presenting bill information (300ms), capturing payment method details (300ms), submitting the payment for processing (1100ms), and communicating the result to the user and parent system (300ms).
The system is observed to process 7,200 payments in an hour. We need a load test to validate it in a 100% scaled test environment: what pacing and uniform think time would you define for this flow?
The general recommendation is to leave at minimum a 50% buffer between iterations, to account for variances in response time (degradations) and keep our throughput (output rate) consistent. In this case, let's target a 100% buffer.
Total average response time of the payment flow: 2 seconds (300ms + 300ms + 1100ms + 300ms).
Total think time: 2 seconds (100% buffer).
Answer: following the recommendation, we'll target a pacing of 4 seconds (2 seconds of response time + 2 seconds of think time), with 500ms of think time after each transaction (2 seconds / 4 transactions).
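The same arithmetic as a short Python sketch:

```python
# Worked exercise: deriving pacing and per-transaction think time
response_times_ms = [300, 300, 1100, 300]   # bill info, capture details, submit, confirm
buffer_ratio = 1.0                          # 100% buffer (the minimum recommendation is 0.5)

flow_response_ms = sum(response_times_ms)                            # 2000 ms
total_think_ms = flow_response_ms * buffer_ratio                     # 2000 ms
pacing_ms = flow_response_ms + total_think_ms                        # 4000 ms per iteration
think_per_transaction_ms = total_think_ms / len(response_times_ms)   # 500 ms after each step

print(pacing_ms, think_per_transaction_ms)  # 4000.0 500.0
```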
Effect on Concurrency.
Now that we have defined the pacing for our execution thread, we can derive the number of concurrent threads we will need to achieve the desired input rate of 7,200 payments per hour, or 2 payments per second.
With a pacing of 1 payment every 4 seconds, we will need at least 8 concurrent threads to achieve our input rate:
1 payment / 4 seconds = 0.25 payments per second per thread.
2 payments per second / 0.25 payments per second per thread = 8 threads.
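And the corresponding concurrency calculation as a Python sketch:

```python
import math

# Minimum concurrent threads needed to sustain the target input rate
target_rate = 7200 / 3600        # 2 payments per second
pacing_seconds = 4.0             # 1 payment every 4 seconds per thread

rate_per_thread = 1 / pacing_seconds                     # 0.25 payments per second per thread
min_threads = math.ceil(target_rate / rate_per_thread)   # 8 threads
print(min_threads)
```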
Final Thoughts.
As you can see, understanding and defining pacing is critical to defining the minimum concurrency in our workload model.
We can then adjust our workload model to a higher concurrency and increase the pacing interval as needed, to have a load model that is more representative of actual user behavior in the system.
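For example, keeping the target of 2 payments per second, doubling the concurrency to 16 threads would mean doubling the pacing to 8 seconds per iteration (16 threads × 0.125 payments per second per thread = 2 payments per second), leaving more room for think time within each flow.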
Remember a thread is not the same as a user: you can emulate multiple distinct users in a single execution thread.
Think time is used within an iteration both to help achieve the desired pacing and to emulate real user behavior more realistically. It also helps us maintain a uniform interval between requests to the system, which yields more statistically significant results.
I hope this article has been helpful to clarify these concepts and how they are applied in performance testing, particularly in load modeling.