Tracking Asynchronous Lambda Function Invocation Lifecycle
Before we begin, if you enjoy staying updated on how different design challenges are being addressed based on specific use cases, consider following me for more such stories.
You might wonder why not simply use a synchronous invocation. With synchronous invocation, you can immediately determine if the function has completed or failed. You’re probably right, but solutions are defined based on specific use cases. Without further delay, let's first understand the use case.
Use Case
For asynchronous invocation, Lambda places the event in a queue and returns a success response without additional information. A separate process reads events from the queue and sends them to your function.
The following diagram shows clients invoking a Lambda function asynchronously. Lambda queues the events before sending them to the function.
So far, we have understood how asynchronous Lambda function invocation works. Now, let's get into the details of the application setup.
A task is created by some other process to process the data. We will not go into details of how the task is created, as that is out of scope for this article. The task has its lifecycle defined with four statuses: created, leased, done, and failed. Status created indicates the task is created, status leased indicates the task has been picked up for processing, status done indicates the task has completed processing, and status failed indicates the task has failed. These tasks are stored in a PostgreSQL task table.
The Lambda function processes the tasks. Tasks in the created state are waiting to be processed. One Lambda function can process up to three tasks in some special scenarios. While existing tasks are being processed, new tasks can be created.
The average task runtime is approximately 10 minutes, which eliminates the possibility of invoking the Lambda function synchronously and waiting for it to complete.
To determine the number of asynchronous Lambda function invocations required, three important variables come into play:
Information on the number of tasks in the created state is available in the task table, but the queued Lambda function and ready state Lambda counts are not readily available. We began brainstorming ways to get the Lambda queue size and concluded with a new design to solve the problem. In the next section, let's delve deeper to learn more.
Recommended by LinkedIn
Approach
At first, instead of reinventing the wheel, we explored whether there was a well-known proven way to get the queue size.
AWS doesn't provide a first-class way to get the queue size; there are a few other methods that provide approximate queue sizes, such as using SQS and CloudWatch. If you use Amazon SQS to trigger Lambda functions, you can monitor the SQS queue length directly using CloudWatch metrics for SQS. The metric ApproximateNumberOfMessagesVisible gives you the number of messages available for retrieval from the queue. Knowing the approximate queue size would not be very efficient and can cause extra Lambda invocations, which has direct cost implications.
We decided to replicate the queue in a persistent store. We created a new table that can define the Lambda invocation uniquely and track the lifecycle of the Lambda function. The lifecycle is defined with statuses such as created, pending, ready, running, succeeded, failed, orphaned, and terminated.
Created Status: Insert the record in the table and get the autogenerated unique ID by insertion. Pass that unique ID to the Lambda function request payload to tag the Lambda function with the record inserted in the table. Using the unique ID, the Lambda function can update its status in the table.
Ready Status: The Lambda function will update the status to the ready state as the first step after it starts running and before processing the tasks.
Not covering other statuses as those are out of the scope of this discussion and very specific to use-case.
We are interested in the queued lambda function count and ready state functions count to determine the number of Lambda function invocations. We have necessary information to precisely determine the number of lambda invocations.
Conclusion
Tracking the lifecycle of asynchronous Lambda function invocations is essential for effectively managing and scaling serverless applications. While synchronous invocations provide immediate feedback on function completion, they are not always practical for long-running tasks. Designing a system to replicate the Lambda queue in a persistent store, we can accurately track the status of each invocation.
By leveraging a custom table to track the lifecycle of Lambda functions, we gain better control and visibility.
In scenarios where AWS does not provide a direct solution, such as obtaining precise queue sizes, innovative design and problem-solving become crucial. By implementing a robust tracking system, we can minimize unnecessary Lambda invocations, reducing costs and optimizing performance.
Thank you for reading. If you found this blog helpful, follow me for more such stories.
Accounts manager
2moAmit, 👍