Run your application in a robust and safe environment using AWS Step Functions
by Jan Borowski
Reliability, complexity and security are key requirements in modern IT projects. Across all industries, today's users need to access online services to perform increasingly complex tasks without encountering technical issues or fearing for the safety of their data.
Cloud technology addresses most of these concerns at once.
However, cloud technologies are only used to their full potential when the architecture is well designed. In this article, we explain how to use AWS Step Functions to ensure a safe and reliable environment in which to run your application.
According to the AWS documentation:
“AWS Step Functions makes it easy to coordinate the components of distributed applications as a series of steps in a visual workflow. You can quickly build and run state machines to execute the steps of your application in a reliable and scalable fashion.”
Step Functions, whose workflows are called state machines, serve as an orchestrator provided by AWS. They are akin to tools such as Apache Airflow, with the extra benefit of being integrated with nearly all AWS services, which makes connecting those services seamless. In addition, AWS Step Functions offers a clear and user-friendly graphical interface, with editing options that are simpler than those of Apache Airflow. AWS Step Functions supports a wide range of tasks, from running Lambda functions in parallel to deploying highly intricate workflows (execution and termination).
In this article, we focus on a simpler case study that isolates key features of AWS Step Functions, so that we can present them separately and explain their full benefit. This use case illustrates how AWS Step Functions gives your application a high degree of control, high availability and scalability.
Case Study: AWS Step Functions as a place to run code
We consider a scenario in which a central service, which we call SHOP, needs to be integrated with other devices or services. The typical setup is illustrated in Figure 1.
Even without specifying the nature of System A and System B, this setup already illustrates a key issue, which arises from the synchronization of System A and System B with the SHOP. What would happen if a user initiated a time-consuming action in System A? In that case, the SHOP session would stay open but inactive until the action completes. As no other SHOP session is opened in this scenario, the user cannot interact with the SHOP in the meantime.
2. Using AWS Step Functions to enable asynchronous operations
How can we keep the SHOP accessible during time-consuming actions? The solution lies in asynchronous operations. In this scenario, AWS Step Functions and AWS Lambda could both answer our need. They both offer the capability to run asynchronously and can be configured to send a callback upon completion or failure of a process. However, AWS Lambda functions are restricted in terms of duration: a Lambda invocation cannot exceed 15 minutes, which strongly restricts the complexity it can handle. For this reason, AWS Step Functions is the only suitable option of the two in this context.
Now that an asynchronous solution is in place, does it guarantee a fully robust system? Not entirely. It would suffice if we had full control over System A and B, but this cannot always be the case. For example, if we connect our SHOP to Snowflake or SAP, we need to rely on these systems to relay information back. If, for instance, the process completes successfully in these systems but the callback fails, our SHOP would remain unaware of an action taken in System A, which is unacceptable.
To address this challenge, we can once more rely on AWS Step Functions.
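The asynchronous hand-off can be sketched as follows. This is a minimal sketch, assuming the SHOP backend is written in Python and talks to AWS through boto3. The helper names `start_async` and `is_finished` and the `shop-` naming scheme are hypothetical. `start_execution` and `describe_execution` are the boto3 Step Functions client calls. The client is passed in as an argument so that the sketch does not need AWS credentials to be read or tried.

```python
import json
import uuid


def start_async(sfn_client, state_machine_arn, payload):
    """Start an execution and return immediately with its ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        # Hypothetical naming scheme; a random suffix keeps names unique.
        name=f"shop-{uuid.uuid4()}",
        input=json.dumps(payload),
    )
    return response["executionArn"]


def is_finished(sfn_client, execution_arn):
    """Return True once the execution has left the RUNNING state."""
    status = sfn_client.describe_execution(executionArn=execution_arn)["status"]
    return status != "RUNNING"
```

With boto3 installed, the client would be created with `boto3.client("stepfunctions")`. The SHOP session is free again as soon as `start_async` returns, because the call does not wait for the workflow to finish.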
3. Using an extra AWS Step Function as Wrapper to guarantee consistent feedback
By introducing an extra AWS Step Function, we enable users to trigger an action in SYSTEM A while ensuring that they consistently receive information back, regardless of the outcome. This achieves an asynchronous process while guaranteeing that we maintain full control of our system over the entire operation.
This extra step function is called a Wrapper function, which we now describe in detail.
The Wrapper function itself is a straightforward step function, primarily used to encapsulate the 'Run Integration' step, which is responsible for calling System A. Once the execution of System A concludes, the subsequent step in this function catches the response: SUCCESS or ERROR. It transforms this response into a format compatible with our system and consistently returns it to the SHOP. This setup ensures that, regardless of what occurs in System A, our SHOP always receives consistent feedback.
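The "catch the response and transform it" step could look like the sketch below, for instance as the body of a Lambda function called by the Wrapper. It assumes that a successful result is stored under `input.result` and a caught error under `input.error`, matching the `ResultPath` values used in the Wrapper definition in this article. The function name `to_shop_response` and the output field names are hypothetical. The `Error` and `Cause` keys are the fields Step Functions provides for a caught error.

```python
def to_shop_response(state):
    """Map the Wrapper's state to a single response shape for the SHOP."""
    data = state.get("input", {})
    if "error" in data:
        error = data["error"]
        return {
            "status": "ERROR",
            "error": error.get("Error", "Unknown"),
            "detail": error.get("Cause", ""),
        }
    return {"status": "SUCCESS", "result": data.get("result")}
```

Whatever happens in System A, the SHOP only ever has to read the `status` field to decide how to proceed.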
In the following, we detail the code utilized for interfacing with other systems, specifically within the 'Run Integration' Step.
{
  "Comment": "An example of combining workflows using a Step Functions StartExecution task state with various integration patterns.",
  "StartAt": "Run integration",
  "States": {
    "Run integration": {
      "Comment": "Start an execution of the same 'NestingPatternAnotherStateMachine' and wait for its completion",
      "Type": "Task",
      "Resource": "arn:aws:states:::states:startExecution.sync",
      "Parameters": {
        "StateMachineArn.$": "$.execution_arn",
        "Input.$": "$.payload"
      },
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "Generate Error description",
          "Comment": "In case of error prep payload",
          "ResultPath": "$.input.error"
        }
      ],
      "Next": "Final Transformation",
      "ResultPath": "$.input.result"
    }
Two points on this code are of special interest:
"Parameters": {
"StateMachineArn.$": "$.execution_arn",
"Input.$": "$.payload"
}
Values whose key ends in “.$” and whose path starts with “$.” are read from the input passed by the SHOP and can differ in each execution. It follows that one wrapper function can handle many outside systems and scenarios, with no need for more specialized functions or changes in the code.
This solution is both robust and reliable over time. It does not have the time restriction of, for example, AWS Lambda: a Standard workflow execution can run for up to a year if needed. The SYSTEM A step function can also integrate further AWS services.
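To make the reuse concrete, here is a small sketch of how the SHOP might build the input for the same wrapper when targeting two different systems. The keys `execution_arn` and `payload` are the ones read by the wrapper definition above. The function name, the ARNs, the account number and the payload contents are hypothetical placeholders.

```python
import json


def build_wrapper_input(target_state_machine_arn, payload):
    """Serialize the input the wrapper reads via $.execution_arn and $.payload."""
    return json.dumps({
        "execution_arn": target_state_machine_arn,
        "payload": payload,
    })


# Hypothetical ARNs and payloads for two different target systems.
system_a_input = build_wrapper_input(
    "arn:aws:states:eu-central-1:123456789012:stateMachine:system-a",
    {"order_id": 42},
)
system_b_input = build_wrapper_input(
    "arn:aws:states:eu-central-1:123456789012:stateMachine:system-b",
    {"report": "daily"},
)
```

Only the input changes between the two calls. The wrapper definition itself stays the same.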
AWS Step Function as a SAFE place to run code
In the previous section, we illustrated how AWS Step Functions enable a robust solution for executing processes asynchronously. In the following section, we expand the approach to enable an additional connection between System A and System B, as illustrated below:
This additional step involves an extra challenge: security. In our system, authentication is managed by SHOP. Therefore, we lack a mechanism that prevents SYSTEM A from performing unauthorized actions within SYSTEM B. With this new connection, we need an authentication mechanism that would not rely on SHOP. How can we address this issue?
A first step in the right direction consists of adding an AWS Step Functions state machine between System A and System B and preventing a direct connection between both systems. This additional step function is called Handover, and its role is similar to that of the Wrapper function. However, instead of handling calls from SHOP to SYSTEM A, it handles calls from SYSTEM A to SYSTEM B. The workflow of an example request in which SYSTEM A needs to invoke SYSTEM B looks like this.
This solution looks very good, but it is still missing one crucial component. As we mentioned at the beginning, we are interested in safely running our code, and potentially the code of others, in our AWS infrastructure. As it is, our solution is not safe yet.
To illustrate the problem, consider one more system, SYSTEM C. We now need to ensure that SYSTEM A can call SYSTEM B while SYSTEM C cannot. Such an access restriction is not possible in this solution, because the Handover function cannot identify which system is calling it. This is a fundamental shortcoming of the step function, and solving it is the final step towards a safe solution. To solve it, we introduce tokens.
A token is the encrypted name of a system, and only the Wrapper and Handover functions can decrypt it, because the key is known only to them. Tokens basically serve as a signature. SYSTEM A must pass on the token it received from the Wrapper, since it cannot create one itself. In this way, the functions always know which system is calling them.
With this alteration, the flow now looks as follows:
This workflow provides a standard solution for safely running processes from multiple systems in one AWS account.
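The token check can be sketched with Python's standard library. Note the swap: the article describes the token as an encrypted system name, whereas this sketch signs a plaintext name with an HMAC, because the standard library has no symmetric cipher. The property that matters here is the same: only holders of the key can produce a valid token. All names (`issue_token`, `verify_token`, `may_call`, `ALLOWED_CALLS`) are hypothetical, and how the key is stored and shared between Wrapper and Handover is out of scope.

```python
import base64
import hashlib
import hmac

# Hypothetical allow-list: (caller, target) pairs the Handover accepts.
ALLOWED_CALLS = {("SYSTEM_A", "SYSTEM_B")}


def issue_token(system_name, key):
    """Wrapper side: sign the system name with the shared key."""
    signature = hmac.new(key, system_name.encode(), hashlib.sha256).digest()
    encoded = base64.urlsafe_b64encode(signature).decode()
    return f"{system_name}.{encoded}"


def verify_token(token, key):
    """Handover side: return the system name, or None if the token is invalid."""
    system_name, _, encoded = token.rpartition(".")
    expected = issue_token(system_name, key).rpartition(".")[2]
    # Compare as bytes in constant time to avoid leaking timing information.
    if system_name and hmac.compare_digest(encoded.encode(), expected.encode()):
        return system_name
    return None


def may_call(token, target, key):
    """Handover side: check the verified caller against the allow-list."""
    caller = verify_token(token, key)
    return caller is not None and (caller, target) in ALLOWED_CALLS
```

In this sketch, a token issued for SYSTEM C passes verification but fails the allow-list check for SYSTEM B, while a token forged without the key fails verification altogether.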
Conclusion and extra security concerns
With this, we have achieved a generic solution that can be used in many contexts, as we purposely did not specify System A and System B. We have introduced systematic feedback in our architecture using Wrapper functions, without relying on full control over System A and B. We also discussed how to enable safe connections between System A and B using Handover functions and tokens.
However, this rather standard solution has shortcomings. While this setup can operate within a single AWS account, managing multitenancy becomes our responsibility: we must ensure that developers from System A cannot interfere with those from System B, and so on.
A straightforward solution to this problem consists of employing multiple accounts, which incurs additional costs. As a more cost-effective alternative, the current setup suffices.
In addition, the security standards provided by our solution do not cover all security issues. This solution is designed to provide a secure environment for running code and its safety primarily guards against accidental issues and poorly written code. Using tokens is a standard practice enhancing safety, yet tokens alone do not shield against malicious actions. For instance, if SYSTEM A shares its token with SYSTEM C, the latter can illicitly call SYSTEM B, which is not permitted. Consequently, solutions like this are better suited for deployment within a Virtual Private Cloud (VPC) where there are existing safeguards against malicious actions.