Thoughts on Apache Airflow AWS Lambda Operator
Apache Airflow is a popular open-source workflow management platform. Typically, tasks are run remotely by Celery workers for scalability. In AWS, however, scalability can also be achieved more simply with serverless computing services. For example, the ECS Operator allows dockerized tasks to be run and, with the Fargate launch type, they can run in a serverless environment.
The ECS Operator alone is not sufficient, because it can take up to several minutes to pull a Docker image and set up a network interface (in the case of the Fargate launch type). Because of this latency, it is not suitable for frequently running tasks. The latency of a Lambda function, on the other hand, is negligible, which makes it more suitable for such tasks.
This post demonstrates how AWS Lambda can be integrated with Apache Airflow using a custom operator inspired by the ECS Operator.
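As a rough illustration of the core step such an operator would perform, the sketch below invokes a Lambda function synchronously and raises if the function reports an error. It is not the operator described in this post: the names `invoke_lambda`, `LambdaInvocationError` and `FakeLambdaClient` are hypothetical, and the fake client only stands in for `boto3.client("lambda")` so that the example runs without AWS access. In a custom operator, a call like this would presumably sit inside the `execute()` method.

```python
import io
import json


class LambdaInvocationError(RuntimeError):
    """Raised when the invoked function reports an error (hypothetical name)."""


def invoke_lambda(client, function_name, payload):
    """Invoke a Lambda function synchronously and return its decoded result.

    `client` is assumed to behave like boto3.client("lambda").
    """
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # block until the function returns
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()
    # Lambda sets "FunctionError" in the response when the function itself failed.
    if "FunctionError" in response:
        raise LambdaInvocationError(body.decode("utf-8"))
    return json.loads(body)


class FakeLambdaClient:
    """Stand-in for the boto3 client, used only to make the sketch runnable."""

    def invoke(self, FunctionName, InvocationType, Payload):
        event = json.loads(Payload)
        result = {"function": FunctionName, "doubled": event["value"] * 2}
        return {
            "StatusCode": 200,
            "Payload": io.BytesIO(json.dumps(result).encode("utf-8")),
        }


if __name__ == "__main__":
    print(invoke_lambda(FakeLambdaClient(), "example-function", {"value": 21}))
```

Passing the client in as an argument keeps the invocation logic testable without credentials; with real AWS access, `boto3.client("lambda")` would take the place of the fake client.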
Thanks for sharing! I was wondering: it looks like the wait on the task is a busy-wait on the Airflow servers. Wouldn't it be preferable to use an Operator that starts the service and a Sensor (mode=reschedule) that checks whether the task has finished?