Thoughts on Apache Airflow AWS Lambda Operator

Apache Airflow is a popular open-source workflow management platform. Typically, tasks are run remotely by Celery workers for scalability. On AWS, however, serverless computing services offer a simpler way to achieve scalability. For example, the ECS Operator runs dockerized tasks and, with the Fargate launch type, runs them in a serverless environment.

The ECS Operator alone is not sufficient, however, because it can take up to several minutes to pull a Docker image and set up a network interface (in the case of the Fargate launch type). Because of this latency, it is not suitable for frequently running tasks. The latency of a Lambda function, by comparison, is negligible, which makes it better suited to such tasks.

This post demonstrates how AWS Lambda can be integrated with Apache Airflow using a custom operator inspired by the ECS Operator.
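The operator itself is not reproduced in this excerpt. As a rough illustration only, the core step such an operator would likely wrap is a synchronous Lambda invocation. The sketch below assumes boto3's Lambda client and its `invoke` call; the helper name `invoke_lambda` and the optional `client` argument are hypothetical and not taken from the post.

```python
import json


def invoke_lambda(function_name, payload, client=None):
    """Invoke a Lambda function synchronously and return its decoded JSON result.

    Hypothetical helper for illustration; not the post's actual operator code.
    """
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3

        client = boto3.client("lambda")

    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # block until the function returns
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()

    # Handler exceptions are reported through the FunctionError field,
    # so the response has to be checked explicitly.
    if response.get("FunctionError"):
        raise RuntimeError(f"{function_name} failed: {body.decode('utf-8')}")

    return json.loads(body) if body else None
```

In a custom operator, a call like this would presumably sit inside the `execute` method. Because `RequestResponse` blocks until the function returns, the Airflow worker slot stays occupied for the duration of the invocation.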

Thanks for sharing! I was wondering: it looks like the wait on the task is a busy-wait on the Airflow servers. Wouldn't it be preferable to use an Operator that starts the service and a Sensor (mode=reschedule) that checks whether the task has finished?
