Halodoc Technology's Post

At Halodoc we use Amazon Web Services (AWS) Managed Workflows for Apache Airflow (MWAA) to efficiently orchestrate and monitor complex workflows. It offers scalability, availability, and security for reliable data pipeline execution. This blog outlines best practices for optimizing an Airflow environment to reduce CPU usage and costs. Key strategies include minimizing top-level code in DAGs, decreasing DAG parsing time, and reducing the number of DAG Python files. Read on as Jitendra Bhat shows how these optimizations led to lower CPU usage and improved worker node efficiency, resulting in significant MWAA cost savings.

Read the full blog here: https://lnkd.in/gs2dP7SU

#HalodocTechnology #SimplifyingHealthcare #dataengineering #DAG #Airflow #DynamicDAG
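As a concrete illustration of the dynamic-DAG pattern the post links to, here is a minimal sketch, assuming Airflow 2.4+; the source names, DAG ids, and schedule are illustrative assumptions, not Halodoc's actual pipeline:

```python
# Sketch only: dynamic DAG generation from a config list. The SOURCES
# list and task layout are hypothetical, not taken from the blog.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# Keeping top-level code cheap is the point: this list could come from a
# small local JSON file, never from a database call at parse time.
SOURCES = ["orders", "payments", "users"]

def build_dag(source: str) -> DAG:
    with DAG(
        dag_id=f"ingest_{source}",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        EmptyOperator(task_id="extract") >> EmptyOperator(task_id="load")
    return dag

for source in SOURCES:
    # Airflow's parser discovers DAGs by scanning module-level globals.
    globals()[f"ingest_{source}"] = build_dag(source)
```

The design point is that the scheduler only pays for a cheap loop over a local list at parse time, so parsing cost stays flat as the number of generated DAGs grows.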
More Relevant Posts
-
Check out how we at Halodoc Technology optimized Airflow DAG code here.
Dynamic DAG Generation in Airflow: Best Practices and Use Cases
blogs.halodoc.io
-
This is my portfolio project for Exploratory Data Analysis and Web Scraping using Python.
IBM Cloud Pak for Data
dataplatform.cloud.ibm.com
-
No wonder FastAPI makes things very simple. Now let's make things simple for FastAPI on Amazon Web Services (AWS). Here is my new blog, with full code, on how to easily create centralized logging for FastAPI and store the logs in AWS CloudWatch: https://lnkd.in/d_EtG7ty

#cloudwatch #aws #fastapi #logging #monitoring #python
How to Upload FastAPI Logs to AWS CloudWatch: A Beginner’s Guide
medium.com
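As a rough companion to the blog, here is a minimal sketch of the idea, not the article's actual code: a small logging handler that forwards each FastAPI log record to CloudWatch Logs via boto3's put_log_events. The log group, stream, and route names are illustrative assumptions, and production code would batch records (or use a library such as watchtower) rather than call the API once per line:

```python
import logging
import time

import boto3
from fastapi import FastAPI

LOG_GROUP = "/fastapi/demo-app"  # hypothetical names, not the article's
LOG_STREAM = "api"

logs = boto3.client("logs")

def ensure_stream() -> None:
    # Create the group and stream once; ignore "already exists" errors.
    try:
        logs.create_log_group(logGroupName=LOG_GROUP)
    except logs.exceptions.ResourceAlreadyExistsException:
        pass
    try:
        logs.create_log_stream(logGroupName=LOG_GROUP, logStreamName=LOG_STREAM)
    except logs.exceptions.ResourceAlreadyExistsException:
        pass

class CloudWatchHandler(logging.Handler):
    """Forwards each log record to CloudWatch Logs (unbatched, for brevity)."""
    def emit(self, record: logging.LogRecord) -> None:
        logs.put_log_events(
            logGroupName=LOG_GROUP,
            logStreamName=LOG_STREAM,
            logEvents=[{"timestamp": int(time.time() * 1000),
                        "message": self.format(record)}],
        )

ensure_stream()
logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(CloudWatchHandler())

app = FastAPI()

@app.get("/ping")
def ping():
    logger.info("ping received")  # this line lands in CloudWatch
    return {"status": "ok"}
```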
-
🚀 Project Update: Building a Serverless REST API with AWS Lambda, API Gateway, and DynamoDB Using Python 🚀

I'm thrilled to share my latest project, where I developed a serverless REST API using key AWS services:
- AWS Lambda for efficient, scalable backend processing
- API Gateway to securely expose the API
- DynamoDB for high-performance NoSQL data storage

Using Python, I built endpoints that perform seamless CRUD operations, making it a highly available and cost-effective solution. This project allowed me to deepen my skills in cloud-native application development and serverless architecture; a sketch of the handler pattern follows after the link below.

🔗 Check out the project here: https://lnkd.in/gzmxb-k6

Looking forward to applying these skills in future projects and connecting with others in the cloud community! 🌐📊

#AWS #Serverless #Python #CloudComputing #APIGateway #DynamoDB #AWSLambda #RESTAPI #Project
"Building a Serverless REST API with AWS Lambda, API Gateway, and DynamoDB Using Python"
dev.to
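Since the post describes the handler pattern rather than showing it, here is a minimal sketch under stated assumptions: an API Gateway REST proxy integration event, a DynamoDB table named "items" with a string key "id", and routing for the CRUD verbs inside one Lambda handler. None of these names come from the linked project:

```python
# Sketch: one Lambda handler routing API Gateway requests to DynamoDB.
import json

import boto3

table = boto3.resource("dynamodb").Table("items")  # hypothetical table

def lambda_handler(event, context):
    method = event.get("httpMethod")
    if method == "GET":
        item_id = event["queryStringParameters"]["id"]
        resp = table.get_item(Key={"id": item_id})
        return {"statusCode": 200, "body": json.dumps(resp.get("Item"))}
    if method == "POST":
        body = json.loads(event["body"])
        table.put_item(Item=body)  # body must include the "id" key
        return {"statusCode": 201, "body": json.dumps({"created": body["id"]})}
    if method == "DELETE":
        item_id = event["queryStringParameters"]["id"]
        table.delete_item(Key={"id": item_id})
        return {"statusCode": 204, "body": ""}
    return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}
```

With the Lambda proxy integration, API Gateway delivers httpMethod and queryStringParameters exactly as read here, which is why one handler can branch on them directly.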
-
Imagine you were building a distributed operating system. What would it look like?
- It would have distributed storage: GCS.
- It would have process isolation/namespaces: Kubernetes.
- It would be able to do data processing in a distributed fashion and read data from anywhere, batch or stream: Dataflow.
- It would have processes: Cloud Functions/Cloud Run.
- It would have interprocess communication: Pub/Sub.
- It would be able to cache long-running computations: TFX + Vertex.
- It would support storing data column-wise: BigQuery.
- It would support storing data row-wise: CloudSQL, many options.
- It would support storing NoSQL data: CloudSQL, many options.

What got me down this rabbit hole, you ask? I have hit the limits of scaling within an operating system in terms of "distributed" computation via Apache Beam and CUDA via TensorFlow. Now, to get rid of those quotes, I need to externalize object storage, and after that, do Python packaging/containerization so that the pipeline can be truly hybrid: some parts run on-prem, some parts in cloud A, some parts in cloud B. Lots to learn!

#operatingsystems #beam #io #distributedcomputation
-
The latest update for #Integrateio includes "Unleashing the Power of #AmazonRedshift #Analytics" and "#Python Code Transformations for Efficient #ETL Pipelines". #DataAnalytics #DataPipelines https://lnkd.in/ePvwyvTQ
Integrate
systemsdigest.com
-
Just announced at #MongoDBlocal NYC: Atlas Stream Processing is GA and ready to support your production workloads! With this, developers can:
- Effortlessly handle complex and rapidly changing data structures
- Use the familiar MongoDB Query API for processing streaming data
- Seamlessly integrate with MongoDB Atlas
- Benefit from a fully managed service that eliminates operational overhead

Dive deeper into the announcement today: https://lnkd.in/g79qXPHy

#artificialintelligence #ai #machinelearning #technology #datascience #python
Atlas Stream Processing is Now Generally Available! | MongoDB Blog
mongodb.com
-
🎉 Excited to introduce AtomicExecutionControl, now available on PyPI! 🎉

In our journey towards more efficient and reliable distributed applications, managing atomic operations and preventing race conditions are pivotal challenges, especially across AWS services like Lambda, Fargate, and EC2. That's where AtomicExecutionControl steps in. This Python library is crafted with the complexities of distributed systems in mind, offering a robust solution to ensure that each task in your application is executed exactly once, mitigating risks of duplicate processing and enhancing overall efficiency.

Key Features:
* Atomic Execution: Guarantee exclusive processing for each task.
* Status Management: Real-time tracking of task execution status.
* Timeout Handling: Automatic handling of execution stalls and failures.
* Easy Integration: Seamlessly integrate with your existing AWS infrastructure.

Getting started is as simple as:

pip install atomic_execution_control

Whether you're orchestrating microservices, ensuring data integrity, or managing event-driven workflows, AtomicExecutionControl is designed to make your applications more robust and 'atomic'. I'm looking forward to seeing how it can streamline your projects and solve the critical challenges of task coordination in distributed applications. Your feedback, questions, and contributions are what will help this project grow and improve. Let's make our distributed systems more reliable together!

👉 Check it out and let me know your thoughts! Also, feel free to reach out if you encounter any issues or have suggestions for improvement. Contributions are always welcome!

GitHub: https://lnkd.in/emsuHK_y
PyPI: https://lnkd.in/e9uxH_Jv

#OpenSource #Python #Serverless #AWS #DataEngineering #CloudComputing #DistributedSystems #AtomicExecution
atomic-execution-control
pypi.org
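For context on what "atomic execution" means mechanically, here is a sketch of the general pattern such a library can build on: a DynamoDB conditional write used as a run-once claim. This illustrates the underlying technique only, not AtomicExecutionControl's actual API; the table schema and task names are hypothetical:

```python
import boto3
from botocore.exceptions import ClientError

# Hypothetical lock table with partition key "task_id" (not the library's schema).
table = boto3.resource("dynamodb").Table("task-locks")

def try_claim(task_id: str) -> bool:
    """Return True only for the single worker that wins the claim."""
    try:
        table.put_item(
            Item={"task_id": task_id, "status": "RUNNING"},
            # The write succeeds only if no item with this key exists yet.
            ConditionExpression="attribute_not_exists(task_id)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another worker claimed this task first
        raise

if try_claim("daily-report-2024-06-01"):
    print("won the claim: running the task exactly once")
else:
    print("lost the claim: skipping duplicate execution")
```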
-
🎓 Proud to announce that I've completed the "Learning Apache Airflow" course led by Janani Ravi, a renowned Google Cloud Architect and Data Engineer. This advanced course has been a game-changer for me, providing deep insights into the world of workflow automation.

Through this training, I've mastered the art of designing and scheduling intricate workflows, managing task dependencies, and automating batch processes, all within the robust framework of Apache Airflow. I've learned to define these workflows programmatically in Python, which enhances my ability to create flexible and scalable automation solutions. The course also covered essential features like conditional branching and the mechanisms of catch-up and backfill, which are crucial for maintaining data integrity and consistency in automated tasks; a small sketch of these two features follows below. I'm now well-equipped to streamline IT operations and drive efficiency in any tech-driven environment.

🔗 View My Certificate: https://lnkd.in/eSEeesXi

#ApacheAirflow #ITAutomation #WorkflowManagement #ProfessionalGrowth
Certificate of Completion
linkedin.com
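For readers unfamiliar with the two features named above, here is a minimal sketch, assuming Airflow 2.4+; the DAG and task names are illustrative, not taken from the course:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator

def choose_path(**context):
    # Conditional branching: pick a downstream task id from the run's logical date.
    if context["logical_date"].weekday() >= 5:  # Saturday or Sunday
        return "skip_heavy_load"
    return "run_heavy_load"

with DAG(
    dag_id="branching_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,  # flip to True and Airflow backfills every run since start_date
) as dag:
    branch = BranchPythonOperator(task_id="branch", python_callable=choose_path)
    branch >> [EmptyOperator(task_id="run_heavy_load"),
               EmptyOperator(task_id="skip_heavy_load")]
```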
-
While working on an issue, I came across AWS Lambda in-memory caching, which can be done using global variables. Lambda's design emphasizes statelessness: each invocation of a function should be independent of any previous invocation. However, AWS Lambda's re-use of execution environments (containers) creates an opportunity for a form of in-memory caching, which can significantly improve the performance of your function.

When AWS Lambda reuses an execution environment, any variables or data initialized outside of the main function handler (e.g. lambda_handler) can persist between invocations. This persistence allows you to store data in these global variables, effectively using them as a cache.

```python
import time

# Global variable to act as a cache; lives for the execution environment's lifetime.
cache_variable = None

def lambda_handler(event, context):
    global cache_variable
    if cache_variable is None:
        # Simulate a slow fetch on the first invocation in this environment.
        cache_variable = "data fetched at " + time.strftime("%H:%M:%S")
    return cache_variable
```

When your Lambda function is invoked for the first time in a fresh execution environment, cache_variable is None. lambda_handler checks this, finds the cache empty, "fetches data" (simulated by setting cache_variable to a string with the current time), and stores it in the global. That fetched data is then returned as the response. If the Lambda container is reused for another invocation, cache_variable still holds the value from the previous invocation, so lambda_handler skips the slow operation and returns the cached data directly.

I hope this helps you at some point in the future; a complete article is coming in my next post. Thanks!

#aws #cache #python #issue #lambda