Halodoc Technology's Post

At Halodoc we use Amazon Web Services (AWS) Managed Workflows for Apache Airflow (MWAA) to efficiently orchestrate and monitor complex workflows. It offers scalability, availability, and security for reliable data pipeline execution. This blog outlines best practices for optimizing an Airflow environment to reduce CPU usage and costs. Key strategies include minimizing top-level code in DAGs, decreasing DAG parsing time, and reducing the number of DAG Python files. Read on as Jitendra Bhat shows how these optimizations led to lower CPU usage and improved worker node efficiency, resulting in significant MWAA cost savings.

Read the full blog here: https://lnkd.in/gs2dP7SU

#HalodocTechnology #SimplifyingHealthcare #dataengineering #DAG #Airflow #DynamicDAG
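As a concrete illustration of the dynamic-DAG pattern the post links to, here is a minimal sketch, assuming Airflow 2.4+; the source names, DAG ids, and schedule are illustrative assumptions, not Halodoc's actual pipeline:

```python
# Sketch only: dynamic DAG generation from a config list. The SOURCES
# list and task layout are hypothetical, not taken from the blog.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# Keeping top-level code cheap is the point: this list could come from a
# small local JSON file, never from a database call at parse time.
SOURCES = ["orders", "payments", "users"]

def build_dag(source: str) -> DAG:
    with DAG(
        dag_id=f"ingest_{source}",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        EmptyOperator(task_id="extract") >> EmptyOperator(task_id="load")
    return dag

for source in SOURCES:
    # Airflow's parser discovers DAGs by scanning module-level globals.
    globals()[f"ingest_{source}"] = build_dag(source)
```

The design point is that the scheduler only pays for a cheap loop over a local list at parse time, so parsing cost stays flat as the number of generated DAGs grows.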
More Relevant Posts
-
Check out how we at Halodoc Technology optimized Airflow DAG code here.
Dynamic DAG Generation in Airflow: Best Practices and Use Cases
blogs.halodoc.io
-
This is my portfolio project for Exploratory Data Analysis and Web Scraping using Python.
IBM Cloud Pak for Data
dataplatform.cloud.ibm.com
-
No wonder FastAPI makes things very simple. Now let's make things simple for FastAPI on Amazon Web Services (AWS). Here is my new blog, with full code, on how to easily create centralized logging for FastAPI and store the logs in AWS CloudWatch: https://lnkd.in/d_EtG7ty

#cloudwatch #aws #fastapi #logging #monitoring #python
How to Upload FastAPI Logs to AWS CloudWatch: A Beginner’s Guide
medium.com
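As a rough companion to the blog, here is a minimal sketch of the idea, not the article's actual code: a small logging handler that forwards each FastAPI log record to CloudWatch Logs via boto3's put_log_events. The log group, stream, and route names are illustrative assumptions, and production code would batch records (or use a library such as watchtower) rather than call the API once per line:

```python
import logging
import time

import boto3
from fastapi import FastAPI

LOG_GROUP = "/fastapi/demo-app"  # hypothetical names, not the article's
LOG_STREAM = "api"

logs = boto3.client("logs")

def ensure_stream() -> None:
    # Create the group and stream once; ignore "already exists" errors.
    try:
        logs.create_log_group(logGroupName=LOG_GROUP)
    except logs.exceptions.ResourceAlreadyExistsException:
        pass
    try:
        logs.create_log_stream(logGroupName=LOG_GROUP, logStreamName=LOG_STREAM)
    except logs.exceptions.ResourceAlreadyExistsException:
        pass

class CloudWatchHandler(logging.Handler):
    """Forwards each log record to CloudWatch Logs (unbatched, for brevity)."""
    def emit(self, record: logging.LogRecord) -> None:
        logs.put_log_events(
            logGroupName=LOG_GROUP,
            logStreamName=LOG_STREAM,
            logEvents=[{"timestamp": int(time.time() * 1000),
                        "message": self.format(record)}],
        )

ensure_stream()
logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(CloudWatchHandler())

app = FastAPI()

@app.get("/ping")
def ping():
    logger.info("ping received")  # this line lands in CloudWatch
    return {"status": "ok"}
```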
-
🚀 Project Update: Building a Serverless REST API with AWS Lambda, API Gateway, and DynamoDB Using Python 🚀

I'm thrilled to share my latest project, where I developed a serverless REST API using key AWS services:
- AWS Lambda for efficient, scalable backend processing
- API Gateway to securely expose the API
- DynamoDB for high-performance NoSQL data storage

Using Python, I built endpoints that perform seamless CRUD operations, making it a highly available and cost-effective solution. This project allowed me to deepen my skills in cloud-native application development and serverless architecture; a sketch of the handler pattern follows after the link below.

🔗 Check out the project here: https://lnkd.in/gzmxb-k6

Looking forward to applying these skills in future projects and connecting with others in the cloud community! 🌐📊

#AWS #Serverless #Python #CloudComputing #APIGateway #DynamoDB #AWSLambda #RESTAPI #Project
"Building a Serverless REST API with AWS Lambda, API Gateway, and DynamoDB Using Python"
dev.to
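Since the post describes the handler pattern rather than showing it, here is a minimal sketch under stated assumptions: an API Gateway REST proxy integration event, a DynamoDB table named "items" with a string key "id", and routing for the CRUD verbs inside one Lambda handler. None of these names come from the linked project:

```python
# Sketch: one Lambda handler routing API Gateway requests to DynamoDB.
import json

import boto3

table = boto3.resource("dynamodb").Table("items")  # hypothetical table

def lambda_handler(event, context):
    method = event.get("httpMethod")
    if method == "GET":
        item_id = event["queryStringParameters"]["id"]
        resp = table.get_item(Key={"id": item_id})
        return {"statusCode": 200, "body": json.dumps(resp.get("Item"))}
    if method == "POST":
        body = json.loads(event["body"])
        table.put_item(Item=body)  # body must include the "id" key
        return {"statusCode": 201, "body": json.dumps({"created": body["id"]})}
    if method == "DELETE":
        item_id = event["queryStringParameters"]["id"]
        table.delete_item(Key={"id": item_id})
        return {"statusCode": 204, "body": ""}
    return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}
```

With the Lambda proxy integration, API Gateway delivers httpMethod and queryStringParameters exactly as read here, which is why one handler can branch on them directly.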
-
Imagine you were building a distributed operating system. What would it look like?
- It would have distributed storage: GCS.
- It would have process isolation/namespaces: Kubernetes.
- It would be able to do data processing in a distributed fashion and read data from anywhere, batch or stream: Dataflow.
- It would have processes: Cloud Functions/Cloud Run.
- It would have interprocess communication: Pub/Sub.
- It would be able to cache long-running computations: TFX + Vertex.
- It would support storing data column-wise: BigQuery.
- It would support storing data row-wise: CloudSQL, many options.
- It would support storing NoSQL data: CloudSQL, many options.

What got me down this rabbit hole, you ask? I have hit the limits of scaling within an operating system in terms of "distributed" computation via Apache Beam and CUDA via TensorFlow. Now, to get rid of those quotes, I need to externalize object storage, and after that, do Python packaging/containerization so that the pipeline can be truly hybrid: some parts run on-prem, some parts in cloud A, some parts in cloud B. Lots to learn!

#operatingsystems #beam #io #distributedcomputation
-
The latest update for #Integrateio includes "Unleashing the Power of #AmazonRedshift #Analytics" and "#Python Code Transformations for Efficient #ETL Pipelines". #DataAnalytics #DataPipelines https://lnkd.in/ePvwyvTQ
Integrate
systemsdigest.com
-
Just announced at #MongoDBlocal NYC: Atlas Stream Processing is GA and ready to support your production workloads! With this, developers can:
- Effortlessly handle complex and rapidly changing data structures
- Use the familiar MongoDB Query API for processing streaming data
- Seamlessly integrate with MongoDB Atlas
- Benefit from a fully managed service that eliminates operational overhead

Dive deeper into the announcement today: https://lnkd.in/g79qXPHy

#artificialintelligence #ai #machinelearning #technology #datascience #python
Atlas Stream Processing is Now Generally Available! | MongoDB Blog
mongodb.com
-
🎉 Excited to introduce AtomicExecutionControl, now available on PyPI! 🎉

In our journey towards more efficient and reliable distributed applications, managing atomic operations and preventing race conditions are pivotal challenges, especially across AWS services like Lambda, Fargate, and EC2. That's where AtomicExecutionControl steps in. This Python library is crafted with the complexities of distributed systems in mind, offering a robust solution to ensure that each task in your application is executed exactly once, mitigating risks of duplicate processing and enhancing overall efficiency.

Key Features:
* Atomic Execution: Guarantee exclusive processing for each task.
* Status Management: Real-time tracking of task execution status.
* Timeout Handling: Automatic handling of execution stalls and failures.
* Easy Integration: Seamlessly integrate with your existing AWS infrastructure.

Getting started is as simple as:

pip install atomic_execution_control

Whether you're orchestrating microservices, ensuring data integrity, or managing event-driven workflows, AtomicExecutionControl is designed to make your applications more robust and 'atomic'. I'm looking forward to seeing how it can streamline your projects and solve the critical challenges of task coordination in distributed applications. Your feedback, questions, and contributions are what will help this project grow and improve. Let's make our distributed systems more reliable together!

👉 Check it out and let me know your thoughts! Also, feel free to reach out if you encounter any issues or have suggestions for improvement. Contributions are always welcome!

GitHub: https://lnkd.in/emsuHK_y
PyPI: https://lnkd.in/e9uxH_Jv

#OpenSource #Python #Serverless #AWS #DataEngineering #CloudComputing #DistributedSystems #AtomicExecution
atomic-execution-control
pypi.org
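For context on what "atomic execution" means mechanically, here is a sketch of the general pattern such a library can build on: a DynamoDB conditional write used as a run-once claim. This illustrates the underlying technique only, not AtomicExecutionControl's actual API; the table schema and task names are hypothetical:

```python
import boto3
from botocore.exceptions import ClientError

# Hypothetical lock table with partition key "task_id" (not the library's schema).
table = boto3.resource("dynamodb").Table("task-locks")

def try_claim(task_id: str) -> bool:
    """Return True only for the single worker that wins the claim."""
    try:
        table.put_item(
            Item={"task_id": task_id, "status": "RUNNING"},
            # The write succeeds only if no item with this key exists yet.
            ConditionExpression="attribute_not_exists(task_id)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another worker claimed this task first
        raise

if try_claim("daily-report-2024-06-01"):
    print("won the claim: running the task exactly once")
else:
    print("lost the claim: skipping duplicate execution")
```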
-
🎓 Proud to announce that I've completed the "Learning Apache Airflow" course led by Janani Ravi, a renowned Google Cloud Architect and Data Engineer. This advanced course has been a game-changer for me, providing deep insights into the world of workflow automation.

Through this training, I've mastered the art of designing and scheduling intricate workflows, managing task dependencies, and automating batch processes, all within the robust framework of Apache Airflow. I've learned to define these workflows programmatically in Python, which enhances my ability to create flexible and scalable automation solutions. The course also covered essential features like conditional branching and the mechanisms of catch-up and backfill, which are crucial for maintaining data integrity and consistency in automated tasks; a small sketch of these two features follows below. I'm now well-equipped to streamline IT operations and drive efficiency in any tech-driven environment.

🔗 View My Certificate: https://lnkd.in/eSEeesXi

#ApacheAirflow #ITAutomation #WorkflowManagement #ProfessionalGrowth
Certificate of Completion
linkedin.com
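For readers unfamiliar with the two features named above, here is a minimal sketch, assuming Airflow 2.4+; the DAG and task names are illustrative, not taken from the course:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator

def choose_path(**context):
    # Conditional branching: pick a downstream task id from the run's logical date.
    if context["logical_date"].weekday() >= 5:  # Saturday or Sunday
        return "skip_heavy_load"
    return "run_heavy_load"

with DAG(
    dag_id="branching_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,  # flip to True and Airflow backfills every run since start_date
) as dag:
    branch = BranchPythonOperator(task_id="branch", python_callable=choose_path)
    branch >> [EmptyOperator(task_id="run_heavy_load"),
               EmptyOperator(task_id="skip_heavy_load")]
```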
-
While working on an issue, I came across AWS Lambda in-memory caching, which can be done using global variables. Lambda's design emphasizes statelessness: each invocation of a function should be independent of any previous invocation. However, AWS Lambda's re-use of execution environments (containers) creates an opportunity for a form of in-memory caching, which can significantly improve the performance of your function.

When AWS Lambda reuses an execution environment, any variables or data initialized outside of the main function handler (e.g. lambda_handler) can persist between invocations. This persistence allows you to store data in these global variables, effectively using them as a cache.

```python
import time

# Global variable to act as a cache; lives for the execution environment's lifetime.
cache_variable = None

def lambda_handler(event, context):
    global cache_variable
    if cache_variable is None:
        # Simulate a slow fetch on the first invocation in this environment.
        cache_variable = "data fetched at " + time.strftime("%H:%M:%S")
    return cache_variable
```

When your Lambda function is invoked for the first time in a fresh execution environment, cache_variable is None. lambda_handler checks this, finds the cache empty, "fetches data" (simulated by setting cache_variable to a string with the current time), and stores it in the global. That fetched data is then returned as the response. If the Lambda container is reused for another invocation, cache_variable still holds the value from the previous invocation, so lambda_handler skips the slow operation and returns the cached data directly.

I hope this helps you at some point in the future; a complete article is coming in my next post. Thanks!

#aws #cache #python #issue #lambda