💨 The ultimate test of your Docker Image: Running in GitHub Actions
🖋️ Author: Peter Flook
🔗 Read the article here: https://lnkd.in/eDTaMPjm
-------------------------------------------
✅ Follow Data Engineer Things for more insights and updates.
💬 Hit the 'Like' button if you enjoyed the article.
-------------------------------------------
#dataengineering #docker #github #data
More Relevant Posts
-
I'm creating two repos of Kernel Memory extensions. The first one is Elasticsearch support: https://lnkd.in/dA2pj3vW. KM already has an Elasticsearch connector, but I use a different approach (dynamic template mapping) and also support keyword search. In the extensions package I'm starting to port some extensions I wrote for KM; the current version (https://lnkd.in/dyXDP9v5) includes a search pipeline that lets you chain multiple search results with re-ranking (Cohere support). I'm also porting the code to use a local Python FastAPI server for embedding and re-ranking, so you can run models directly from Hugging Face with very little effort. The Elasticsearch package is already on NuGet; the extensions package will be published once the code is ready for at least some minimal demos.
GitHub - alkampfergit/KernelMemory.Elasticsearch: Implementation of IMemoryDb for Microsoft Kernel Memory
github.com
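For readers who haven't seen this pattern before, here is a minimal sketch of what a local FastAPI embedding server of the kind described above could look like. It is not the author's implementation; the route name, request shape, and model are illustrative assumptions, and it presumes the sentence-transformers package for loading a Hugging Face model.

```python
# Hypothetical sketch of a local embedding server, not the author's actual code.
# Assumes: pip install fastapi uvicorn sentence-transformers
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
# Any Hugging Face sentence-embedding model could be plugged in here.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

class EmbedRequest(BaseModel):
    texts: list[str]

@app.post("/embed")
def embed(req: EmbedRequest):
    # Return one embedding vector per input text.
    vectors = model.encode(req.texts)
    return {"embeddings": [v.tolist() for v in vectors]}

# Run with: uvicorn server:app --port 8000
```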
-
I have been really curious about Go - with all the hype around it, I wanted to see how it could level up my #dataengineering stack. I decided to learn enough Go this weekend to write a useful CLI-based task tracker app that writes my daily standup updates for me. Sometimes I work on so many things in a day that I tend to forget! 📝
💡 A few highlights from my learning:
- Simplicity: Go has a straightforward syntax that made it easy to learn (enough) and implement
- Performance: Keep in mind it is a simple use case, but it compiles and runs extremely fast
- Testing: I really love the built-in testing suite Go provides. It makes it a breeze to apply TDD and test code quickly
Excited to implement a project more related to data engineering next. Thinking of a high-performance ETL or real-time processing pipeline - we'll see :) 🚀
Check out the source code if you are interested: https://lnkd.in/eH_7pYsy
#go #golang #cli #weekendproject
GitHub - JoseTorrado/todo-cli: CLI based To-Do application written in Go
github.com
-
As an open-source maintainer, I find managing GitHub projects can be overwhelming 🐱 I built an AI assistant with PromptQL to analyze GitHub repo data. I chose the popular Next.js repo from Vercel, which has roughly 3k open issues, and used PromptQL to help prioritize them. Here’s the magic: when I asked it to help prioritize the 10 most recent issues, PromptQL retrieved the data and created a query plan to:
1️⃣ Analyze GitHub issues and comments and get the title/body content.
2️⃣ Use a language model to classify each issue's priority as high, medium, or low.
3️⃣ Store the results in an artifact and summarize the findings.
The assistant gave me actionable insights faster than I ever could have on my own! If you’re an OSS maintainer like me, juggling hundreds of issues, this is definitely going to help with productivity. I work with multiple repositories across Hasura and would love to experiment with those as a next step. 💡
Check out the demo and let me know: what would YOU ask PromptQL to do for your GitHub projects? #PromptQL #AgenticRAG #AgenticAI #GitHub
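To make the three steps of that query plan concrete, here is a plain-Python illustration of the same flow. It is emphatically not PromptQL: it only assumes the public GitHub REST API, and the classify_priority helper is a hypothetical placeholder standing in for the language-model call.

```python
# Plain-Python illustration of the query plan described above -- not PromptQL itself.
# classify_priority is a hypothetical placeholder for an LLM classification step.
import requests

def fetch_recent_issues(repo: str, count: int = 10) -> list[dict]:
    # GitHub REST API: list open issues, most recently created first.
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/issues",
        params={"state": "open", "sort": "created", "direction": "desc", "per_page": count},
    )
    resp.raise_for_status()
    return [i for i in resp.json() if "pull_request" not in i]  # the endpoint also returns PRs

def classify_priority(title: str, body: str) -> str:
    # Placeholder: in the real workflow a language model labels this high / medium / low.
    return "high" if "crash" in (title + (body or "")).lower() else "medium"

if __name__ == "__main__":
    results = []
    for issue in fetch_recent_issues("vercel/next.js"):
        results.append({
            "number": issue["number"],
            "title": issue["title"],
            "priority": classify_priority(issue["title"], issue["body"]),
        })
    order = {"high": 0, "medium": 1, "low": 2}
    for r in sorted(results, key=lambda r: order[r["priority"]]):
        print(r["priority"], "-", f"#{r['number']}", r["title"])
```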
-
🚀💻🔍 Day 43: PySpark - Quickstart with DataFrame ⏰🔍 Today, I delved into the world of big data processing with PySpark by exploring the Quickstart guide for DataFrames. PySpark empowers us to tackle large-scale datasets efficiently using the Apache Spark framework. By following the Quickstart guide, I learned how to create, manipulate, and analyze DataFrames, which are distributed collections of data organized into named columns. Join me on this journey as I harness the power of PySpark to address big data challenges with ease and scalability. Stay tuned for more updates on my data exploration journey, and don't forget to check out my Daily Projects repository for further insights: https://lnkd.in/egFAbkK6 #PySpark #BigData #DataFrames #ApacheSpark #DataProcessing
GitHub - DavidCalebChaparroOrozco/Daily-Projects: This repository contains my daily projects, challenging me to create something new every day. From small utilities to full-fledged applications, they reflect my growth as a developer. Each project has a brief description of technologies, challenges and lessons. It is a record of my progress and a constantly evolving portfolio.
github.com
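For a flavor of what the quickstart covers, here is a small, self-contained example of the kind of DataFrame operations it walks through; it assumes a local PySpark installation and uses made-up data.

```python
# Minimal local example of the DataFrame operations covered in the quickstart.
# Assumes: pip install pyspark
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("quickstart").getOrCreate()

# Create a DataFrame from an in-memory list of rows with named columns.
df = spark.createDataFrame(
    [("alice", 34, "NL"), ("bob", 45, "US"), ("carol", 29, "US")],
    ["name", "age", "country"],
)

# Basic transformations: filter, add a derived column, aggregate.
adults = df.filter(F.col("age") >= 30).withColumn("age_next_year", F.col("age") + 1)
adults.groupBy("country").agg(F.avg("age").alias("avg_age")).show()

spark.stop()
```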
-
🚀 New Project Alert: Data Migration from PostgreSQL to Azure Data Lake using Python and Flask 🔥
I’ve just completed a repository showcasing a streamlined data migration from PostgreSQL to Azure Data Lake using Python and Flask! This project highlights how you can efficiently extract data from your relational database, transform it into a structured format, and then load it directly into Azure Data Lake for scalable and secure storage.
Here are some key benefits of integrating Azure Data Lake with your backend:
🔹 Unlimited Scalability: Azure Data Lake offers virtually unlimited storage and can handle both structured and unstructured data, ensuring your backend remains performant, no matter the data size.
🔹 Seamless Integration: With Python and Flask, it’s easy to integrate your backend with Azure’s powerful tools, allowing you to fetch, analyze, and process data on demand.
🔹 Cost Efficiency: Azure Data Lake offers tiered storage pricing, meaning you can optimize costs based on your access and performance needs.
🔹 Advanced Analytics: Once in Azure Data Lake, your data can easily be accessed by other Azure services for machine learning, analytics, and reporting, unlocking new insights for your business.
🔹 Security & Compliance: Built-in security features ensure your data is encrypted, protected, and compliant with industry standards.
Feel free to check out the repository here:
#datalake #python #azure #dataanalysis
GitHub - LesterCerioli/CloudSuite-DataLake-Python312
github.com
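As a rough orientation, the extract → transform → load flow described above can be condensed to a few lines of Python. This sketch is not the repository's actual code: the connection string, table, filesystem, and path names are placeholders, and it assumes pandas, psycopg2, and the azure-storage-file-datalake SDK.

```python
# Condensed sketch of the extract -> transform -> load flow described above.
# Not the repository's code; credentials, table and path names are placeholders.
# Assumes: pip install pandas pyarrow psycopg2-binary azure-storage-file-datalake
import pandas as pd
import psycopg2
from azure.storage.filedatalake import DataLakeServiceClient

# 1. Extract: pull rows from PostgreSQL into a DataFrame.
conn = psycopg2.connect("postgresql://user:password@localhost:5432/appdb")
df = pd.read_sql("SELECT * FROM orders", conn)
conn.close()

# 2. Transform: serialize the rows to a columnar, analytics-friendly format.
payload = df.to_parquet(index=False)  # returns bytes when no path is given

# 3. Load: upload the file into an Azure Data Lake Gen2 filesystem.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<account-key>",
)
fs = service.get_file_system_client("raw")
file = fs.get_file_client("orders/orders.parquet")
file.upload_data(payload, overwrite=True)
```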
-
Just completed an in-depth course on Apache Airflow! I've been diving deep into the world of workflow orchestration, and I’m excited to share what I’ve learned:
- Setting Up Airflow: Learned how to run Airflow in both Python environments and Docker. From initializing databases to setting up the web server and scheduler, I now have a solid understanding of how to configure and manage Airflow.
- Airflow Concepts: Explored core concepts like DAGs (Directed Acyclic Graphs), task dependencies, and the task lifecycle. Understanding how these elements interact is crucial for building reliable workflows.
- Airflow Architecture: Gained insights into the architecture, including how the webserver, scheduler, and workers interact with the database and DAGs. Knowing the backend processes helps in better managing and troubleshooting the system.
- Advanced Features: Delved into more advanced topics like backfilling, catchup mechanisms, and setting up connections to external databases. I also learned about extending and customizing Docker images to optimize the Airflow environment for specific needs.
- Real-World Applications: Implemented various operators and sensors, including the AWS S3 sensor operator and PostgreSQL hooks, to connect, monitor, and manipulate data across different systems. These hands-on exercises solidified my understanding of how Airflow can be applied in real-world scenarios.
This course has been a game-changer, and I’m excited to apply these new skills in upcoming projects.
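For readers new to these concepts, here is a minimal DAG that ties several of them together: a DAG definition, task dependencies, and the catchup flag. It is illustrative only (dag_id, schedule, and task logic are made up) and assumes a recent Airflow 2.x release.

```python
# Minimal DAG illustrating the concepts above: DAG definition, dependencies, catchup.
# Illustrative only -- dag_id, schedule and task logic are made up; assumes Airflow 2.x.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling data from the source system")

def load():
    print("writing data to the warehouse")

with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,  # skip backfilling runs between start_date and today
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```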
-
At Buffer, we're using GitHub Actions to prevent breaking changes in our #GraphQL API. I wrote a post on our blog about how we're doing this 👉
How We're Preventing Breaking Changes in GraphQL APIs at Buffer — and Why It's Essential for Our Customers
buffer.com
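The article explains Buffer's setup in detail; as a rough sketch of the general idea, a CI job can diff the previous schema against the proposed one and fail when breaking changes appear. The example below uses the graphql-core library's breaking-change detection, which is not necessarily the tooling Buffer uses, and the schema file names are assumptions.

```python
# Sketch of the general idea -- diffing an old and new schema for breaking changes.
# Not necessarily Buffer's tooling; schema file names are placeholders.
# Assumes: pip install graphql-core
import sys
from graphql import build_schema
from graphql.utilities import find_breaking_changes

old_schema = build_schema(open("schema.old.graphql").read())
new_schema = build_schema(open("schema.graphql").read())

changes = find_breaking_changes(old_schema, new_schema)
for change in changes:
    print(f"BREAKING: {change.description}")

# Fail the CI job if the new schema would break existing clients.
sys.exit(1 if changes else 0)
```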
-
Beginner Data Engineers interested in Docker 🎯
- Firstly, ask what problem Docker is trying to solve and what the existing workflow looked like without it
- Start with the high-level architecture of Docker: what the Docker client, Docker host, and Docker daemon are
- Read about the Dockerfile and its possible instructions and arguments
- Write some basic Python code; even if it's just print("hello world") it doesn't matter, it's not about the application, it's about understanding what Docker does. You can always extend it later.
- Write a basic Dockerfile pointing to that basic code
- Read up on what a Docker image is
- Build an image from that basic Dockerfile, and read up on the flags the docker build command accepts
- Learn about cache layers and cache invalidation during the build process
- Read up on multi-stage builds, a concept used to optimise Docker images
- Read up on what a Docker container is
- Run a container and explore the flags of the docker run command
- Exec into a container and explore the flags of the docker exec command
- Validate that the Dockerfile instructions behave as expected once you are inside the container. Tweak the Dockerfile, rebuild, and exec into the container again to really validate that you understand.
- Read up on Docker volumes and bind mounts, and the difference between them
- Push an image to a container registry (Docker Hub) manually; you can extend to a cloud-based registry later
- Use CI/CD to automate the entire build and push to a container registry (not strictly necessary, but this is where you get the most value)
- Read up on the Docker Compose file and all of its top-level elements and what they mean
- Read up on common Docker Compose commands and their flags
- Write a simple Compose file around some basic code; even if it runs something trivial, the idea is to understand the process, not the application, at first
- Run that Docker Compose file
- After all this, go and check a popular official Compose file, for example Airflow's, and you will be able to relate to what those fields do
- Lastly, if you don't do this, you will copy-paste from other people every time, and when a problem arises your only bailout will be ChatGPT and asking around
📌 Everything said here can be found in the official Docker documentation; you just have to make time to read and practice 🤩🤩🤩
- https://lnkd.in/dcAK96t3
- https://lnkd.in/d-2dxA9g
- https://lnkd.in/dtGVxpJD
- https://lnkd.in/d-nhrQMj
- https://lnkd.in/dkr8FyMy
- https://lnkd.in/dTbX9MFB
- https://lnkd.in/dUv2BaNt
- https://lnkd.in/dcenqt3E
Home
docs.docker.com
-
NumPy 2.0 is out. Here are some critical changes:
1. API and ABI changes
2. New DType API
3. Scalar promotion
4. Performance improvements
5. Windows compatibility
6. Improved documentation
Has this version caused some of your code to break? Check out my full blog post on DataCamp about it here: https://lnkd.in/gVdBmjb7
Content managers and data people! Are you looking to have me write technical articles like this for your blog? DM me!
NumPy 2.0 Release: Key Changes and Migration
datacamp.com
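To make the breakage risk concrete, here are two examples of changes that commonly bite when upgrading; the behavior described in the comments reflects my reading of the 2.0 release notes, so check the official migration guide for the authoritative list.

```python
# Two examples of changes that commonly break code when upgrading to NumPy 2.0.
# Comments reflect my understanding of the 2.0 release notes; verify against the
# official migration guide.
import numpy as np

# 1. Removed aliases: np.NaN, np.Inf and np.float_ are gone in 2.0.
x = np.array([1.0, np.nan])   # use np.nan instead of np.NaN
y = np.float64(3.14)          # use np.float64 instead of np.float_

# 2. NEP 50 scalar promotion: Python scalars no longer upcast the result dtype.
a = np.float32(1.0) + 1.0
print(a.dtype)  # float32 under NumPy 2.x (would have been float64 under 1.x rules)
```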
-
The last thing you want on your first day as a new data scientist in an organization is to get frustrated setting up the resources and tools you need on your PC for your assigned project. To make matters worse, your PC might not even meet some of the hardware requirements. This is where you will see the benefits of GitHub Codespaces, which provides a cloud-based environment that is instantly accessible and pre-configured. If you are new to GitHub Codespaces, check out this DataCamp article to learn more. https://lnkd.in/dh8ScuY9
Introduction to GitHub Codespaces
datacamp.com