Exciting news: ipydatagrid is now part of Project Jupyter! Learn more on our blog. https://lnkd.in/gXA9TbXf #jupyter #ipydatagrid
Jupyter becoming the new Excel!
Postdoctoral researcher affiliated with Stellenbosch University, the University of Cape Town, and SAEON
bRAG-langchain: A Step-by-Step Guide to Building Enterprise-Level RAG Systems 🚀📚

This project walks developers through 5 progressive Jupyter notebooks, guiding them from scratch to building, optimizing, and deploying an enterprise-level RAG system. It covers everything from basic setup to advanced techniques like multi-query, semantic routing, and reranking. 💥

Core Value of the Project:
1️⃣ Provides a comprehensive tutorial for implementing RAG systems, from beginner to advanced levels.
2️⃣ Built using the powerful @LangChainAI framework.
3️⃣ Includes real-world examples of advanced techniques.

🗝️ 5 Key Tutorial Notebooks - each notebook is designed to build on the previous one, increasing in complexity:

1️⃣ Basic Setup Overview 📖
📂 File: [1]_rag_setup_overview.ipynb 💻
Environment setup 🌱
Data loading and preprocessing 🛠️
Generating embeddings using OpenAI 🤖
Setting up vector databases (ChromaDB/Pinecone) 🗃️
Building a foundational RAG pipeline 🏗️

2️⃣ Multi-Query Technique 🔍
📂 File: [2]_rag_with_multi_query.ipynb
Implementing multi-query retrieval 🎯
Using multiple embedding models 🤝
Comparing the performance of single-query vs. multi-query systems 📊

3️⃣ Routing and Query Construction 🛤️
📂 File: [3]_rag_routing_and_query_construction.ipynb
Logical routing implementation 🧠
Semantic routing (e.g., classifying math/physics problems) 📐
Structured search patterns 🗺️
Integrating vector storage 📂

4️⃣ Indexing and Advanced Retrieval 🧩
📂 File: [4]_rag_indexing_and_advanced_retrieval.ipynb
Multi-representation indexing 📚
Document summarization storage ✍️
Integration with ColBERT 🔎
Implementing RAPTOR for efficient retrieval 🚀

5️⃣ Retrieval and Reranking 🥇
📂 File: [5]_rag_retrieval_and_reranking.ipynb
RAG-Fusion for multi-query generation 🌐
Reciprocal Rank Fusion (RRF) 🔄 (a minimal sketch follows below)
Reranking with @cohere 📈
Advanced techniques like CRAG and Self-RAG 🔗

GitHub: https://lnkd.in/g5naxEJX
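For a concrete picture of one technique from notebook 5, here is a minimal, library-free Python sketch of Reciprocal Rank Fusion (RRF). It is not taken from the repository; the function name and the k=60 default are illustrative choices.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in (ranks are 1-based); k=60 is the value commonly
    used in the RRF literature.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: three retrievers (e.g. three generated queries) return
# overlapping candidates; RRF rewards documents ranked well by several.
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_e"],
])
print(fused)  # doc_b first, since every list ranks it highly
```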
Many of the tutorials and demos I give on #anywidget (https://anywidget.dev) aim to make the web accessible as a platform for building user interfaces to data systems. They show how simple it is to get started but leave much to the imagination regarding what ~could~ be built with such an architecture.

For my talk at SciPy Conference, I presented my vision of building meaningful interactive tools on top of emerging composable data systems, what I like to call "composable data vis." I shared a prototype... which I've since polished up and am now open sourcing!

quak (https://lnkd.in/eFX6dPPF) 🦆 is a scalable data profiler for quickly scanning large tables. Cross-filter and sort millions of rows in real time.

The core idea in quak is that all table state is expressed via database queries. User interactions produce SQL, executed lazily at the database level (via DuckDB) to refresh views. This dynamic SQL can also be accessed in Jupyter to materialize data subsets for further analysis. This design, which extends Mosaic (https://lnkd.in/emFCyQJk), not only makes quak fast but also allows complex queries to be defined through interaction, which would otherwise be tedious to code – all while keeping it #reproducible.
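To make the "interactions produce SQL" idea concrete, here is a small sketch (not quak's actual code) using DuckDB's Python API: UI state is reduced to a WHERE clause, the database does the aggregation, and the generated SQL stays available for reuse in a notebook. The table and column names are invented for illustration.

```python
import duckdb

con = duckdb.connect()  # in-memory database
con.sql("""
    CREATE TABLE flights AS
    SELECT * FROM (VALUES
        ('AA', 1200, 15.0),
        ('UA',  800, -3.0),
        ('AA',  450,  7.5),
        ('DL', 2100,  0.0)
    ) AS t(carrier, distance, delay)
""")

def filtered_view(filters):
    """Compile a dict of interactive filter predicates into one SQL query.

    Every UI interaction just updates `filters`; the view is refreshed by
    re-running the generated query, and the same SQL string can be handed
    back to the user to materialize the subset for further analysis.
    """
    where = " AND ".join(filters.values()) or "TRUE"
    query = (f"SELECT carrier, count(*) AS n, avg(delay) AS avg_delay "
             f"FROM flights WHERE {where} GROUP BY carrier ORDER BY n DESC")
    return query, con.sql(query).df()

# Simulate a cross-filter interaction: the user brushes distance > 500.
sql, df = filtered_view({"distance": "distance > 500"})
print(sql)
print(df)
```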
📊 Running RStudio in the Comfort of Containers

Data science keeps converging with container technology, and Rami Krispin's latest tutorial on Towards Data Science is a good example of that union. The step-by-step guide shows how to run an RStudio Server inside a container using the Rocker Project's RStudio image from Docker Hub. This offers an alternative way to run RStudio, which does not ship with native Docker support, while keeping the server environment integrated with the user's local setup.

The tutorial is aimed at readers familiar with Docker commands and uses Docker Desktop and a Docker Hub account. It walks through the Rocker Project's range of R images, including `base-r`, `tidyverse`, and `geospatial`. A key feature highlighted in the guide is the `--volume` flag, which mounts a local directory into the container for persistent storage, so the user's workspace survives Docker's otherwise ephemeral containers.

Rami's methodical instructions ease the move from local development setups to containerized environments while keeping existing workflows intact. The guide also touches on Docker Compose for more complex scenarios that orchestrate multiple containers.

Sources: Running RStudio Inside a Container - [Towards Data Science Article](https://lnkd.in/gm3dmghm)

#DataScience #RStudio #Docker #Containers #TechTutorial #RockerProject #DevelopmentEnvironment #PersistentStorage #DockerCompose #WorkflowAutomation
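As a rough companion to the tutorial (which uses the Docker CLI), here is a sketch of the same idea via the Docker SDK for Python. The image name and port come from the Rocker documentation, but the container name, password, and host path are placeholders, and this is not the article's own code.

```python
import docker  # Docker SDK for Python (pip install docker)

client = docker.from_env()

# Run the Rocker RStudio image: RStudio Server listens on port 8787 inside
# the container, and the bind mount keeps work on the host so it survives
# container restarts (the role of the tutorial's --volume flag).
container = client.containers.run(
    "rocker/rstudio",
    name="rstudio-dev",                     # placeholder container name
    environment={"PASSWORD": "change-me"},  # placeholder password for the rstudio user
    ports={"8787/tcp": 8787},               # then browse to http://localhost:8787
    volumes={"/path/on/host": {"bind": "/home/rstudio/project", "mode": "rw"}},
    detach=True,
)
print(container.status)
```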
🚀 Achievement Unlocked: Implemented a Stack Data Structure Using a Linked List! 🧩

I'm thrilled to share that I've successfully completed the implementation of a stack data structure using a linked list! 📚🔗

Why Stacks? Stacks are fundamental data structures that follow the Last In, First Out (LIFO) principle. They are crucial in applications like expression evaluation, backtracking algorithms, and memory management.

Why Linked Lists? Linked lists provide dynamic memory allocation, making them an excellent choice for implementing stacks: insertion and deletion at the head are efficient, which is exactly what push and pop need.

Key Features of My Implementation (a minimal sketch follows below):
Push: Add an element to the top of the stack.
Pop: Remove the top element from the stack.
Peek: View the top element without removing it.
IsEmpty: Check if the stack is empty.
Display: Show all elements in the stack.

Challenges Overcome:
Handling edge cases such as popping from an empty stack.
Ensuring efficient memory usage and avoiding memory leaks.

Next Steps:
Integrate this stack with other data structures and algorithms.
Explore more complex use cases and optimizations.

Feel free to check out my code on GitHub [insert GitHub link] and share your thoughts! Let's connect and discuss all things data structures and algorithms.

#DataStructures #Algorithms #LinkedList #Stack #Coding #Programming #LearningJourney

Special Thanks to Hope3 Foundation Hope3 Varsity Palani Vairavan Meenakshi Sundaram Manivannan Amrish K.S. MANI RR Siva Kumar
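Since the post does not inline the code, here is a minimal Python sketch of a linked-list-backed stack with the operations listed above; it is an illustration of the same design, not the author's implementation.

```python
class _Node:
    """A single linked-list node holding one stack element."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class Stack:
    """LIFO stack backed by a singly linked list; push/pop/peek are O(1)."""

    def __init__(self):
        self._top = None

    def is_empty(self):
        return self._top is None

    def push(self, value):
        # The new node becomes the head, pointing at the old top.
        self._top = _Node(value, self._top)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")  # edge case from the post
        value = self._top.value
        self._top = self._top.next  # old node becomes unreachable and is collected
        return value

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._top.value

    def display(self):
        # Walk from top to bottom and print the elements.
        node, items = self._top, []
        while node is not None:
            items.append(node.value)
            node = node.next
        print("top ->", items)


s = Stack()
for x in (1, 2, 3):
    s.push(x)
s.display()     # top -> [3, 2, 1]
print(s.pop())  # 3
print(s.peek()) # 2
```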
I've successfully completed the third module of the DataTalksClub Data Engineering Zoomcamp course. ✅

The main focus was on 💾 BigQuery and ways of importing bulk data. Some of the topics covered were:
GCP Buckets
Materializing external tables
Partitioning and clustering (a small sketch follows below)
Query costs depending on the type of table queried
Machine learning models

As always, I'd like to thank everyone for making this happen! 🚀 Looking forward to Module 4 and dbt from dbt Labs! 🚀

#dezoomcamp #dataengineering #bigquery #datatalksclub

Check out the GitHub of the course:
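For anyone curious what partitioning and clustering look like in practice, here is an illustrative sketch using the google-cloud-bigquery client. The project, dataset, table, and column names are placeholders, not the course's actual ones.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # uses your default GCP credentials and project

# Partition by day on a timestamp column and cluster by a frequently filtered
# column, so queries that filter on them scan (and bill for) less data.
ddl = """
CREATE OR REPLACE TABLE `my-project.my_dataset.trips_partitioned`
PARTITION BY DATE(pickup_datetime)
CLUSTER BY pickup_location_id AS
SELECT * FROM `my-project.my_dataset.trips_external`
"""

client.query(ddl).result()  # .result() waits for the DDL job to finish
```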
Lightweight and extensible compatibility layer between dataframe libraries: pandas, Polars, and others https://lnkd.in/dKTG-kDG
Hello LinkedIn Connections! 👋

Here's my Task 2 💡 of the "CodeAlpha Internship", successfully completed in Jupyter Lab. 🤩

👩🏻💻 Task 2: Take the stock price of any company you want and predict its price using an LSTM. Use only Jupyter Notebook code.

=> It includes importing pandas, matplotlib, seaborn, numpy, etc.
- .csv file reading
- data visualisation concepts
- histogram representation of data, etc.

Workflow goals - we'll be answering the following questions along the way (questions 2 and 3 are sketched in code below):
1) What was the change in the price of the stock over time?
2) What was the daily return of the stock on average?
3) What was the moving average of the various stocks?
4) What was the correlation between different stocks?
5) How much value do we put at risk by investing in a particular stock?
6) How can we attempt to predict future stock behavior? (Predicting the closing price of Apple Inc. stock using an LSTM)

Here's the link to my GitHub repository: https://lnkd.in/dTY3vJxJ

CodeAlpha #task2 #datascienceintern #learningnew #jupyterlab
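As an illustration of questions 2 and 3 above, here is a small pandas sketch computing daily returns and moving averages from a CSV of closing prices. The file name and column names are placeholders, and this is not the author's notebook code.

```python
import pandas as pd

# Placeholder CSV with at least Date and Close columns (e.g. exported price history).
df = pd.read_csv("AAPL.csv", parse_dates=["Date"], index_col="Date")

# 2) Daily return: percentage change of the closing price from one day to the next.
df["Daily Return"] = df["Close"].pct_change()

# 3) Moving averages: rolling means smooth out short-term fluctuations.
for window in (10, 20, 50):
    df[f"MA {window}"] = df["Close"].rolling(window=window).mean()

print(df[["Close", "Daily Return", "MA 20"]].tail())
print("Average daily return:", df["Daily Return"].mean())
```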
Title: Exploring Graph Databases: Building a Product Recommendation System with Neo4j

As a machine learning enthusiast, I'm excited to share my latest project - a personalized product recommendation system built using Neo4j, a powerful graph database, and Python.

While traditional relational databases excel at handling structured data, graph databases offer a unique and powerful approach to modeling and analyzing interconnected data. With this project, I aimed to gain hands-on experience with graph databases and understand their potential for delivering real-time, personalized recommendations.

Using Neo4j, I developed a recommendation engine that models the relationships between users, products, and categories as nodes and edges in a graph. This graph-based approach enables efficient data retrieval and scales well, both of which are crucial for delivering recommendations in real time.

To power the recommendation engine, I implemented two collaborative filtering algorithms (the first is sketched below):
1. User-based Collaborative Filtering: Identifying similar users based on their product preferences to recommend items that like-minded users have interacted with or purchased.
2. Category-based Collaborative Filtering: Leveraging product categories to recommend popular items within a user's areas of interest.

The project involved extensive data processing and integration with a Google BigQuery dataset, and showcased my skills in data manipulation, handling large datasets, and working with Python's data science ecosystem. While building this recommendation system, I gained valuable insights into the strengths and use cases of graph databases, particularly for highly interconnected data and complex queries.

I'm thrilled to have completed this project and to share the results. Feel free to explore the GitHub repository (link: https://lnkd.in/erWU-sZP) to dive deeper into the code, algorithms, and implementation details.

This project has been an enriching learning experience, allowing me to combine my knowledge of machine learning, graph databases, and Python to create a practical and scalable solution.

#MachineLearning #RecommendationSystems #GraphDatabases #Neo4j #Python #DataScience #GoogleBigQuery
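To give a flavor of what user-based collaborative filtering can look like in Cypher, here is a sketch using the official neo4j Python driver. The node labels (User, Product), relationship type (PURCHASED), and connection details are assumptions for illustration, not necessarily the repository's actual schema.

```python
from neo4j import GraphDatabase  # pip install neo4j

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Recommend products bought by users with overlapping purchase histories,
# excluding anything the target user already owns.
CYPHER = """
MATCH (u:User {id: $user_id})-[:PURCHASED]->(p:Product)
      <-[:PURCHASED]-(other:User)-[:PURCHASED]->(rec:Product)
WHERE NOT (u)-[:PURCHASED]->(rec)
RETURN rec.name AS product, count(DISTINCT other) AS overlap
ORDER BY overlap DESC
LIMIT $limit
"""

def recommend(user_id, limit=5):
    # Each call opens a short-lived session and returns (product, overlap) pairs.
    with driver.session() as session:
        result = session.run(CYPHER, user_id=user_id, limit=limit)
        return [(record["product"], record["overlap"]) for record in result]

print(recommend("u42"))
driver.close()
```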
https://lnkd.in/dHD9PC8C Fast GraphRAG (GitHub Repo) Fast-Graphrag is an open-source framework for Retrieval Augmented Generation (RAG) that uses graphs to streamline agent-driven workflows.
🚀 Exciting Project: Real-Time Data Analysis Pipeline 🚀

I'm thrilled to share my latest project - a real-time data analysis pipeline using some amazing technologies! Here's what I've been working on:

🔧 Technologies Used:
Google Sheets: For managing data entry.
Google BigQuery: For storing and analyzing data in real time.
Apache Spark: For processing data efficiently.
Python: For handling data workflows.

What I've Achieved:
Created a system that continuously syncs data from Google Sheets to BigQuery.
Set up real-time data processing and analysis with Apache Spark (a small sketch of the BigQuery-to-Spark step follows below).
Enabled instant insights and predictions based on live data.

Next Steps:
Adding advanced machine learning models for better predictions.
Improving data visualization with real-time dashboards in Power BI.
Enhancing the system's scalability and performance.

Project Aim: To deliver immediate insights and support proactive decision-making as new data comes in.

Check out the code and details on https://lnkd.in/diytNWNg to see how it all works!

#DataScience #MachineLearning #BigData #ApacheSpark #GoogleBigQuery #DataEngineering #RealTimeAnalytics
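As an illustration of the Spark-on-BigQuery step, here is a rough PySpark sketch using the spark-bigquery connector. The connector version, table name, and column names are placeholders; this is not the project's actual code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# The spark-bigquery connector is pulled in as a package; the version is illustrative.
spark = (
    SparkSession.builder
    .appName("sheets-to-bigquery-analysis")
    .config("spark.jars.packages",
            "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.1")
    .getOrCreate()
)

# Read the table that the Google Sheets sync keeps up to date (placeholder name).
orders = (
    spark.read.format("bigquery")
    .option("table", "my-project.sales.orders")
    .load()
)

# A simple summary that could back a live dashboard: revenue and order count per day.
daily = (
    orders
    .groupBy(F.to_date("order_ts").alias("day"))
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("orders"))
    .orderBy("day")
)
daily.show()
```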
Analytical Chemist, Biologist, and Data Scientist
Very neat - looking forward to this being bundled!