How does MongoDB make building AI apps easier and faster? (Hint: It starts with unifying operational and vector data) Learn more here: https://lnkd.in/gGP_AJwE #LoveYourDevelopers #mongodb #nosql #sql #genai #ai #database #developer #llm #RAG #architect #vector #search #mongodbatlas
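To make the "unifying operational and vector data" claim concrete, here is a minimal sketch (mine, not from the linked post) of querying vector embeddings and ordinary operational fields in a single MongoDB Atlas aggregation. The collection, index name, field names, and embedding values are all placeholders, and $vectorSearch assumes an Atlas Vector Search index already exists:

from pymongo import MongoClient

# Placeholder connection string and collection
client = MongoClient("mongodb+srv://<user>:<pass>@cluster.example.mongodb.net")
products = client["shop"]["products"]

# Embedding of the user's query, produced by any embedding model
# (dimensions are illustrative and must match the index definition)
query_vector = [0.12, -0.03, 0.88]

results = products.aggregate([
    # $vectorSearch must be the first stage; "embedding_index" is a
    # hypothetical Atlas Vector Search index on the "embedding" field
    {
        "$vectorSearch": {
            "index": "embedding_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 5,
        }
    },
    # Ordinary operational filtering and projection in the same pipeline
    {"$match": {"in_stock": True}},
    {"$project": {"name": 1, "price": 1, "_id": 0}},
])
for doc in results:
    print(doc)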
Chris Unterfranz’s Post
More Relevant Posts
-
From modernizing apps to building new AI-powered experiences, developers are key. Learn how we put developers first: https://lnkd.in/gGP_AJwE #LoveYourDevelopers MongoDB #mongodb #nosql #sql #genai #ai #database #developer #llm #RAG #architect #vector #search #mongodbatlas
-
So many cool and interesting features announced at #MSBuild for #AzureSQL. Make sure to check them out. This slide that Muazma Zahid put together is a great starting point. JSON, RegEx, vectors, GraphQL, AI... oh my!!!! 🔥 🔥 🔥
Exciting updates from #MicrosoftBuild this week! We've introduced a variety of new Developer and AI features for SQL databases 🚀. Here are the highlights:

Generally available:
- Data API Builder - https://aka.ms/dab

Public preview:
- Copilot capabilities - https://lnkd.in/gawQEg_N
  - Self-help for managing and operating Azure SQL Database
  - Natural language to T-SQL conversion in Azure SQL Database
- JSON data type and aggregates - https://lnkd.in/g3ZZdAjY
- Azure SQL Database Fabric mirroring - https://lnkd.in/gQKr76A3

Private preview:
- Vector functions - https://lnkd.in/gVCe-aJU
- T-SQL regular expressions (RegEx) - https://lnkd.in/gQJAV4tM

Kudos to the fantastic team for their hard work: Joe Sack, Jerry Nixon, Umachandar Jayachandran, Davide Mauri, Pooja Kamath, Abhiman Tiwari, Idris Motiwala, Anagha Todalbagi, Brian Spendolini, Katherine Lin, Salvador Martinez, Sonika Sharma, Sanjay Mishra, Asad Khan, Shireesh Thota, Bob Ward, Anna Hoffman, Aniruddh Munde, and many more... 👏🌟

#ai #vectorsearch #azuresql #developers #sql
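A rough sketch of what a few of these previewed features look like from Python via pyodbc. The connection string, table, and column names are placeholders, and since the vector and RegEx functions are in private preview, the exact T-SQL syntax may have changed since the announcement:

import pyodbc

# Placeholder connection details
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=<server>;DATABASE=<db>;"
    "Authentication=ActiveDirectoryInteractive"
)
cur = conn.cursor()

# Native JSON data type (public preview): store documents in a typed column
cur.execute("CREATE TABLE orders (id int PRIMARY KEY, details json)")

# T-SQL regular expressions (private preview)
cur.execute(
    "SELECT id FROM orders "
    "WHERE REGEXP_LIKE(JSON_VALUE(details, '$.sku'), '^AB-[0-9]+$')"
)

# Vector functions (private preview): cosine distance between two embeddings
cur.execute("""
    SELECT VECTOR_DISTANCE('cosine',
                           JSON_ARRAY_TO_VECTOR('[0.10, 0.20, 0.30]'),
                           JSON_ARRAY_TO_VECTOR('[0.30, 0.20, 0.10]'))
""")
print(cur.fetchone()[0])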
-
🚀 Vector Databases vs. Traditional Databases: A Quick Comparison 🚀

As data continues to grow in both volume and complexity, choosing the right type of database for your use case is crucial. Here's how the two compare:

- Vector databases index high-dimensional embeddings and answer similarity (nearest-neighbour) queries, which makes them a natural fit for AI, machine-learning models, search, and recommendation workloads.
- Traditional databases like MySQL or PostgreSQL store structured rows and answer exact-match and relational queries, shining in transactional use cases with strong consistency, relational integrity, and scalability.

💡 Key takeaway: if you're working with AI, machine-learning models, or similarity-based queries, vector databases offer superior performance for tasks like search and recommendation. For transactional use cases, traditional databases continue to shine.

Which database solution do you prefer for your projects? Drop your thoughts in the comments below! 👇

#Database #VectorDatabases #MachineLearning #AI #DataScience #Technology #BigData #SQL #PostgreSQL #MySQL #DataManagement
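For intuition, here is a tiny self-contained sketch of the two query styles, with made-up data and plain NumPy standing in for a real vector database:

import numpy as np

# 10,000 document embeddings of dimension 384 (illustrative values)
embeddings = np.random.rand(10_000, 384)
query = np.random.rand(384)  # embedded user query

# Vector-database-style query: cosine similarity, top 5 nearest neighbours
sims = embeddings @ query / (
    np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
)
top5 = np.argsort(sims)[-5:][::-1]
print("Most similar documents:", top5)

# Traditional-database-style query: exact predicate match
# (SQL equivalent: SELECT * FROM docs WHERE id = 42)
ids = np.arange(10_000)
print("Exact match:", np.where(ids == 42)[0])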
-
🌟 Unlock the Power of MongoDB 🌟

🚀 Dive into a database that does it all:
🔍 Vector Search for AI-powered recommendations
🌊 Stream Processing for real-time insights
⚡ Operational and 🛒 Transactional for business-critical apps
✍️ Text Search to power seamless search experiences
📊 Analytical for data-driven decisions
🔗 Graph for connected data
🌍 Geospatial for location-based intelligence

MongoDB: The ultimate multi-model database for modern applications.

#MongoDB #Database #DataEngineering #DataScience #BigData #AI #VectorSearch #RealTime #GraphDatabase #GeospatialData #TextSearch #Analytics #MachineLearning #StreamProcessing #DevOps #DataOps #Innovation #ModernApps
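As a small illustration of two of those models side by side, here is a pymongo sketch; the database, collection, and field names are invented for the example:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
places = client["demo"]["places"]

# Geospatial: find places within ~1 km of a point (needs a 2dsphere index)
places.create_index([("location", "2dsphere")])
nearby = places.find({
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-73.99, 40.73]},
            "$maxDistance": 1000,  # metres
        }
    }
})

# Text search: match against an indexed text field
places.create_index([("description", "text")])
matches = places.find({"$text": {"$search": "coffee roastery"}})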
-
We are thrilled to announce the launch of Super Sense, an AI-driven platform designed to revolutionize the way businesses harness their data. With cutting-edge real-time intelligence and seamless integration, Super Sense transforms complex data into clear, actionable insights. Empower your business to make smarter, faster decisions and unlock the full potential of your data with Super Sense. To learn more visit https://super-sense.ai #AI #ArtificialIntelligence #DataRetrieval #LLMs #Datasets #Databases #SQL #PostgreSQL #MongoDB #Strawberry #Datascience #DataAnalytics
-
If we have two large datasets in Spark, we can use a Sort Merge Join.

Sort Merge Join: A Sort Merge Join (SMJ) is a common join strategy in Apache Spark for large datasets where both sides of the join can be sorted. It works by first sorting the datasets on the join keys and then merging the sorted datasets to find matching keys. Unlike a broadcast join, it never needs to fit an entire side in memory, and unlike a shuffled hash join, it avoids building a large hash table, which makes it Spark's default strategy for joining two large datasets.

How it works:
1. Shuffle phase (if needed): the datasets are shuffled across nodes and partitioned by the join keys, so each partition holds a specific range of keys.
2. Sort phase: within each partition, the datasets are sorted by the join key.
3. Merge phase: the sorted datasets are scanned simultaneously in a merge-like pass; matching rows are combined and returned.

Use cases:
- Joining large datasets where both sides are too large to broadcast.
- Joins on keys that are not already sorted or that need shuffling for proper partitioning.

Supported join types: inner, outer (left, right, full), semi, and anti joins.

Performance considerations:
Pros: efficient for large datasets; scales well when data is evenly distributed; handles data skew better than some other join types.
Cons: requires sorting, which can be computationally expensive; more disk- and memory-intensive if the datasets are very large and do not fit in memory.

Example in PySpark:

# Creating DataFrames (assumes an existing SparkSession named spark)
df1 = spark.createDataFrame([(1, 'Alice'), (2, 'Bob'), (3, 'Cathy')], ["id", "name"])
df2 = spark.createDataFrame([(1, 'Engineering'), (2, 'HR'), (4, 'Finance')], ["id", "department"])

# Performing a join (Spark will choose Sort Merge Join if applicable)
result = df1.join(df2, on="id", how="inner")
result.show()

When is Sort Merge Join used in Spark? Spark automatically chooses SMJ when the datasets are too large to broadcast. It is used by default when the join keys are sortable and the data size is beyond the broadcast threshold (spark.sql.autoBroadcastJoinThreshold). You can fine-tune this behavior with configuration if needed, as sketched below.

#Dataengineering #technology #bigdata #clouddataengineering #azurecloud #spark #apachespark #databricks
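A quick way to confirm which strategy Spark picked, reusing df1 and df2 from the example above:

# Setting the broadcast threshold to -1 disables broadcast joins entirely
# (illustrative only; you would normally keep the default), which pushes
# Spark toward Sort Merge Join for sortable keys.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
result = df1.join(df2, on="id", how="inner")
result.explain()  # the physical plan should contain a SortMergeJoin node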
-
In Databricks SQL and Unity Catalog, you can create and manage AI functions with AI_QUERY and SQL functions, making it easy to call models and standard prompt templates straight from SQL. But did you also know you can create embeddings and query them directly on your tables in SQL with VECTOR_SEARCH? Check out this blog by Reilly Wild-Williams that walks through how to use all these tools to create AI-embedded SQL dashboards and pipelines! https://lnkd.in/gx4GqnXr
Databricks SQL AI: Query a Vector Search Endpoint with only SQL
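The core pattern the blog describes, sketched in Python from a Databricks notebook. The serving endpoint, catalog/schema/table, and index names are placeholders, and the vector_search argument names have varied across Databricks releases (query vs. query_text), so check the current docs:

# ai_query: call a model serving endpoint row by row from SQL
answer = spark.sql("""
    SELECT ai_query(
        'my-serving-endpoint',            -- placeholder endpoint name
        'Summarize: ' || review_text
    ) AS summary
    FROM catalog.schema.reviews           -- placeholder table
    LIMIT 5
""")
answer.show(truncate=False)

# vector_search: query a Vector Search index directly from SQL
hits = spark.sql("""
    SELECT * FROM vector_search(
        index => 'catalog.schema.reviews_index',  -- placeholder index
        query => 'battery life complaints',
        num_results => 5
    )
""")
hits.show()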
-
Resolving OutOfMemory (OOM) errors in PySpark: best practices

(The snippets below assume an active SparkSession named spark, plus: from pyspark import StorageLevel; from pyspark.sql import functions as F; from pyspark.sql.functions import broadcast.)

1️⃣ Adjust Spark configuration (memory management). Note that executor/driver memory and cores are fixed at startup, so set them when the session is created (via SparkSession.builder.config or spark-submit), not with spark.conf.set at runtime; see the sketch after this list.
Executor memory: spark.executor.memory = 8g
Driver memory: spark.driver.memory = 4g
Executor cores: spark.executor.cores = 2
Use disk persistence: df.persist(StorageLevel.DISK_ONLY)

2️⃣ Enable dynamic allocation (also a startup setting) so Spark can adjust the number of executors:
spark.dynamicAllocation.enabled = true
spark.dynamicAllocation.minExecutors = 1

3️⃣ Enable Adaptive Query Execution (AQE) to optimize query plans at runtime:
spark.conf.set("spark.sql.adaptive.enabled", "true")

4️⃣ Enforce a schema for semi-structured data to avoid schema-inference overhead:
df = spark.read.schema(schema).json("path/to/data")

5️⃣ Tune the number of partitions:
df = df.repartition(200, "column_name")

6️⃣ Handle data skew with salting on skewed join keys (bucketed so the other side of the join can replicate the same salt range):
df1 = df1.withColumn("join_key_salted", F.concat(F.col("join_key"), F.lit("_"), F.floor(F.rand() * 10).cast("string")))

7️⃣ Limit caching of large DataFrames; cache selectively, or let Spark spill to disk:
df.persist(StorageLevel.MEMORY_AND_DISK)

8️⃣ Optimize joins for large DataFrames; broadcast the smaller table:
df_join = large_df.join(broadcast(small_df), "join_key", "left")

9️⃣ Monitor Spark jobs in the Spark UI to track memory usage and job execution.

🔟 Revisit the partitioning strategy when writing data:
df.write.partitionBy("partition_column").parquet("path_to_data")

Seekho Bigdata Institute Karthik K.
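Since the memory settings above cannot be changed on a live session, here is a minimal sketch of applying them at build time (the app name and values are illustrative):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("oom-tuning-demo")  # hypothetical app name
    .config("spark.executor.memory", "8g")
    .config("spark.driver.memory", "4g")
    .config("spark.executor.cores", "2")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)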