How does MongoDB make building AI apps easier and faster? (Hint: It starts with unifying operational and vector data) Learn more here: https://lnkd.in/gGP_AJwE #LoveYourDevelopers #mongodb #nosql #sql #genai #ai #database #developer #llm #RAG #architect #vector #search #mongodbatlas
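To make the "unifying operational and vector data" claim concrete, here is a minimal sketch (mine, not from the linked post) of querying vector embeddings and ordinary operational fields in a single MongoDB Atlas aggregation. The collection, index name, field names, and embedding values are all placeholders, and $vectorSearch assumes an Atlas Vector Search index already exists:

from pymongo import MongoClient

# Placeholder connection string and collection
client = MongoClient("mongodb+srv://<user>:<pass>@cluster.example.mongodb.net")
products = client["shop"]["products"]

# Embedding of the user's query, produced by any embedding model
# (dimensions are illustrative and must match the index definition)
query_vector = [0.12, -0.03, 0.88]

results = products.aggregate([
    # $vectorSearch must be the first stage; "embedding_index" is a
    # hypothetical Atlas Vector Search index on the "embedding" field
    {
        "$vectorSearch": {
            "index": "embedding_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 5,
        }
    },
    # Ordinary operational filtering and projection in the same pipeline
    {"$match": {"in_stock": True}},
    {"$project": {"name": 1, "price": 1, "_id": 0}},
])
for doc in results:
    print(doc)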
Chris Unterfranz’s Post
More Relevant Posts
-
From modernizing apps to building new AI-powered experiences, developers are key. Learn how we put developers first: https://lnkd.in/gGP_AJwE #LoveYourDevelopers MongoDB #mongodb #nosql #sql #genai #ai #database #developer #llm #RAG #architect #vector #search #mongodbatlas
-
So many cool and interesting features announced at #MSBuild for #AzureSQL. Make sure to check them out. This slide that Muazma Zahid put together is a great starting point. JSON, RegEx, vectors, GraphQL, AI... oh my!!!! 🔥 🔥 🔥
Exciting updates from #MicrosoftBuild this week! We've introduced a variety of new Developer and AI features for SQL databases 🚀. Here are the highlights:

Generally available:
- Data API Builder - https://aka.ms/dab

Public preview:
- Copilot capabilities - https://lnkd.in/gawQEg_N
  - Self-help for managing and operating Azure SQL Database
  - Natural language to T-SQL conversion in Azure SQL Database
- JSON data type and aggregates - https://lnkd.in/g3ZZdAjY
- Azure SQL Database Fabric mirroring - https://lnkd.in/gQKr76A3

Private preview:
- Vector functions - https://lnkd.in/gVCe-aJU
- T-SQL regular expressions (RegEx) - https://lnkd.in/gQJAV4tM

Kudos to the fantastic team for their hard work: Joe Sack, Jerry Nixon, Umachandar Jayachandran, Davide Mauri, Pooja Kamath, Abhiman Tiwari, Idris Motiwala, Anagha Todalbagi, Brian Spendolini, Katherine Lin, Salvador Martinez, Sonika Sharma, Sanjay Mishra, Asad Khan, Shireesh Thota, Bob Ward, Anna Hoffman, Aniruddh Munde, and many more... 👏🌟

#ai #vectorsearch #azuresql #developers #sql
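A rough sketch of what a few of these previewed features look like from Python via pyodbc. The connection string, table, and column names are placeholders, and since the vector and RegEx functions are in private preview, the exact T-SQL syntax may have changed since the announcement:

import pyodbc

# Placeholder connection details
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=<server>;DATABASE=<db>;"
    "Authentication=ActiveDirectoryInteractive"
)
cur = conn.cursor()

# Native JSON data type (public preview): store documents in a typed column
cur.execute("CREATE TABLE orders (id int PRIMARY KEY, details json)")

# T-SQL regular expressions (private preview)
cur.execute(
    "SELECT id FROM orders "
    "WHERE REGEXP_LIKE(JSON_VALUE(details, '$.sku'), '^AB-[0-9]+$')"
)

# Vector functions (private preview): cosine distance between two embeddings
cur.execute("""
    SELECT VECTOR_DISTANCE('cosine',
                           JSON_ARRAY_TO_VECTOR('[0.10, 0.20, 0.30]'),
                           JSON_ARRAY_TO_VECTOR('[0.30, 0.20, 0.10]'))
""")
print(cur.fetchone()[0])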
-
🚀 Vector Databases vs. Traditional Databases: A Quick Comparison 🚀

As data continues to grow in both volume and complexity, choosing the right type of database for your use case is crucial. Here's how the two compare:

- Vector databases index high-dimensional embeddings and answer similarity (nearest-neighbour) queries, which makes them a natural fit for AI, machine-learning models, search, and recommendation workloads.
- Traditional databases like MySQL or PostgreSQL store structured rows and answer exact-match and relational queries, shining in transactional use cases with strong consistency, relational integrity, and scalability.

💡 Key takeaway: if you're working with AI, machine-learning models, or similarity-based queries, vector databases offer superior performance for tasks like search and recommendation. For transactional use cases, traditional databases continue to shine.

Which database solution do you prefer for your projects? Drop your thoughts in the comments below! 👇

#Database #VectorDatabases #MachineLearning #AI #DataScience #Technology #BigData #SQL #PostgreSQL #MySQL #DataManagement
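For intuition, here is a tiny self-contained sketch of the two query styles, with made-up data and plain NumPy standing in for a real vector database:

import numpy as np

# 10,000 document embeddings of dimension 384 (illustrative values)
embeddings = np.random.rand(10_000, 384)
query = np.random.rand(384)  # embedded user query

# Vector-database-style query: cosine similarity, top 5 nearest neighbours
sims = embeddings @ query / (
    np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
)
top5 = np.argsort(sims)[-5:][::-1]
print("Most similar documents:", top5)

# Traditional-database-style query: exact predicate match
# (SQL equivalent: SELECT * FROM docs WHERE id = 42)
ids = np.arange(10_000)
print("Exact match:", np.where(ids == 42)[0])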
-
🌟 Unlock the Power of MongoDB 🌟

🚀 Dive into a database that does it all:
🔍 Vector Search for AI-powered recommendations
🌊 Stream Processing for real-time insights
⚡ Operational and 🛒 Transactional for business-critical apps
✍️ Text Search to power seamless search experiences
📊 Analytical for data-driven decisions
🔗 Graph for connected data
🌍 Geospatial for location-based intelligence

MongoDB: The ultimate multi-model database for modern applications.

#MongoDB #Database #DataEngineering #DataScience #BigData #AI #VectorSearch #RealTime #GraphDatabase #GeospatialData #TextSearch #Analytics #MachineLearning #StreamProcessing #DevOps #DataOps #Innovation #ModernApps
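As a small illustration of two of those models side by side, here is a pymongo sketch; the database, collection, and field names are invented for the example:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
places = client["demo"]["places"]

# Geospatial: find places within ~1 km of a point (needs a 2dsphere index)
places.create_index([("location", "2dsphere")])
nearby = places.find({
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-73.99, 40.73]},
            "$maxDistance": 1000,  # metres
        }
    }
})

# Text search: match against an indexed text field
places.create_index([("description", "text")])
matches = places.find({"$text": {"$search": "coffee roastery"}})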
-
We are thrilled to announce the launch of Super Sense, an AI-driven platform designed to revolutionize the way businesses harness their data. With cutting-edge real-time intelligence and seamless integration, Super Sense transforms complex data into clear, actionable insights. Empower your business to make smarter, faster decisions and unlock the full potential of your data with Super Sense. To learn more visit https://super-sense.ai #AI #ArtificialIntelligence #DataRetrieval #LLMs #Datasets #Databases #SQL #PostgreSQL #MongoDB #Strawberry #Datascience #DataAnalytics
-
If we have two large datasets in Spark, we can use a Sort Merge Join.

Sort Merge Join: A Sort Merge Join (SMJ) is a common join strategy in Apache Spark for large datasets where both sides of the join can be sorted. It works by first sorting the datasets on the join keys and then merging the sorted datasets to find matching keys. Unlike a broadcast join, it never needs to fit an entire side in memory, and unlike a shuffled hash join, it avoids building a large hash table, which makes it Spark's default strategy for joining two large datasets.

How it works:
1. Shuffle phase (if needed): the datasets are shuffled across nodes and partitioned by the join keys, so each partition holds a specific range of keys.
2. Sort phase: within each partition, the datasets are sorted by the join key.
3. Merge phase: the sorted datasets are scanned simultaneously in a merge-like pass; matching rows are combined and returned.

Use cases:
- Joining large datasets where both sides are too large to broadcast.
- Joins on keys that are not already sorted or that need shuffling for proper partitioning.

Supported join types: inner, outer (left, right, full), semi, and anti joins.

Performance considerations:
Pros: efficient for large datasets; scales well when data is evenly distributed; handles data skew better than some other join types.
Cons: requires sorting, which can be computationally expensive; more disk- and memory-intensive if the datasets are very large and do not fit in memory.

Example in PySpark:

# Creating DataFrames (assumes an existing SparkSession named spark)
df1 = spark.createDataFrame([(1, 'Alice'), (2, 'Bob'), (3, 'Cathy')], ["id", "name"])
df2 = spark.createDataFrame([(1, 'Engineering'), (2, 'HR'), (4, 'Finance')], ["id", "department"])

# Performing a join (Spark will choose Sort Merge Join if applicable)
result = df1.join(df2, on="id", how="inner")
result.show()

When is Sort Merge Join used in Spark? Spark automatically chooses SMJ when the datasets are too large to broadcast. It is used by default when the join keys are sortable and the data size is beyond the broadcast threshold (spark.sql.autoBroadcastJoinThreshold). You can fine-tune this behavior with configuration if needed, as sketched below.

#Dataengineering #technology #bigdata #clouddataengineering #azurecloud #spark #apachespark #databricks
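A quick way to confirm which strategy Spark picked, reusing df1 and df2 from the example above:

# Setting the broadcast threshold to -1 disables broadcast joins entirely
# (illustrative only; you would normally keep the default), which pushes
# Spark toward Sort Merge Join for sortable keys.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
result = df1.join(df2, on="id", how="inner")
result.explain()  # the physical plan should contain a SortMergeJoin node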
-
In Databricks SQL and Unity Catalog, you can create and manage AI functions with AI_QUERY and SQL functions, making it easy to call models and standard prompt templates straight from SQL. But did you also know you can create embeddings and query them directly on your tables in SQL with VECTOR_SEARCH? Check out this blog by Reilly Wild-Williams that walks through how to use all these tools to create AI-embedded SQL dashboards and pipelines! https://lnkd.in/gx4GqnXr
Databricks SQL AI: Query a Vector Search Endpoint with only SQL
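The core pattern the blog describes, sketched in Python from a Databricks notebook. The serving endpoint, catalog/schema/table, and index names are placeholders, and the vector_search argument names have varied across Databricks releases (query vs. query_text), so check the current docs:

# ai_query: call a model serving endpoint row by row from SQL
answer = spark.sql("""
    SELECT ai_query(
        'my-serving-endpoint',            -- placeholder endpoint name
        'Summarize: ' || review_text
    ) AS summary
    FROM catalog.schema.reviews           -- placeholder table
    LIMIT 5
""")
answer.show(truncate=False)

# vector_search: query a Vector Search index directly from SQL
hits = spark.sql("""
    SELECT * FROM vector_search(
        index => 'catalog.schema.reviews_index',  -- placeholder index
        query => 'battery life complaints',
        num_results => 5
    )
""")
hits.show()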
-
Resolving OutOfMemory (OOM) errors in PySpark: best practices

(The snippets below assume an active SparkSession named spark, plus: from pyspark import StorageLevel; from pyspark.sql import functions as F; from pyspark.sql.functions import broadcast.)

1️⃣ Adjust Spark configuration (memory management). Note that executor/driver memory and cores are fixed at startup, so set them when the session is created (via SparkSession.builder.config or spark-submit), not with spark.conf.set at runtime; see the sketch after this list.
Executor memory: spark.executor.memory = 8g
Driver memory: spark.driver.memory = 4g
Executor cores: spark.executor.cores = 2
Use disk persistence: df.persist(StorageLevel.DISK_ONLY)

2️⃣ Enable dynamic allocation (also a startup setting) so Spark can adjust the number of executors:
spark.dynamicAllocation.enabled = true
spark.dynamicAllocation.minExecutors = 1

3️⃣ Enable Adaptive Query Execution (AQE) to optimize query plans at runtime:
spark.conf.set("spark.sql.adaptive.enabled", "true")

4️⃣ Enforce a schema for semi-structured data to avoid schema-inference overhead:
df = spark.read.schema(schema).json("path/to/data")

5️⃣ Tune the number of partitions:
df = df.repartition(200, "column_name")

6️⃣ Handle data skew with salting on skewed join keys (bucketed so the other side of the join can replicate the same salt range):
df1 = df1.withColumn("join_key_salted", F.concat(F.col("join_key"), F.lit("_"), F.floor(F.rand() * 10).cast("string")))

7️⃣ Limit caching of large DataFrames; cache selectively, or let Spark spill to disk:
df.persist(StorageLevel.MEMORY_AND_DISK)

8️⃣ Optimize joins for large DataFrames; broadcast the smaller table:
df_join = large_df.join(broadcast(small_df), "join_key", "left")

9️⃣ Monitor Spark jobs in the Spark UI to track memory usage and job execution.

🔟 Revisit the partitioning strategy when writing data:
df.write.partitionBy("partition_column").parquet("path_to_data")

Seekho Bigdata Institute Karthik K.
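Since the memory settings above cannot be changed on a live session, here is a minimal sketch of applying them at build time (the app name and values are illustrative):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("oom-tuning-demo")  # hypothetical app name
    .config("spark.executor.memory", "8g")
    .config("spark.driver.memory", "4g")
    .config("spark.executor.cores", "2")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)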