AI/ML Terms You May Not Know, Battle of the LLMs, Unstructured Data Meetups, and More!

Zilliz

Vector database trailblazer and creator of Milvus, the world's most widely adopted open-source vector database.

Published Jul 25, 2024


In this issue:

AI/ML Terms You May Not Know
Which LLM Knows Vector Databases Best?
Voyage AI Embeddings and Rerankers for Search and RAG
ICYMI: Unstructured Data Talk Recaps from Around the World
Upcoming Events

AI/ML Terms You May Not Know

🔥Topic Modeling & BERTopic

Why is it relevant?

New data of every kind is produced constantly, so tools for navigating it matter more than ever. BERTopic uses neural network-based techniques to uncover themes and patterns in large collections of text.

What is Topic Modeling? AI VS Bread 🤖🍞

Topic modeling: a method for unearthing the latent themes or "topics" within a collection of documents. It examines the text of these documents to detect patterns and relationships that indicate the presence of these topics. 🔎

🍞Example: a document focused on artificial intelligence will likely contain terms like "large language models" and "ChatGPT," unlike a document centered on baking bread.

ChatGPT image prompt: Make a comparison photo between a document focused on artificial intelligence likely contain terms like "large language models" and "ChatGPT," and a document centered on baking bread.

What is BERTopic?

BERTopic: a topic modeling technique that uses embedding models and class-based TF-IDF (c-TF-IDF) to create dense clusters, producing easily interpretable topics while keeping important words in the topic descriptions. 💡

At a high level, it approaches topic modeling in four steps:

▶️ Document embedding: Convert documents into embeddings using Bidirectional Encoder Representations from Transformers (BERT).

▶️ Dimensionality Reduction: Compress the embeddings into a lower-dimensional space.

▶️ Clustering: Group these embeddings to gather similar documents in one category.

▶️ Topic Extraction: Extract topic names using a class-based variation of TF-IDF.

An overview of the BERTopic library
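To make the topic extraction step concrete, here is a minimal Python sketch of class-based TF-IDF on a toy AI-vs-bread example. It assumes the documents are already clustered (embedding, dimensionality reduction, and clustering are skipped), splits text on whitespace, and scores each term as its count in the cluster times log(1 + A / its count across all clusters), where A is the average number of words per cluster, following the formula in the BERTopic paper. The function names and sample documents are made up for illustration; this is not the BERTopic library's own code.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Score terms per cluster. `clusters` maps a label to a list of documents."""
    # Treat every cluster as one long document and count its terms.
    tf_per_cluster = {
        label: Counter(word for doc in docs for word in doc.lower().split())
        for label, docs in clusters.items()
    }
    # Count how often each term appears across all clusters.
    tf_total = Counter()
    for counts in tf_per_cluster.values():
        tf_total.update(counts)
    # A: average number of words per cluster.
    avg_words = sum(tf_total.values()) / len(tf_per_cluster)
    # Terms frequent in one cluster but rare overall get the highest scores.
    return {
        label: {
            term: freq * math.log(1 + avg_words / tf_total[term])
            for term, freq in counts.items()
        }
        for label, counts in tf_per_cluster.items()
    }


def top_terms(scores, n=2):
    """Return the n best-scoring terms per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, s in scores.items()
    }


# Hypothetical, already-clustered documents.
clusters = {
    "ai": ["large language models answer questions",
           "language models like chatgpt write text"],
    "bread": ["knead bread dough", "bake bread until golden"],
}
scores = c_tf_idf(clusters)
print(top_terms(scores, n=2))
```

The real library also normalizes term frequencies and removes stop words before scoring, so treat this as an illustration of the idea rather than a drop-in replacement.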

Read the full guide

Which LLM Knows Vector Databases Best?

We asked LLMs to explain what a vector database is to someone who isn’t familiar with it. Which LLM do you think gave the best response? Let us know in the comments below and see who we picked! ⬇️

Here are the results from various LLMs using the same prompt:

Anthropic Claude (Claude 3.5 Sonnet):

ChatGPT (GPT-4o):

Google Gemini:

Vector databases are super smart librarians! 🤓

Voyage AI Embeddings and Rerankers for Search and RAG

Voyage AI provides embedding models customized for many domains, built for use in RAG. These models pair with vector databases like Milvus, which store the vector embeddings and retrieve the ones most similar to the embedding of the query.
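The store-and-retrieve idea can be sketched in a few lines of Python. This is a toy in-memory example, assuming tiny hand-written three-dimensional vectors in place of real model embeddings and a brute-force cosine similarity scan in place of a vector index. The `store` list and the `search` function are hypothetical names for illustration; this is not the Milvus or Voyage AI API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(store, query_vector, top_k=1):
    """Return the texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(item["vector"], query_vector),
        reverse=True,
    )
    return [item["text"] for item in ranked[:top_k]]


# Hypothetical documents with made-up embeddings.
store = [
    {"text": "How to fine-tune a language model", "vector": [0.9, 0.1, 0.0]},
    {"text": "Sourdough starter basics", "vector": [0.0, 0.2, 0.9]},
    {"text": "Contract law overview", "vector": [0.1, 0.9, 0.1]},
]

# In a real pipeline an embedding model would produce this from the query text.
query_vector = [0.8, 0.2, 0.1]
results = search(store, query_vector, top_k=1)
print(results)
```

A vector database replaces the full scan with an index, so retrieval stays fast as the collection grows to millions of vectors.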

Zilliz partnered with Voyage AI to streamline the conversion of unstructured data into searchable vector embeddings on Zilliz Cloud. The Voyage AI embedding models integrated into Zilliz Cloud Pipelines are voyage-code-2, voyage-law-2, and voyage-large-2-instruct. These models target the code, law, finance, and multilingual domains.

Crafting RAG with Zilliz Cloud Pipelines and Voyage AI

See the step-by-step guide to building a RAG application with this integration and Cohere (the LLM): Read Blog

Watch the meetup talk with Voyage AI CEO Tengyu Ma on “cutting-edge embeddings and rerankers for search and RAG”

ICYMI: Unstructured Data Talk Recaps from Around the World

We had a packed schedule in July 🌞 with fun AI events all over the world. See the recaps and register for next month’s events! ⬇️⬇️⬇️

Recap: July 16 SF @ GitHub

▶️ Garbage In, Garbage Out: Why poor data curation is killing your AI models (and how to fix it)

▶️ It's your unstructured data: How to get your GenAI app to production (and speed up your delivery)

Upcoming Events

August 1: Unstructured Data Processing from Cloud to Edge (virtual)

Learn why you should add a Cloud Native Vector Database to your Data and AI platform. Tim Spann will cover a quick introduction to Milvus, vector databases, and unstructured data processing. By adding Milvus to your architecture, you can scale out and improve your AI use cases through RAG, real-time search, multimodal search, recommendation engines, fraud detection, and many more emerging use cases.

Save Your Spot

August 2-4: AI-focused Touring Hackathon & Tech Fair (in-person)

🌟 TechStars Startup Weekend Personal AI Hackathon: Fri-Sat with the pitches 🗣️ and a Tech Fair 🎪 on Sunday showcasing local AI startups.

Christy Bergman will be there on Friday to answer your Milvus questions and kickstart the hackathon!

August 5: San Francisco Unstructured Data Meetup (in-person)

Join us in San Francisco for a meetup on August 5! There will be food, refreshments, networking, and cool AI talks.

▶️ Using Ray Data for Multimodal Embedding Inference with Christy Bergman, Developer Advocate at Zilliz

▶️ Building the Future of Neural Search: How to Train State-of-the-Art Embeddings with Aamir Shakir, Co-founder, Mixedbread

▶️ A Different Angle: Retrieval Optimized Embedding Models with Marqo

Save your spot: https://lu.ma/3q2brqp8

August 8: Building an Agentic RAG locally with Milvus, Ollama, and Llama Agents (virtual)

With the recent release of Llama Agents, we can now build agents that are async-first and run as their own service. During this webinar, Stephen Batifol will show you how to build an agentic RAG system using Llama Agents and Milvus.

Save Your Spot

August 13: South Bay Unstructured Data Meetup (in-person)

Stay tuned for the speaker lineup, coming soon!

Save your spot: https://lu.ma/tutkha5k

In the next newsletter, we’ll cover Softmax and Locality Sensitive Hashing (LSH). Start reading now:

▶️ Decoding Softmax: Understanding Its Functions and Impact in AI

▶️ Local Sensitivity Hashing (L.S.H.): A Comprehensive Guide

Thanks for tuning in and see you next month! 👋

Zilliz Universe

2,706 followers


