Build Your Own Multimodal Image Search Demo, Choose the Right AI Model for your GenAI Application, and More!
In this issue:
🔎 Multimodal RAG with Milvus
Create your own multimodal image search demo powered by…
🐦 Milvus for efficient retrieval
👁️ Visualized BGE model for precise image processing and matching
🔁 GPT-4o for advanced ranking
See what the final result looks like in the interactive demo here.
Build your own with the tutorial here.
🫵 AI Models for Your GenAI Apps
❌ MYTH: It doesn't matter what embedding model you use.
✅ FACT: To get accurate search results, create your embeddings with a model that was trained on data similar to yours. Pay attention to whether it's designed for image, search, or another type of unstructured data.
Some examples:
▶️ Jina AI / jina-embeddings-v2-base-en
Specialized embedding model for English text and long documents; supports sequences of up to 8192 tokens.
▶️ Voyage AI / voyage-large-2
Voyage AI's general-purpose text embedding model optimized for retrieval quality (e.g., better than OpenAI V3 Large). It is also ideal for tasks like summarization, clustering, and classification.
▶️ Cohere / embed-multilingual-v3.0
Tailored for multilingual text, this model is a member of Cohere's newly released Embed V3 model family. It supports 100+ languages and can be used to search within a language (e.g., search with a French query on French documents) and across languages (e.g., search with a Chinese query on Finnish documents).
Milvus 🤝 Cohere
Create a question-answering system based on the SQuAD dataset using Milvus as the vector database and Cohere as the embedding system.
Steps (follow the code here):
1. Prepare the dataset
In this example, we use the Stanford Question Answering Dataset (SQuAD) as our source of truth for answering questions. This dataset comes as a JSON file, and we use pandas to load it.
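SQuAD's JSON is nested (articles, then paragraphs, then question/answer pairs), so it helps to flatten it into rows before loading. Here is a minimal sketch using only the standard library; the tutorial itself uses pandas, and the `flatten_squad` helper and the tiny sample record are illustrative assumptions, not part of the tutorial code.

```python
import json

def flatten_squad(squad: dict) -> list[dict]:
    """Flatten nested SQuAD JSON into question/answer rows."""
    rows = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                if not answers:
                    continue  # skip questions that have no answer text
                rows.append({"question": qa["question"],
                             "answer": answers[0]["text"]})
    return rows

# Hypothetical one-record sample in the SQuAD layout (not real dataset content).
sample = json.loads("""
{"data": [{"title": "Wikipedia", "paragraphs": [{
  "context": "Wikipedia was launched by Jimmy Wales and Larry Sanger.",
  "qas": [
    {"id": "q1", "question": "Who founded Wikipedia?",
     "answers": [{"text": "Jimmy Wales and Larry Sanger", "answer_start": 26}]},
    {"id": "q2", "question": "A question without an answer", "answers": []}
  ]}]}]}
""")

rows = flatten_squad(sample)
print(rows)
```

If you prefer pandas, the resulting list can be passed straight to `pandas.DataFrame(rows)`.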
2. Create a collection
Within Milvus, we need to set up a collection and index it.
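As a rough sketch of what this step can look like with the `pymilvus` `MilvusClient` API: the collection name, field names, field lengths, and index settings below are assumptions for illustration, and the vector dimension assumes Cohere's embed-multilingual-v3.0 (1024 dimensions). The linked tutorial may use different values.

```python
COLLECTION_NAME = "squad_qa"  # hypothetical collection name
DIMENSION = 1024              # assumed: output size of embed-multilingual-v3.0

def create_qa_collection(uri: str = "./milvus_demo.db"):
    """Create a collection with question, answer, and embedding fields."""
    # Imported here so the module loads even where pymilvus is not installed.
    from pymilvus import MilvusClient, DataType

    # A local file path uses Milvus Lite; a server URI also works.
    client = MilvusClient(uri=uri)

    schema = client.create_schema(auto_id=True, enable_dynamic_field=False)
    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
    schema.add_field(field_name="question", datatype=DataType.VARCHAR, max_length=2048)
    schema.add_field(field_name="answer", datatype=DataType.VARCHAR, max_length=2048)
    schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=DIMENSION)

    # Index the vector field so similarity search is fast.
    index_params = client.prepare_index_params()
    index_params.add_index(field_name="embedding",
                           index_type="AUTOINDEX",
                           metric_type="COSINE")

    client.create_collection(collection_name=COLLECTION_NAME,
                             schema=schema,
                             index_params=index_params)
    return client
```

The dimension must match the embedding model you choose, so change `DIMENSION` if you swap models.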
3. Insert data
Once we have the collection set up, we need to start inserting our data. This is done in three steps:
1️⃣ reading the data,
2️⃣ embedding the original questions, and
3️⃣ inserting the data into the collection we’ve just created on Milvus.
In the example, the data includes the original question, the original question’s embedding, and the answer to the original question.
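The three steps above can be sketched as a small batching loop. To keep the sketch runnable without an API key or a Milvus instance, the embedding and insert calls are passed in as functions; `insert_qa_pairs`, the field names, and the batch size of 96 (assumed here as Cohere's per-request text limit) are illustrative choices, not the tutorial's code.

```python
from typing import Callable

def insert_qa_pairs(rows: list[dict],
                    embed: Callable[[list[str]], list[list[float]]],
                    insert: Callable[[list[dict]], None],
                    batch_size: int = 96) -> int:
    """Embed each question and insert question, embedding, and answer."""
    total = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]            # 1. read the data
        vectors = embed([r["question"] for r in batch])   # 2. embed the questions
        records = [{"question": r["question"],
                    "answer": r["answer"],
                    "embedding": v}
                   for r, v in zip(batch, vectors)]
        insert(records)                                   # 3. insert into Milvus
        total += len(records)
    return total

# Possible wiring with the real services (assumed, not run here):
#   co = cohere.Client("YOUR_API_KEY")
#   embed = lambda texts: co.embed(texts=texts, model="embed-multilingual-v3.0",
#                                  input_type="search_document").embeddings
#   insert = lambda recs: client.insert(collection_name="squad_qa", data=recs)

# Stand-ins so the sketch runs offline.
store = []
fake_embed = lambda texts: [[float(len(t)), 0.0] for t in texts]
demo_rows = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(1, 4)]
inserted = insert_qa_pairs(demo_rows, fake_embed, store.append, batch_size=2)
print(inserted, [len(b) for b in store])
```

Batching keeps each embedding request small and avoids sending the whole dataset to Milvus in a single call.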
4. Answer Questions
Once all the data is inserted into the Milvus collection, we can ask the system questions by taking our question phrase, embedding it with Cohere, and searching the collection.
Example - performing a similarity search using Cohere embeddings
In this article, we embed the query ‘Who founded Wikipedia’ and use it to search a Milvus collection.
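A query like this can be sketched as follows. The helper names are hypothetical, and the embedding and search calls are injected so the example runs offline; the result shape (a list of hits per query, each with `distance` and `entity`) follows what `MilvusClient.search` returns, while the hit shown is a made-up stand-in rather than real output.

```python
def top_answers(search_results: list) -> list[tuple]:
    """Pull (answer, score) pairs out of the hits for the first query."""
    hits = search_results[0]  # one list of hits per query vector
    return [(hit["entity"]["answer"], hit["distance"]) for hit in hits]

def ask(question: str, embed_query, search) -> list[tuple]:
    """Embed the question, search the collection, return the top answers."""
    vector = embed_query(question)
    return top_answers(search([vector]))

# Possible wiring with the real services (assumed, not run here):
#   embed_query = lambda q: co.embed(texts=[q], model="embed-multilingual-v3.0",
#                                    input_type="search_query").embeddings[0]
#   search = lambda vecs: client.search(collection_name="squad_qa", data=vecs,
#                                       limit=3, output_fields=["question", "answer"])

# Stand-ins so the sketch runs offline; the hit below is illustrative only.
fake_embed_query = lambda q: [0.1, 0.2]
fake_search = lambda vecs: [[{"id": 1, "distance": 0.92,
                              "entity": {"question": "Who founded Wikipedia?",
                                         "answer": "Jimmy Wales and Larry Sanger"}}]]

answers = ask("Who founded Wikipedia", fake_embed_query, fake_search)
print(answers)
```

Note the different `input_type` for queries versus documents in the wiring comments; Cohere's V3 embedding models expect that distinction.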
👥 Upcoming Events
Sept 9: The AI Alliance + The Unstructured Data Meetup (in-person)
Join us and The AI Alliance for an SF meetup on Sept 9 at GitHub! Spots are limited, so save yours below.
▶️ Industrial Problem-Solving through Domain-Specific Models and Agentic AI: A Semiconductor Manufacturing Case Study with Christopher Cuong T. Nguyen & Shruti Raghavan from AITOMATIC
▶️ Introduction to Llama 3.1 with Amit Sangani from Meta
▶️ AI Alliance Working group for Materials and Chemistry (WG4M) by Jed Pitera from IBM
Sept 10-11: CIVO Navigate Europe (in-person)
See Zilliz Developer Advocate Stephen Batifol at the following sessions:
📈Sept 10th: Scaling Generative AI Solutions with Open-Source and K8s. We'll have a look at how Milvus makes it possible to do Vector Search at Billions+ scale.
👥 Sept 11th: Panel discussion about the Berlin tech community with Sophia McKee, Nele Uhlemann, Benazir Khan, and Kadir Keles
Reach out to him for a free ticket while supplies last!
Sept 12: Voxel51 AI, Machine Learning and Computer Vision Meetup (virtual)
Join at 10:00 AM PT virtually and listen to talks by The Julia Language, Voxel51, and Zilliz.
“It’s in the Air Tonight. Sensor Data in RAG” with Tim Spann, Developer Advocate, Zilliz
Sept 12: AI & Tech Talks Zoom HQ (in-person)
Join Zilliz and other companies at Zoom HQ in San Jose for the second event in their developer meetup series. They’re bringing together developers, product managers, and AI enthusiasts to hear from industry leaders and dive into some of the most exciting developments in AI and technology. Frank Liu, Head of AI/ML at Zilliz, will be speaking!
Sept 12: AI breaks Privacy: How PrivateGPT Fixes It (virtual)
AI tools store your prompts and documents, so compliance becomes a risk. Daniel Gallego Vico, Co-Founder of PrivateGPT, will talk about the: