Hybrid Search: The Next Frontier Beyond Vector Search!
We're diving deeper into full-text and vector search, exploring how the two can be used together to build and deploy generative, real-time AI applications.
In this highly competitive AI era, automation and data are king. The ability to automate the search and retrieval of information from vast repositories has become crucial. As technology advances, so do the methods of information retrieval, leading to a variety of search mechanisms.
With generative AI models taking center stage, applications need solid search and retrieval techniques. Full-text search is the long-established, trusted option, while vector search is emerging as the more advanced technique.
Today, we will explore both full-text and vector search, and see how these can be used in today's digital landscape.
What is full-text search?
Full-text search is a powerful technique for finding specific information within large amounts of text data. Unlike simple substring matching, which only finds exact character sequences, full-text search indexes the individual words of every document and ranks documents by how well they match the terms in your query. This lets it return relevant results even when a document doesn't contain your query word for word.
Here's how it works
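A minimal sketch of the idea, assuming the common inverted-index design used by many full-text engines. This is illustrative Python, not SingleStore's implementation, and the names tokenize, build_index and search are hypothetical. Documents are split into lowercase terms, each term maps to the documents that contain it, and a query is answered by looking up its terms and ranking documents by how many of them they match.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by the number of distinct query terms they contain."""
    hits = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Most matching terms first; ties broken by document id.
    return sorted(hits, key=lambda d: (-hits[d], d))

docs = {
    1: "The ethical and practical considerations of genetic modification.",
    2: "How artificial intelligence is revolutionizing farming practices.",
    3: "Debating the moral implications of surveillance technologies.",
}
index = build_index(docs)
print(search(index, "ethical genetic"))  # [1]
print(search(index, "moral farming"))    # [2, 3]
```

Real engines add steps such as stop-word removal and relevance weighting on top of this lookup, but the term-to-documents mapping is the core that makes the search fast.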
What is vector search?
Vector search underpins most generative AI applications. It retrieves contextually relevant information by comparing the meaning of a query with the meaning of stored content, rather than matching exact words. The approach is in high demand and widely endorsed by generative AI practitioners and organizations. Vector databases use it to retrieve semantically relevant information for users' queries.
For example, users don't need to know the exact words when retrieving information: even with similar words, vector search can return close matches. This is especially useful wherever search needs a human touch, such as in an eCommerce application.
By aligning more closely with the way humans think and communicate, it opens up new possibilities for more natural and efficient interactions between users and AI systems. As this technology continues to evolve, its impact is expected to grow, further cementing its role as a cornerstone of modern information retrieval strategies in the generative AI industry.
Vector search boasts impressive feats:
But like anything else, vector search has its quirks. Training the embedding models and computing the vectors can be computationally expensive. And while it excels at understanding meaning, sometimes a precise keyword search is all you need.
How vector search works
Here's a simplified explanation of how vector search works:
Note: SingleStore provides direct support for Dot Product and Euclidean Distance using the vector functions DOT_PRODUCT and EUCLIDEAN_DISTANCE, respectively. Cosine Similarity is supported by combining the DOT_PRODUCT and SQRT functions.
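To make the three measures concrete, here is a small Python sketch of the underlying arithmetic. It is not SingleStore code: the function names are chosen for illustration, and the point is only to show how cosine similarity can be built from a dot product and a square root, as the note describes.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|), where |v| = sqrt(dot(v, v))."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 3.0]  # perpendicular to a

print(cosine_similarity(a, b))   # 1.0
print(cosine_similarity(a, c))   # 0.0
print(euclidean_distance(a, b))  # 1.0
```

Cosine similarity ignores vector length and compares direction only. When all vectors are normalized to length 1, the dot product and cosine similarity give the same value.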
Full-text search vs. vector search: Who wins?
While full-text search excels at precision and speed, and vector search unlocks semantic understanding, a hybrid approach emerges as the true champion. Imagine a search that understands your precise keywords like "red shoes" but also finds those comfy crimson sneakers you didn't mention. This combination delivers highly relevant results — even when you don't use perfect phrasing.
Think of it as the best of both worlds: accuracy meets serendipity, ensuring you never miss out on hidden gems just because they weren't spelled out exactly. In essence, hybrid search transcends limitations — pushing the boundaries of information retrieval to deliver an experience that's both precise and pleasantly surprising.
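One simple way to combine the two signals is a weighted blend of a keyword score and a semantic score. The Python sketch below assumes both scores are already normalized to the range 0 to 1. The scores, the product names and the hybrid_score helper are made up for illustration, and real systems may combine results differently.

```python
def hybrid_score(keyword_score, vector_score, keyword_weight=0.5):
    """Weighted blend of a keyword score and a semantic score, both in [0, 1]."""
    return keyword_weight * keyword_score + (1 - keyword_weight) * vector_score

# Hypothetical, pre-normalized scores for the query "red shoes".
candidates = {
    "red shoes":         {"keyword": 1.0, "vector": 0.9},
    "crimson sneakers":  {"keyword": 0.0, "vector": 0.8},
    "red kitchen towel": {"keyword": 0.5, "vector": 0.1},
}

ranked = sorted(
    candidates,
    key=lambda name: hybrid_score(
        candidates[name]["keyword"], candidates[name]["vector"]
    ),
    reverse=True,
)
print(ranked)  # ['red shoes', 'crimson sneakers', 'red kitchen towel']
```

With keyword scores alone, the towel would outrank the sneakers because it contains the word "red". The semantic score lifts the sneakers above it, while the exact match still comes first.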
SingleStore supports hybrid search
In the realm of information retrieval, a new force has emerged: hybrid search. SingleStore is leading the way, empowering developers to craft rich AI and analytical applications that harness the combined strengths of vector search and full-text search.
What does that mean for you when building AI applications? You’re no longer forced to choose between robotic precision and nuanced understanding. SingleStore bridges this divide, enabling you to unlock the full potential of search and deliver truly meaningful experiences.
SingleStore revs up information retrieval with indexed vector search. This game-changing feature blends fast vector search, precise full-text search and modern indexing techniques, all powered by Approximate Nearest Neighbor (ANN) search. Get ready to experience 100-1,000x faster vector search when navigating vast amounts of data.
Full-text search with SingleStore
Activate your free SingleStore trial to see how full-text search works — follow along with these steps. Once you sign up, create a workspace.
Let’s get started with SQL Editor.
Start running the following SQL queries in your SQL Editor.
First, create a database and table that includes a FULLTEXT index on the columns you want to search.
CREATE DATABASE fulltext_search;
USE fulltext_search;
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title, body)
);
Next, insert some example data into the table you've created.
INSERT INTO articles (title, body) VALUES
('The Power of Big Data', 'Harnessing big data for insights, innovation, and decision making.'),
('Robotics in Everyday Life', 'The increasing presence and impact of robots in daily activities.'),
('Genetic Engineering: Pros and Cons', 'The ethical and practical considerations of genetic modification.'),
('Nanotechnology: A Small Revolution', 'The potential and challenges of advancements in nanotech.'),
('The Art of Podcasting', 'Exploring the surge in popularity of podcasting as a medium.'),
('The Impact of 5G Technology', 'Understanding how 5G will transform connectivity and communication.'),
('Mental Health in the Digital Age', 'Addressing mental health challenges in an increasingly digital world.'),
('The Future of Online Education', 'How online learning platforms are reshaping education.'),
('E-Sports: More Than Just Games', 'The rise of e-sports as a major form of entertainment.'),
('Electric Planes: Taking Off Soon?', 'Examining the feasibility and challenges of electric aircraft.'),
('The Science of Sleep', 'Understanding the importance and mechanics of sleep for health.'),
('AI in Agriculture', 'How artificial intelligence is revolutionizing farming practices.'),
('The Ethics of Surveillance Tech', 'Debating the moral implications of surveillance technologies.');
If you have just inserted data and want to ensure the full-text index is up-to-date before querying, you can execute the OPTIMIZE TABLE command with the FLUSH option.
OPTIMIZE TABLE articles FLUSH;
After inserting the content, you can perform a full-text search using the MATCH AGAINST syntax to retrieve relevant articles based on a search term.
SELECT id, title, body
FROM articles
WHERE MATCH(title, body) AGAINST('search term');
For example, using 'ethical' as the search term and searching for the relevant documents gives the following result.
Vector search with SingleStore
We will use our SQL Editor, creating a new database and table with a vector field.
CREATE DATABASE VectorSearchTutorial;
We will switch to the newly created database.
USE VectorSearchTutorial;
Assume you're working with text data where each text entry has been converted to a vector using some text embedding process.
CREATE TABLE vector_data (
id INT PRIMARY KEY AUTO_INCREMENT,
text VARCHAR(255),
vector BLOB
);
Insert some text data along with its corresponding vector representation into the table. You would typically generate these vectors using an external tool or library that produces vector embeddings from text data.
INSERT INTO vector_data (text, vector)
VALUES
('Sample text 1', JSON_ARRAY_PACK('[0.1, 0.2, 0.3, 0.4]')),
('Sample text 2', JSON_ARRAY_PACK('[0.5, 0.6, 0.7, 0.8]')),
('Sample text 3', JSON_ARRAY_PACK('[0.9, 0.1, 0.8, 0.2]'));
Create a query vector representing the text you want to search for. Then use a vector similarity function like DOT_PRODUCT to compute the similarity between the query vector and the vectors in your table.
SET @query_vector = JSON_ARRAY_PACK('[0.15, 0.26, 0.36, 0.46]');
SELECT id, text,
DOT_PRODUCT(vector, @query_vector) AS similarity
FROM vector_data
ORDER BY similarity DESC
LIMIT 3;
The query result will be as follows:
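The ranking can be checked by hand. The Python sketch below recomputes the dot products for the three sample rows and the query vector used above. It is plain arithmetic rather than SingleStore code, and because it assumes the database stores the packed values as 32-bit floats, the database output may differ in the last digits.

```python
def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Same sample rows and query vector as in the SQL statements.
rows = {
    "Sample text 1": [0.1, 0.2, 0.3, 0.4],
    "Sample text 2": [0.5, 0.6, 0.7, 0.8],
    "Sample text 3": [0.9, 0.1, 0.8, 0.2],
}
query = [0.15, 0.26, 0.36, 0.46]

scores = {text: dot_product(vec, query) for text, vec in rows.items()}

# Highest similarity first, mirroring ORDER BY similarity DESC.
for text in sorted(scores, key=scores.get, reverse=True):
    print(text, round(scores[text], 3))
# Sample text 2 0.851
# Sample text 3 0.541
# Sample text 1 0.359
```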
To calculate the Euclidean distance between vectors in SingleStore, you can use the EUCLIDEAN_DISTANCE function, which is designed for this purpose.
SET @query_vector = JSON_ARRAY_PACK('[0.15, 0.26, 0.36, 0.46]');
SELECT id, text,
EUCLIDEAN_DISTANCE(vector, @query_vector) AS euclidean_distance
FROM vector_data
ORDER BY euclidean_distance ASC
LIMIT 3;
The query result will be as follows:
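The same hand check works for Euclidean distance. This Python sketch recomputes the distances for the sample rows. It is again plain arithmetic under the assumption of 32-bit float storage, so expect small differences in the last digits of the database output.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Same sample rows and query vector as in the SQL statements.
rows = {
    "Sample text 1": [0.1, 0.2, 0.3, 0.4],
    "Sample text 2": [0.5, 0.6, 0.7, 0.8],
    "Sample text 3": [0.9, 0.1, 0.8, 0.2],
}
query = [0.15, 0.26, 0.36, 0.46]

distances = {text: euclidean_distance(vec, query) for text, vec in rows.items()}

# Smallest distance first, mirroring ORDER BY euclidean_distance ASC.
for text in sorted(distances, key=distances.get):
    print(text, round(distances[text], 3))
# Sample text 1 0.115
# Sample text 2 0.685
# Sample text 3 0.922
```

Note that "Sample text 1" is the closest row by Euclidean distance, while the dot product ranks "Sample text 2" first. The sample vectors are not normalized, so the dot product rewards larger magnitudes. The two measures only agree on ranking when all vectors have the same length.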
You can store vector data in SingleStore easily, run a query to compute similarity scores, and see the retrieved rows that match the query along with their respective scores.
A complete hands-on tutorial of using SingleStore as a vector database and retrieving similar data using cosine similarity can be found in our recent article, “A Deep Dive into Vector Databases.”
SingleStore's latest features for vector search
We are thrilled to announce the arrival of SingleStore Pro Max. One of the highlights of the release is a set of vector search enhancements.
Two important new features have been added to improve vector data processing and the performance of vector search.
Indexed ANN vector search facilitates the creation of large-scale semantic search and generative AI applications. Supported index types include inverted file (IVF), hierarchical navigable small world (HNSW) and variants of both based on product quantization (PQ), a vector compression method.
The VECTOR type makes it easier to create, test and debug vector-based applications. New infix operators are available for DOT_PRODUCT (<*>) and EUCLIDEAN_DISTANCE (<->) to help shorten queries and make them more readable.
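To give a feel for why an ANN index speeds things up, here is a toy Python sketch of the inverted file (IVF) idea. It is a simplified illustration, not SingleStore's implementation: the centroids are hard-coded rather than learned, and build_ivf, ivf_search and the nprobe parameter are names chosen for this example. Vectors are grouped under their nearest centroid, and a query scans only the groups whose centroids are closest to it instead of every stored vector.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(
            range(len(centroids)),
            key=lambda i: euclidean_distance(vec, centroids[i]),
        )
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k=1, nprobe=1):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(
        range(len(centroids)),
        key=lambda i: euclidean_distance(query, centroids[i]),
    )[:nprobe]
    candidates = [vec_id for i in probe for vec_id in lists[i]]
    return sorted(
        candidates, key=lambda v: euclidean_distance(query, vectors[v])
    )[:k]

vectors = {"a": [0.1, 0.1], "b": [0.2, 0.0], "c": [0.9, 1.0], "d": [1.0, 0.8]}
centroids = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical; usually learned, e.g. with k-means

lists = build_ivf(vectors, centroids)
print(lists)  # {0: ['a', 'b'], 1: ['c', 'd']}
print(ivf_search([0.9, 0.95], vectors, centroids, lists, k=2))  # ['c', 'd']
```

The search is approximate because a true nearest neighbor that sits in an unprobed list is missed. Raising nprobe scans more lists, trading speed for recall.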