MongoDB Indexing - How to Index Arrays, Nested Data and GeoLocations?
Hey everyone, back with another MongoDB blog, this time for people who want to learn what MongoDB indexes are and how you can build various types of indexes in MongoDB.
You might already know about database indexes such as clustered and non-clustered indexes. What you might not know is that there are more types of indexes, most of which MongoDB supports out of the box..!
If you haven't subscribed to this newsletter yet, click on the title name and you'll find the Subscribe button ❤️
Quick Background on Indexing
Feel free to skip this if you already know about Indexing.
Database indexes are special data structures that store the location of your data, so that read queries can find matching records through a fast lookup (ideally served from memory) instead of going to disk and checking every record. Indexes prune the search so that you reach your results in as little time as possible.
Keep in mind that building an index means keeping it consistent with the data. Writes become slower than before because each one has to update two places, and the data structure behind the index may need extra work on every update (for example, rebalancing a tree).
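To make the trade-off concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not how MongoDB stores anything: the ToyCollection class, its methods and the sample data are all made up for illustration, and a Map stands in for the real index structure.

```javascript
// Toy model: a "collection" with one index on a single field.
// ToyCollection is a hypothetical class for illustration only.
class ToyCollection {
  constructor(indexedField) {
    this.indexedField = indexedField;
    this.docs = [];         // stands in for the records on disk
    this.index = new Map(); // field value -> positions of matching docs
  }

  insert(doc) {
    // Write 1: store the document itself.
    const position = this.docs.push(doc) - 1;
    // Write 2: keep the index in sync with the new document.
    const key = doc[this.indexedField];
    if (!this.index.has(key)) this.index.set(key, []);
    this.index.get(key).push(position);
  }

  // No index: every document has to be examined.
  findByScan(value) {
    let examined = 0;
    const results = [];
    for (const doc of this.docs) {
      examined++;
      if (doc[this.indexedField] === value) results.push(doc);
    }
    return { results, examined };
  }

  // With the index: jump straight to the matching positions.
  findByIndex(value) {
    const positions = this.index.get(value) || [];
    const results = positions.map((p) => this.docs[p]);
    return { results, examined: results.length };
  }
}

const toyUsers = new ToyCollection("age");
for (let i = 0; i < 1000; i++) {
  toyUsers.insert({ name: `user_${i}`, age: i % 50 });
}

const viaScan = toyUsers.findByScan(42);
const viaIndex = toyUsers.findByIndex(42);
console.log(viaScan.examined, viaIndex.examined);
```

Both lookups return the same documents, but the scan touches every document while the indexed lookup touches only the matches. The price is the second write inside insert.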
Types of Indexes in MongoDB
Let's look at the index types available in open source MongoDB, that is, the ones the database supports out of the box, independent of the managed service MongoDB Atlas or any other cloud service.
Single Field Indexes
Let's say you have a Users collection and you always need to find users by their age. If that is the only query, a single field index is all you need.
db.Users.createIndex( { age: 1 } )
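As you can see, we create an index and give it a direction (1), which makes it an ascending index. For a single field index the direction matters little, because MongoDB can walk the index in either direction; it becomes important for compound indexes, which we cover later. Internally this creates a sorted index on the field age, backed by a B-Tree-style structure (we'll use the B+Tree as the mental model here). Note that a regular index also holds entries for documents where the field is missing (they are indexed as null); only a sparse index skips those documents.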
Because the index is sorted, a lookup behaves like a binary search: we find the required value in O(log n) steps, which gives us the location of the record to fetch from the database.
Strictly speaking, a B+Tree is not a binary search tree: each node holds many keys and has many children. The idea is the same though, the search narrows down as it descends from the root to a leaf instead of halving an array.
Similarly, if we need all users within a range of ages, we can find the starting point and then walk forward, reading entries one by one until we pass the end of the range.
Remember, the leaf nodes of a B+Tree are linked together like a linked list, so after the O(log n) seek the traversal only costs O(k) for k matching entries.
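The seek-then-walk idea can be sketched with a plain sorted array in JavaScript. This is a simplification and not MongoDB's implementation: the array stands in for the linked leaf level of the tree, and ageIndex, lowerBound and rangeScan are names invented for this example.

```javascript
// Sorted index entries as [age, recordId] pairs (hypothetical sample data).
// The array plays the role of the linked leaf nodes of the tree.
const ageIndex = [
  [18, "u7"], [21, "u2"], [21, "u9"], [25, "u1"],
  [30, "u4"], [34, "u3"], [41, "u8"], [52, "u5"],
];

// Binary search for the first position whose key is >= target: O(log n).
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Seek to the start of the range, then walk forward until we pass the end.
function rangeScan(index, minAge, maxAge) {
  const ids = [];
  for (let i = lowerBound(index, minAge); i < index.length && index[i][0] <= maxAge; i++) {
    ids.push(index[i][1]);
  }
  return ids;
}

const idsInRange = rangeScan(ageIndex, 21, 30);
console.log(idsInRange); // [ 'u2', 'u9', 'u1', 'u4' ]
```

Only the entries inside the range are visited after the initial seek, which is why range queries on an indexed field stay cheap even when the collection is large.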
Compound Indexes
Now, let's say we have a different query that needs to find users by age as well as country. Creating two separate indexes, one on each field, is not the answer. Why so? Because MongoDB typically uses only one index to answer a query (it can intersect indexes in some cases, but the query planner rarely picks that plan).
Think of the tree: when you traverse it and reach a leaf node, you can't switch to another tree, as you would lose the context for the AND clause (country AND age).
To do this, we create a single index with multiple fields, saying that we will include both age and country in it.
db.Users.createIndex( { age:1, country:1 } )
Now, the order of these fields is very important. Let's take another example where we also need to query on the name of the user (in the same query) -
db.Users.createIndex( { age:1, country:1, name:1 } )
Now, the magic of this index is that it can serve three types of queries on these fields: [age, country, name], [age, country] or [age].
Do you see a pattern here? Any prefix of this index can also be used to answer queries. Why is that? The index is a single sorted structure whose keys are ordered by age first, then by country within the same age, then by name within the same age and country. You can picture it as a tree on age whose leaves lead to a tree on country, and then to one on name. Either way, the entries under [age=10 and country=India] contain only the names that already satisfy these two conditions..! A query on country alone cannot use the index efficiently, because its matching entries are scattered across all ages.
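Here is a small JavaScript sketch of why prefixes work. It is only a model under simple assumptions: the index is represented as a sorted array of [age, country, name] tuples, and the sample people and helper names are made up for illustration.

```javascript
// Hypothetical sample documents.
const people = [
  { name: "Asha", age: 10, country: "India" },
  { name: "Ben", age: 12, country: "Canada" },
  { name: "Chen", age: 10, country: "China" },
  { name: "Divya", age: 10, country: "India" },
  { name: "Emil", age: 12, country: "India" },
];

// Same field order as createIndex({ age: 1, country: 1, name: 1 }).
const indexFields = ["age", "country", "name"];

// Compare two key tuples field by field, left to right.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// The toy compound index: one sorted list of key tuples.
const compoundIndex = people
  .map((doc) => indexFields.map((field) => doc[field]))
  .sort(compareKeys);

// Positions in the sorted index whose key satisfies the predicate.
function positionsWhere(index, predicate) {
  const positions = [];
  index.forEach((key, i) => {
    if (predicate(key)) positions.push(i);
  });
  return positions;
}

const byAge = positionsWhere(compoundIndex, (k) => k[0] === 10);
const byAgeAndCountry = positionsWhere(compoundIndex, (k) => k[0] === 10 && k[1] === "India");
const byCountryOnly = positionsWhere(compoundIndex, (k) => k[1] === "India");

console.log(byAge);           // [ 0, 1, 2 ]  one contiguous block
console.log(byAgeAndCountry); // [ 1, 2 ]     one contiguous block
console.log(byCountryOnly);   // [ 1, 2, 4 ]  scattered, not a prefix
```

Queries on a prefix land on one contiguous block of the sorted keys, so a single seek plus a short walk is enough. The country-only query has matches spread across the index, which is why field order matters so much.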
This means you should try to combine fields into a compound index when your queries filter on them together, rather than creating several single field indexes.
There are more things to take care of when building indexes, such as the ESR (Equality, Sort, Range) guideline, the order of the fields, index hints and more..! For these, I recommend subscribing to my newsletter (of which this article is a part) so that you don't miss the update when it comes out..!
MultiKey Indexes
Ever thought about how you can index arrays and nested objects inside arrays in MongoDB? Well, MultiKey indexes are here for exactly that purpose..! Let's say each user has multiple payment methods such as Visa, Mastercard, PayPal, etc., all in an array in the document -
{
"id": "user_1",
"paymentMethods": [
{
"type": "VISA",
"details": ...
},
{
"type": "MASTERCARD",
"details": ...
},
{
"type": "PAYPAL",
"details": ...
}
]
}
Now let's say you need to find all users who use Mastercard. You would need to query the nested objects inside the paymentMethods array like this -
db.Users.find( { "paymentMethods.type": "MASTERCARD" } )
Now, if we don't create an index for this, the query would have to check every document in the collection, and we don't want that. Also, indexing the array field itself does not help -
db.Users.createIndex( { paymentMethods: 1} )
because that indexes each embedded document as a whole, so it only helps queries that match an entire paymentMethods element and not a field inside it. What we need here is a MultiKey index on the nested field: MongoDB indexes each element of the array separately, and each element gets its own index key. This makes it efficient to query array data and to find which documents contain a certain element..!
db.Users.createIndex( { "paymentMethods.type": 1} )
MongoDB automatically detects that the indexed field holds an array and makes the index MultiKey, so you do not have to do anything extra. 😇
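To picture what "each element gets its own index key" means, here is a rough JavaScript sketch. It is a toy model rather than MongoDB's internals: the sample documents and the buildMultikeyIndex and usersWith helpers are invented for this example, and the toy stores each distinct value once per document so that a user is never listed twice.

```javascript
// Hypothetical sample documents, shaped like the user document above.
const userDocs = [
  { id: "user_1", paymentMethods: [{ type: "VISA" }, { type: "MASTERCARD" }, { type: "PAYPAL" }] },
  { id: "user_2", paymentMethods: [{ type: "VISA" }] },
  { id: "user_3", paymentMethods: [{ type: "MASTERCARD" }, { type: "MASTERCARD" }] },
];

// One index entry per distinct array element value, per document.
function buildMultikeyIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    const distinctTypes = new Set(doc.paymentMethods.map((m) => m.type));
    for (const type of distinctTypes) {
      entries.push([type, doc.id]);
    }
  }
  // Keep the entries sorted by key, like any other index.
  entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return entries;
}

// Which documents contain an element with this type?
function usersWith(index, type) {
  return index.filter(([key]) => key === type).map(([, id]) => id);
}

const paymentTypeIndex = buildMultikeyIndex(userDocs);
console.log(paymentTypeIndex.length);                  // 5 entries for 3 documents
console.log(usersWith(paymentTypeIndex, "MASTERCARD")); // [ 'user_1', 'user_3' ]
```

Notice that three documents produce five index entries. That is the cost of a MultiKey index: documents with large arrays make the index grow much faster than the collection.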
Geo Indexes
If your application stores geolocations, you can index those fields with a geospatial index. MongoDB provides the 2dsphere index for points on an Earth-like sphere. It lets you find nearby points, intersections of geometries and much more on the Earth's coordinate system (latitude and longitude).
You can create a 2dsphere index like this -
db.places.createIndex( { place : "2dsphere" } )
Remember that you need to index your points as a GeoJSON point -
{
"place": {
"type": "Point",
"coordinates": [-73.856077, 40.848447]
}
}
Or as a legacy coordinate pair (longitude, then latitude) -
{
"place": [-73.856077, 40.848447]
}
The important thing to remember is that you have to add the Longitude first and then the Latitude..!
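Since the coordinate order is so easy to get wrong, a small helper can take care of it. The sketch below is plain JavaScript; toGeoJsonPoint and nearQuery are hypothetical helper names, and the 500 meter radius is just an example value. The filter it builds uses MongoDB's $near operator with $geometry and $maxDistance (in meters), which works against a 2dsphere index.

```javascript
// Build a GeoJSON Point from latitude/longitude.
// Note the swap: GeoJSON wants [longitude, latitude].
function toGeoJsonPoint(latitude, longitude) {
  if (latitude < -90 || latitude > 90) {
    throw new RangeError(`latitude out of range: ${latitude}`);
  }
  if (longitude < -180 || longitude > 180) {
    throw new RangeError(`longitude out of range: ${longitude}`);
  }
  return { type: "Point", coordinates: [longitude, latitude] };
}

// Build a filter for documents whose `place` lies within maxMeters of a point.
function nearQuery(latitude, longitude, maxMeters) {
  return {
    place: {
      $near: {
        $geometry: toGeoJsonPoint(latitude, longitude),
        $maxDistance: maxMeters,
      },
    },
  };
}

const nearbyQuery = nearQuery(40.848447, -73.856077, 500);
console.log(JSON.stringify(nearbyQuery, null, 2));

// In mongosh, with the 2dsphere index from above in place,
// the filter could be passed straight to find:
//   db.places.find(nearbyQuery)
```

The range checks also catch the most common mistake: a swapped pair often ends up with a "latitude" outside -90 to 90, and the helper fails loudly instead of silently storing a point in the wrong place.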
Conclusion
Learning how indexes work is important for understanding which one you need in which scenario. Key Takeaways -
I request you to Like ❤️ and Share this post with your network so that they also learn something new this time..!
Senior Developer at Capgemini | Java | Spring Boot | Microservices | Azure | Data Structure and Algorithm | Object Oriented Design | Architecture Design
2y: It's written that "B+ tree is a BST", but in general, B tree is an M-ary Search Tree, BST is a subset of B tree, and B+ tree is a B tree + added feature (leaf nodes are linked). Isn't it? Or have I understood wrong?
SDE 2 @ M2P | Serving Notice Period | Ex-Dell | Engineering Grad'21
2y: Does Mongo have a write amplification issue with indexing? Just like another popular 🐘 SQL database.