An experience with Elasticsearch
It has been quite some time since I have written much, and I have been thinking about some of the new endeavors that have come about. About two years ago I took a new job with a multinational technology company as Director of Product Engineering R&D for its Analytics Platform-backed products. In recent years this product (along with a few others) has been part of a new, strategic, corporate-level initiative to put Software and EAI (Enterprise Asset Intelligence) front and center for the organization. On one episode of Jim Cramer’s Mad Money, our CEO singled out the product as part of the company’s core strategy. Ah! In short, I love the company, and I love the people I work with!
As part of the initiative to modernize the product, the plan was to build a fresh, state-of-the-art application architecture to support the growth anticipated for the products. The new architecture included the usual suspects: microservices, containerization, REST, Angular, etc. Along with these changes, there was also a push to move application storage from Cassandra to Elasticsearch. Migrating to ES (Elasticsearch) was a huge risk, because it meant a radical shift from a relational-style data model to a pure document data structure. It also meant that the entire application layer would change; everything had to be rebuilt from scratch. After careful consideration, several discussions with various teams, architects and technical team members, and a number of POCs, I approved this critical architectural change, as the long-term benefits outweighed the potential risks.
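To make the shift from a relational-style model to documents concrete, here is a minimal Python sketch of denormalization: rows that a relational model would join at query time are folded into one self-contained document at index time. The asset/site/reading data, the field names and the `build_asset_documents` helper are hypothetical, chosen only for illustration; they are not the product's actual data model.

```python
from collections import defaultdict

# Hypothetical "relational" rows, as they might come out of three tables.
assets = [
    {"asset_id": "A1", "name": "Pump 7", "site_id": "S1"},
    {"asset_id": "A2", "name": "Valve 3", "site_id": "S2"},
]
sites = {"S1": {"site_name": "Plant North"}, "S2": {"site_name": "Plant South"}}
readings = [
    {"asset_id": "A1", "metric": "temp", "value": 71.5},
    {"asset_id": "A1", "metric": "vibration", "value": 0.8},
    {"asset_id": "A2", "metric": "temp", "value": 64.0},
]


def build_asset_documents(assets, sites, readings):
    """Fold the three 'tables' into one self-contained document per asset.

    The join happens once, at index time, so queries never need to join.
    """
    readings_by_asset = defaultdict(list)
    for r in readings:
        readings_by_asset[r["asset_id"]].append(
            {"metric": r["metric"], "value": r["value"]}
        )

    docs = []
    for a in assets:
        docs.append({
            "asset_id": a["asset_id"],
            "name": a["name"],
            # Site attributes are copied (denormalized) into every asset document.
            "site": {"id": a["site_id"], **sites[a["site_id"]]},
            # Child rows become an embedded array.
            "readings": readings_by_asset[a["asset_id"]],
        })
    return docs


docs = build_asset_documents(assets, sites, readings)
print(docs[0]["site"]["site_name"], len(docs[0]["readings"]))  # Plant North 2
```

The cost moves to the write side: renaming a site now means updating every asset document that embeds it. Also, Elasticsearch maps an array of objects such as `readings` as flattened fields by default, so a query that must match `metric` and `value` on the same reading would need `readings` mapped with the `nested` type.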
Some key highlights from our road from Cassandra to Elasticsearch may serve as useful pointers for others who are considering, or are already in the middle of, a similar transition:
· Doing all the homework before deciding what to use (and what not to use) as the document type for ES indexes. Keep in mind that mapping types are being phased out: Elasticsearch 6.x allows only one type per index, and later versions remove types altogether.
· Handling the situations where we would traditionally use joins in a relational database. This is a real puzzle: at the end of the day we are creating metrics, and we need to connect disparate pieces of information to generate KPIs.
· Providing the right level of training to the development team, or acquiring talent with the required skills. A skills gap can be debilitating in the short run and can prove to be a spoiler in the longer run.
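· Deciding the number of shards and nodes for the indexes, and how to distribute the shards across the nodes. This is probably one of the most crucial design aspects of an Elasticsearch architecture, not least because an index's primary shard count is fixed at creation (changing it later means splitting, shrinking or reindexing).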
· Index naming conventions and maintenance can get challenging. In the beginning it may feel as though there will only be a few indexes (because of denormalization), but the number soon grows and can become a maintenance nightmare.
· Managing older data without hurting performance, while keeping it readily accessible. Every option has cost, performance and availability implications.
· Making the best use of JOLT transformations in the middle tier to shape responses to suit the business application. This plays a critical role in how information is presented back to the application layer, and in situations where you would otherwise need to perform two operations against ES, it can be a saving grace.
· Unfortunately, the name does not say it all: Painless scripting comes with its own pains and pitfalls. If not used wisely, it can be a performance hog, and tracing and debugging scripts can be tedious.
· Using scripted metric aggregations. This is one of the most powerful features for aggregations and calculations, but it can also be very expensive if not done right.
· Elasticsearch touts its query and data caching as one of its most prominent features. Writing DSL queries with caching in mind is an important part of using this feature well; otherwise caching can become a performance hazard.
· This may sound like a no-brainer, but parameter ordering in queries can make a big difference in performance. Applying the broad filters that narrow the data set the most first reduces the data being processed and thus boosts performance.
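For the shard-count point in the list above, a rough sizing helper makes the reasoning explicit. This is a sketch under stated assumptions: the 30 GB target per shard is an assumed figure inside the roughly 10 to 50 GB range that Elastic's sizing guidance has commonly cited, and rounding primaries up to a multiple of the data-node count is a hypothetical house rule, not an Elasticsearch requirement. `number_of_shards` and `number_of_replicas` are real index settings; the helper names are illustrative.

```python
import math


def primary_shard_count(expected_index_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate how many primary shards keep each shard near target_shard_gb.

    target_shard_gb is an assumption (30 GB sits inside the commonly cited
    10-50 GB range). Always returns at least one shard.
    """
    if expected_index_gb <= 0:
        return 1
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


def index_settings(expected_index_gb: float, data_nodes: int, replicas: int = 1) -> dict:
    """Build a create-index body (PUT /<index>) with shard and replica counts."""
    primaries = primary_shard_count(expected_index_gb)
    # Hypothetical house rule, not an Elasticsearch requirement: round the
    # primary count up to a multiple of the data-node count so primaries
    # can be spread evenly across nodes.
    if primaries > 1 and data_nodes > 1:
        primaries = math.ceil(primaries / data_nodes) * data_nodes
    return {
        "settings": {
            "index": {
                # Fixed at creation; changing it needs _split, _shrink or a reindex.
                "number_of_shards": primaries,
                # Can be changed on a live index at any time.
                "number_of_replicas": replicas,
            }
        }
    }


print(index_settings(200, data_nodes=3))
# {'settings': {'index': {'number_of_shards': 9, 'number_of_replicas': 1}}}
```

With one replica, that index would place 18 shards on the cluster in total, so the replica count belongs in the same capacity calculation as the primaries.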
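The naming and older-data points above can be tackled together with index lifecycle management (ILM). The Python sketch below builds a hypothetical `<env>-<domain>-<entity>-NNNNNN` index name (the numeric suffix is the pattern rollover increments) and the body of an ILM policy that rolls indexes over, moves them to a warm phase and eventually deletes them. The naming scheme, helper names and every threshold are assumptions to adapt; `rollover`, `forcemerge`, `set_priority` and `delete` are standard ILM actions, and `max_primary_shard_size` needs a recent release (7.13 onward).

```python
# Characters documented as invalid in index names (not an exhaustive check);
# uppercase letters are rejected separately below.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def index_name(env: str, domain: str, entity: str, generation: int = 1) -> str:
    """Compose a hypothetical <env>-<domain>-<entity>-<NNNNNN> index name.

    The zero-padded six-digit suffix matches the pattern that rollover
    increments (...-000001, ...-000002, ...).
    """
    name = f"{env}-{domain}-{entity}-{generation:06d}"
    if name != name.lower():
        raise ValueError(f"index names must be lowercase: {name!r}")
    if name[0] in "-_+" or any(c in FORBIDDEN_CHARS for c in name):
        raise ValueError(f"invalid index name: {name!r}")
    return name


def retention_policy(rollover_days: int, warm_after_days: int, delete_after_days: int) -> dict:
    """Body for PUT _ilm/policy/<policy-name>: hot -> warm -> delete."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new index when either limit is reached.
                        "rollover": {
                            "max_age": f"{rollover_days}d",
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                "warm": {
                    "min_age": f"{warm_after_days}d",
                    "actions": {
                        # Older data is read-mostly: merge segments, and let
                        # hotter indexes recover first after a node restart.
                        "forcemerge": {"max_num_segments": 1},
                        "set_priority": {"priority": 50},
                    },
                },
                "delete": {
                    "min_age": f"{delete_after_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


print(index_name("prod", "analytics", "assets"))  # prod-analytics-assets-000001
```

An index template would typically attach the policy to new indexes through the `index.lifecycle.name` and `index.lifecycle.rollover_alias` settings. Once an index has been rolled over, its `min_age` is counted from the rollover rather than from index creation.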
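On the JOLT point: JOLT is a Java library for JSON-to-JSON transformation driven by declarative specs (`shift`, `default`, `remove`, and so on). The sketch below does not use JOLT. It hand-writes in Python the kind of reshaping a `shift` spec typically performs in a middle tier, flattening a `terms` aggregation response into rows a UI can bind to directly. The aggregation names (`by_site`, `avg_temp`) and the sample response are hypothetical.

```python
def flatten_terms_buckets(response: dict, agg_name: str, metrics: list) -> list:
    """Turn a terms-aggregation response into flat rows for the UI.

    In the Elasticsearch response each bucket looks like
    {"key": ..., "doc_count": ..., "<metric>": {"value": ...}};
    `metrics` names the single-value sub-aggregations to keep.
    """
    rows = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        row = {"key": bucket["key"], "count": bucket["doc_count"]}
        for m in metrics:
            # Single-value metric aggs (avg, max, ...) return {"value": ...};
            # a missing sub-aggregation becomes None instead of raising.
            row[m] = bucket.get(m, {}).get("value")
        rows.append(row)
    return rows


sample = {
    "aggregations": {
        "by_site": {
            "buckets": [
                {"key": "S1", "doc_count": 12, "avg_temp": {"value": 70.5}},
                {"key": "S2", "doc_count": 4, "avg_temp": {"value": 64.0}},
            ]
        }
    }
}
print(flatten_terms_buckets(sample, "by_site", ["avg_temp"]))
# [{'key': 'S1', 'count': 12, 'avg_temp': 70.5}, {'key': 'S2', 'count': 4, 'avg_temp': 64.0}]
```

Keeping this reshaping in the middle tier keeps Elasticsearch's response format out of the UI code, so a change to the query only touches the transform.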
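The Painless and scripted-metric points above can be illustrated together. The Python sketch below builds a search body whose `scripted_metric` aggregation returns the share of values above a threshold, showing the four phases (init, map, combine, reduce) and passing the field and threshold through `params` so the script source stays constant and the compiled script can be reused instead of recompiled for every new value. The field name, threshold and helper name are hypothetical, and the Painless snippets are illustrative rather than tested against a live cluster.

```python
def over_threshold_ratio(field: str, threshold: float) -> dict:
    """Search body whose scripted_metric returns the share of values above threshold."""
    return {
        "size": 0,
        "aggs": {
            "over_threshold_ratio": {
                "scripted_metric": {
                    # Values travel as params, so the script text never changes
                    # between requests and its compiled form can be cached.
                    "params": {"field": field, "threshold": threshold},
                    # init: runs once per shard, before documents are collected.
                    "init_script": "state.over = 0L; state.total = 0L;",
                    # map: runs once per matching document.
                    "map_script": (
                        "def v = doc[params.field];"
                        " if (v.size() > 0) {"
                        "   state.total = state.total + 1;"
                        "   if (v.value > params.threshold) { state.over = state.over + 1; }"
                        " }"
                    ),
                    # combine: runs once per shard and returns its partial result.
                    "combine_script": "return [state.over, state.total];",
                    # reduce: runs once on the coordinating node over all shard results.
                    "reduce_script": (
                        "long over = 0; long total = 0;"
                        " for (s in states) {"
                        "   if (s != null) { over += (long) s[0]; total += (long) s[1]; }"
                        " }"
                        " return total == 0 ? 0.0 : (double) over / total;"
                    ),
                }
            }
        },
    }


body = over_threshold_ratio("readings.value", 70.0)
```

For this particular ratio, a `filter` aggregation with a `range` query, alongside the total hit count, would return the same numbers with no scripting and usually at lower cost; the scripted version is shown only to make the phases visible.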
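For the caching point, a few habits tend to matter: put non-scoring conditions in filter context, keep request bodies byte-for-byte stable (the shard request cache uses the whole JSON body as its key), and avoid a bare `now` in date ranges, since it changes on every request. The Python sketch below builds an aggregation-only KPI request that way, rounding the date range to whole days on the client. The field names, the tenant/site model and the `kpi_query` helper are hypothetical.

```python
import json
from datetime import date, timedelta
from typing import List


def kpi_query(tenant_id: str, site_ids: List[str], days: int, today: date) -> dict:
    """Aggregation-only KPI request written to be cache-friendly."""
    start = today - timedelta(days=days)
    return {
        # size 0: no hits are returned, which is what the shard request
        # cache caches by default (aggregations and hit counts).
        "size": 0,
        "query": {
            "bool": {
                # Filter context: clauses are not scored and are cacheable.
                "filter": [
                    {"term": {"tenant_id": tenant_id}},
                    # Sorted terms keep the JSON body identical regardless of
                    # the order the caller supplied the sites in.
                    {"terms": {"site.id": sorted(site_ids)}},
                    # Whole days computed on the client (today excluded), so
                    # the clause stays identical for the entire day.
                    {"range": {"@timestamp": {"gte": start.isoformat(),
                                              "lt": today.isoformat()}}},
                ]
            }
        },
        "aggs": {"avg_temp": {"avg": {"field": "readings.value"}}},
    }


a = kpi_query("tenant-1", ["S2", "S1"], 7, date(2024, 1, 15))
b = kpi_query("tenant-1", ["S1", "S2"], 7, date(2024, 1, 15))
print(json.dumps(a) == json.dumps(b))  # True
print(a["query"]["bool"]["filter"][2])
# {'range': {'@timestamp': {'gte': '2024-01-08', 'lt': '2024-01-15'}}}
```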
Figure 1: High-Level Architecture
It has been a great journey, with many lessons learned for me and my team. One of the most important considerations is the level of expertise on the team and the availability of expertise in surrounding teams within the organization. In my experience, Elasticsearch DSL query-writing skills can become a bottleneck for the team's throughput on all feature development, even after the transition. A lack of skills can also contribute to mistakes in the early stages, which prove very expensive to remedy later.
From an architectural standpoint, moving to Elasticsearch has proven to be one of the best architectural choices we made. The architecture now scales better as new market needs arrive, and from a performance perspective it has maintained optimal response times for our user community.
Kudos to all our team members who worked very hard on this re-architecture initiative, producing a brand-new product architecture, framework and UX architecture for our product. We recently released these newly minted products to the market and have received a great response. I hope the information shared is useful to teams and individuals embarking on their own ES journey. I wish readers all the best in their endeavors with Elasticsearch!
Any views and statements expressed in this article are purely my own. This article, in whole or in part, should in no way be considered representative of any organization I have worked for in the past, work for currently, or may work for in the future.