Email Notification Logs: Story of building a scalable email delivery tracking system

Introduction

This post describes the journey of building the Email Notification Logs functionality in Jira Service Management (JSM), with the goal of extending it to all Jira family products (such as JSW and JWM). The functionality provides a comprehensive log of email delivery to the customer’s mailbox, with a detailed reason for any failure. It brings transparency to the system, which leads to faster issue identification and resolution.

As the tech lead on this project, my responsibilities include ensuring the success of every stage of the software development lifecycle: from product ideation and shaping the feature's look and functionality, through technical solution design, to execution, while keeping quality high.

Brief about Email Notification Logs

Jira Service Management by Atlassian keeps end users updated through email notifications on account activity, such as customer invites, and on Jira request activity, such as new request creation, comment additions, and status changes. These notifications are essential for keeping end users and Jira agents aligned, which business continuity depends on. It is therefore imperative to be able to track failed email notifications. Previously, email tracking functionality was missing, and customers had to reach out to Atlassian support to get problems resolved, which took significant time and created a heavy support load. Email Notification Logs bridges this key gap.


Quick Glance on failure logs

Now that we understand the feature, we are ready to dive deep into the design of the email log system.

Architecture Design

The architecture is built on an event-based design following the CQRS pattern. It stitches together three important parts:

  • Data collection,
  • Data processing and storage,
  • Serving read queries.


Email Delivery Tracking Architecture


Let's walk through the above architecture and learn how we were able to build and deliver it quickly by leveraging platform components and adhering to best design practices.

1. Data Collection

Steps 1-3 in the Architecture diagram

Data is collected from multiple sources to build the notification tracking functionality and is then published to the event bus platform, which is designed for decoupled, service-to-service communication through events. The sources are:

  1. Internal system failures, i.e. failures at the product level (JSM and JSW). Examples include the issue itself or the JSM project being deleted.
  2. External system failures, i.e. failures reported by the email service provider (ESP, such as SparkPost), which capture the final delivery status and account for the majority of email failures. Examples include a full mailbox, a bounce, or an email address added to a suppression list after repeated soft bounces.
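
The two sources above can be pictured as producers that publish one common event shape to the bus. The sketch below is only an illustration under stated assumptions, not the actual implementation: `NotificationEvent`, `InMemoryEventBus`, the topic name, and every field name are hypothetical, and the in-memory bus merely stands in for the platform event bus.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class NotificationEvent:
    """Hypothetical common shape for a delivery event from either source."""
    source: str           # "internal" (product) or "external" (ESP)
    notification_id: str
    status: str           # e.g. "failed", "bounced", "delivered"
    reason: str           # human-readable failure reason, empty on success


class InMemoryEventBus:
    """Toy stand-in for the platform event bus (an assumption, not a real API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[
            str, List[Callable[[NotificationEvent], None]]
        ] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[NotificationEvent], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: NotificationEvent) -> None:
        # Producers do not know who consumes the event: that is the decoupling.
        for handler in self._subscribers[topic]:
            handler(event)


TOPIC = "email-delivery-events"  # hypothetical topic name

bus = InMemoryEventBus()
received: List[NotificationEvent] = []
bus.subscribe(TOPIC, received.append)

# Internal failure: the product could not send because the issue was deleted.
bus.publish(TOPIC, NotificationEvent("internal", "n-1", "failed", "issue deleted"))
# External failure: the ESP reports the final delivery status.
bus.publish(TOPIC, NotificationEvent("external", "n-2", "bounced", "mailbox full"))
```

Because both producers emit the same shape, a consumer can treat internal and external failures uniformly and needs no knowledge of which system raised them.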

2. Data Processing & Storage

Steps 4-5 in the Architecture diagram

Events from the event bus are consumed by the new Kotlin-based Logs service, which triages events, enriches them, and stores them in the database.
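
A minimal sketch of the triage, enrich, and store flow follows. It is written in Python for brevity (the actual service is Kotlin-based), and everything in it is an assumption made for illustration: the rule that only failure statuses are kept, the `REASON_HELP` lookup, the `LogRecord` fields, and the dictionary standing in for the datastore.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical lookup used for enrichment; a real service would consult other systems.
REASON_HELP = {
    "mailbox full": "Ask the recipient to free up mailbox space.",
    "suppression list": "Remove the address from the suppression list.",
}


@dataclass
class LogRecord:
    notification_id: str
    status: str
    reason: str
    help_text: str


def triage(event: dict) -> bool:
    """Keep only failure events; the set of kept statuses is an assumption."""
    return event.get("status") in {"failed", "bounced"}


def enrich(event: dict) -> LogRecord:
    """Attach display-ready details at write time so reads stay cheap."""
    reason = event.get("reason", "unknown")
    return LogRecord(
        notification_id=event["notification_id"],
        status=event["status"],
        reason=reason,
        help_text=REASON_HELP.get(reason, "Contact your administrator."),
    )


def handle(event: dict, store: Dict[str, LogRecord]) -> Optional[LogRecord]:
    """Triage, enrich, then store; returns None when the event is dropped."""
    if not triage(event):
        return None
    record = enrich(event)
    store[record.notification_id] = record  # the dict stands in for the datastore
    return record


store: Dict[str, LogRecord] = {}
handle({"notification_id": "n-1", "status": "bounced", "reason": "mailbox full"}, store)
handle({"notification_id": "n-2", "status": "delivered"}, store)  # dropped by triage
```

Doing the enrichment here, on the write path, is what allows the read path to return stored records without further lookups.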

The data is stored in a platform datastore built on top of Amazon DynamoDB, which offers excellent features such as data residency, reliability, and scalability. This ensures that developers only need to focus on building incredible features and can safely leave all the heavy lifting to the platform.

3. Serving Read Queries

Steps 6-7 in the Architecture diagram

All read queries come through the GraphQL gateway (another platform component) to the Logs service, where, after request validation and the appropriate permission checks, data is fetched from the database and served back to users. Because enrichment is performed asynchronously during the write operation, reads are low-latency, which leads to a better user experience. The front end is built in React and supports infinite scrolling: the next page loads automatically as the user reaches the end of the current page, eliminating the need to click a "next" button.
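
The read path and the infinite-scroll behaviour can be sketched as a cursor-paginated fetch guarded by validation and a permission check. This is an illustrative sketch only: `fetch_page`, the in-memory `LOGS` and `PERMISSIONS` tables, and the integer cursor are all assumptions, and the real service exposes this through GraphQL rather than a Python function.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory data: log rows per project, already sorted newest first.
LOGS: Dict[str, List[str]] = {"PROJ-1": [f"log-{i}" for i in range(5)]}
# Hypothetical permission table: which users may read which projects.
PERMISSIONS: Dict[str, Set[str]] = {"alice": {"PROJ-1"}}


def fetch_page(
    user: str,
    project_id: str,
    cursor: Optional[int] = None,
    page_size: int = 2,
) -> Tuple[List[str], Optional[int]]:
    """Return one page of logs plus the cursor for the next page (None at the end)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")  # request validation
    if project_id not in PERMISSIONS.get(user, set()):
        raise PermissionError("user cannot view this project")  # permission check
    rows = LOGS.get(project_id, [])
    start = cursor or 0
    page = rows[start:start + page_size]
    has_more = start + page_size < len(rows)
    return page, (start + page_size if has_more else None)


# The front end would request the next page each time the user scrolls to the end.
pages: List[List[str]] = []
cursor: Optional[int] = None
while True:
    page, cursor = fetch_page("alice", "PROJ-1", cursor)
    pages.append(page)
    if cursor is None:
        break
```

Returning the next cursor with each page lets the client keep loading without knowing the total number of rows, which suits an infinite-scroll list.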

Architecture Scalability & Extensibility

Why worry about Scalability & Extensibility?

Notification logs are a requirement not just for JSM but also for other Jira family products such as Jira Software. It is therefore crucial to design the system with scalability and extensibility in mind.

To give a quick snapshot of our scale, we send over 8 million notifications daily. Of this volume, around 4% fail at ESPs for reasons such as the mailbox being unavailable or the email address being on a suppression list.
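
As a rough back-of-the-envelope reading of these figures, the snippet below treats "over 8 million" as exactly 8 million and "around 4%" as exactly 4%, and assumes an even spread over the day. The results are therefore approximate lower bounds, not measured values.

```python
DAILY_NOTIFICATIONS = 8_000_000  # "over 8 million" in the text, used as a lower bound
ESP_FAILURE_RATE = 0.04          # "around 4%" in the text

# Failures the log system must record per day at the ESP level.
failed_per_day = round(DAILY_NOTIFICATIONS * ESP_FAILURE_RATE)

# Average send rate if traffic were spread evenly over 24 hours.
avg_notifications_per_second = DAILY_NOTIFICATIONS / (24 * 60 * 60)

print(failed_per_day)                          # 320000
print(round(avg_notifications_per_second, 1))  # 92.6
```

Real traffic is unlikely to be uniform, so peak rates would be higher than this average.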

For scalability, the architecture follows the CQRS design pattern, which makes it easy to scale reads and writes separately. We use the web-worker model here. Consumer nodes process incoming write events using the async worker model and store them in the platform datastore. Web nodes, on the other hand, handle incoming read requests through an API layer written in GraphQL.
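
The separation can be sketched as two components that share only the datastore. The class names `ConsumerNode`, `WebNode`, and `LogStore` are hypothetical, the queue is a plain in-process deque rather than a real event stream, and the sketch runs synchronously, so it shows the division of responsibilities rather than the actual deployment.

```python
from collections import deque
from typing import Deque, Dict, List


class LogStore:
    """Toy stand-in for the platform datastore shared by both sides."""

    def __init__(self) -> None:
        self._by_project: Dict[str, List[dict]] = {}

    def put(self, record: dict) -> None:
        self._by_project.setdefault(record["project_id"], []).append(record)

    def query(self, project_id: str) -> List[dict]:
        return list(self._by_project.get(project_id, []))


class ConsumerNode:
    """Write side: drains queued events and persists them."""

    def __init__(self, store: LogStore) -> None:
        self._store = store
        self.queue: Deque[dict] = deque()

    def drain(self) -> int:
        processed = 0
        while self.queue:
            self._store.put(self.queue.popleft())
            processed += 1
        return processed


class WebNode:
    """Read side: answers queries and never writes."""

    def __init__(self, store: LogStore) -> None:
        self._store = store

    def logs_for(self, project_id: str) -> List[dict]:
        return self._store.query(project_id)


store = LogStore()
consumer = ConsumerNode(store)  # add more of these as write volume grows
web = WebNode(store)            # add more of these as read traffic grows

consumer.queue.append({"project_id": "PROJ-1", "status": "bounced"})
consumer.queue.append({"project_id": "PROJ-2", "status": "failed"})
processed = consumer.drain()
result = web.logs_for("PROJ-1")
```

Since neither class depends on the other, the number of consumer nodes and web nodes can be tuned independently, which is the practical benefit of the CQRS split described above.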

For extensibility, the low-level design keeps a generic schema across both the read and write layers, i.e.

  • Read or GraphQL schema
  • Write or DB schema

Both the GraphQL and DB schemas contain fields such as Project ID, Issue ID, Notification Type, and Notification Subtype. These fields are common to all Jira family products, which makes the schema product-agnostic and therefore easily extensible.
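
To illustrate the idea, the sketch below models a product-agnostic record using the fields named above. The `product` discriminator, the class name, and all sample values are hypothetical; the actual GraphQL and DB schemas are not shown in this article.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NotificationLog:
    """Product-agnostic record; field names mirror those listed in the text."""
    product: str               # hypothetical discriminator, e.g. "JSM" or "JSW"
    project_id: str
    issue_id: str
    notification_type: str
    notification_subtype: str


# The same shape serves different products; only the values differ.
jsm_log = NotificationLog("JSM", "10001", "20001", "REQUEST", "COMMENT_ADDED")
jsw_log = NotificationLog("JSW", "10002", "20002", "ISSUE", "STATUS_CHANGED")

same_fields = asdict(jsm_log).keys() == asdict(jsw_log).keys()
```

Onboarding another product under this assumption means emitting records with new values, with no schema change on either the read or the write side.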

Summary

In summary, this article has explored how we at Atlassian developed a highly scalable email delivery tracking system in JSM that other products in the Jira family can reuse. This was achieved by following industry-standard processes and techniques: in a nutshell, the CQRS pattern for scaling reads and writes separately and a generic schema for easy extensibility. Finally, storing the logs in a NoSQL datastore built on Amazon DynamoDB means the platform handles scalability and reliability as the data grows.

Lastly, I hope you enjoyed reading this article. Stay tuned for more insightful tech articles.


