Understanding API performance jargon
Cover image generated with Microsoft Designer


Power of 2

Memory allocation often uses sizes that are powers of 2 for efficiency. When handling large volumes of API data, memory management becomes crucial, and using memory sizes that are powers of 2 can help optimize performance. To calculate data volumes correctly, it is important to know the data volume units, which are based on powers of 2. A byte is a sequence of 8 bits, and an ASCII character uses one byte of memory (8 bits).

Example: Allocating 2^10 (1024) bytes of memory instead of 1000 bytes.
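To make the units concrete, here is a minimal Python sketch showing how powers of 2 drive back-of-the-envelope data volume estimates. The helper name and record sizes are illustrative assumptions, not from any particular library:

```python
# Data volume units based on powers of 2; handy for rough estimates
# of API payload sizes.

UNITS = {
    "KB": 2**10,  # 1,024 bytes
    "MB": 2**20,  # ~1 million bytes
    "GB": 2**30,  # ~1 billion bytes
    "TB": 2**40,  # ~1 trillion bytes
}

def estimate_size(record_bytes: int, record_count: int) -> str:
    """Express a total data volume in the largest sensible unit."""
    total = record_bytes * record_count
    for unit, size in reversed(UNITS.items()):
        if total >= size:
            return f"{total / size:.2f} {unit}"
    return f"{total} bytes"

# 1 million records of 1 KB each is just under 1 GB.
print(estimate_size(2**10, 1_000_000))  # -> 976.56 MB
```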


Latency

Latency refers to the time it takes for a data packet to travel from the source to the destination. Below are a few latency numbers that every software engineer should know:

Important Latency Numbers (the widely cited figures popularized by Jeff Dean; exact values vary with hardware):

  • L1 cache reference: ~0.5 ns
  • Branch mispredict: ~5 ns
  • L2 cache reference: ~7 ns
  • Mutex lock/unlock: ~25 ns
  • Main memory reference: ~100 ns
  • Compress 1 KB with a fast compressor (e.g., Snappy): ~3 µs
  • Send 1 KB over a 1 Gbps network: ~10 µs
  • Read 4 KB randomly from SSD: ~150 µs
  • Read 1 MB sequentially from memory: ~250 µs
  • Round trip within the same datacenter: ~500 µs
  • Read 1 MB sequentially from SSD: ~1 ms
  • Disk seek: ~10 ms
  • Read 1 MB sequentially from disk: ~20 ms
  • Send a packet California → Netherlands → California: ~150 ms

Key Components of Latency:

  1. Propagation Delay: The time it takes for a signal to travel through the medium (e.g., fiber optic cable, wireless). This depends on the distance and the speed of the signal in the medium.
  2. Transmission Delay: The time it takes to push all the packet's bits onto the wire. This depends on the packet size and the transmission rate of the network.
  3. Processing Delay: The time taken by network devices (such as routers and switches) to process the packet header and decide where to forward the packet.
  4. Queuing Delay: The time a packet spends waiting in a queue before it can be transmitted. This happens in network devices when there is congestion.

Total latency is the sum of all these individual delays:

Total Latency = Propagation Delay + Transmission Delay + Processing Delay + Queuing Delay
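As a rough illustration, the sketch below plugs assumed values into this formula. Every number in it is an assumption chosen for readability, not a measurement:

```python
# Back-of-the-envelope latency estimate for a single packet.
# Every input below is an illustrative assumption.

SPEED_IN_FIBER = 2e8       # signal speed in fiber, ~2/3 the speed of light (m/s)
LINK_RATE = 1e9            # 1 Gbps transmission rate (bits/s)

distance_m = 1_000_000     # 1,000 km between source and destination
packet_bits = 1500 * 8     # one 1,500-byte packet

propagation = distance_m / SPEED_IN_FIBER  # time for the signal to travel
transmission = packet_bits / LINK_RATE     # time to push all bits onto the wire
processing = 50e-6                         # assumed router processing time
queuing = 100e-6                           # assumed queuing delay under light load

total = propagation + transmission + processing + queuing
print(f"Total latency: {total * 1000:.3f} ms")  # -> 5.162 ms
```

Note how propagation delay dominates here: over long distances, physical distance matters far more than packet size.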


Availability Numbers:

Availability numbers are metrics used to quantify the reliability and uptime of a system. High availability is the ability of a system to remain continuously operational for a desirably long period of time. It is measured as a percentage, with 100% meaning a service that has zero downtime. Most services fall between 99% and 100%.

Common Availability Metrics:

  • 99% Availability: Often referred to as "two nines," translates to approximately 3.65 days of downtime per year.
  • 99.9% Availability: Known as "three nines," equates to about 8.76 hours of downtime per year.
  • 99.99% Availability: Called "four nines," means roughly 52.56 minutes of downtime per year.
  • 99.999% Availability: "Five nines," represents around 5.26 minutes of downtime per year.
  • 99.9999% Availability: "Six nines," corresponds to approximately 31.5 seconds of downtime per year.
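These figures can be derived directly from the percentage. A quick sketch, assuming a 365-day year:

```python
# Derive yearly downtime from an availability percentage (365-day year assumed).

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

def downtime_per_year(availability_pct: float) -> str:
    """Return human-readable yearly downtime for a given availability."""
    seconds = (1 - availability_pct / 100) * SECONDS_PER_YEAR
    if seconds >= 86_400:
        return f"{seconds / 86_400:.2f} days"
    if seconds >= 3_600:
        return f"{seconds / 3_600:.2f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.2f} minutes"
    return f"{seconds:.1f} seconds"

for nines in (99.0, 99.9, 99.99, 99.999, 99.9999):
    print(f"{nines}% -> {downtime_per_year(nines)}")
# 99.0%    -> 3.65 days
# 99.9%    -> 8.76 hours
# 99.99%   -> 52.56 minutes
# 99.999%  -> 5.26 minutes
# 99.9999% -> 31.5 seconds
```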

A service-level agreement (SLA) is a commonly used term for service providers. It is an agreement between you (the service provider) and your customer that formally defines the level of uptime your service will deliver. Cloud providers such as Amazon, Google, and Microsoft set their SLAs at 99.9% or above. Uptime is traditionally measured in nines: the more nines, the better.


Response time:

Response time refers to the amount of time it takes for a system to process a request and return a response to the client. It is a critical performance metric that affects the user experience and the perceived efficiency of the application.

Components of API Response Time:

  1. Network Latency: The time taken for the request to travel from the client to the server and for the response to travel back to the client. Network latency depends on the physical distance between the client and server, network congestion, and the speed of the network.
  2. Server Processing Time: The time taken by the server to process the request.
  3. Queue Time: The time the request spends waiting in a queue before being processed by the server. This can happen when the server is under heavy load and cannot immediately handle the incoming request.
  4. Serialization/Deserialization: The time taken to convert the data to and from formats suitable for transmission over the network (e.g., converting objects to JSON).

Measuring API Response Time:

Response time is typically measured using tools and metrics such as:

  • Average Response Time: The mean time taken to respond to all requests over a period.
  • Median Response Time: The midpoint response time, which reflects the typical experience because it is not skewed by outliers.
  • 95th/99th Percentile Response Time: These metrics indicate that 95% or 99% of requests were handled within a specific time frame, highlighting the upper end of response times.
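A minimal sketch of computing these metrics from raw samples. The data is simulated, and the percentile helper uses a simple nearest-rank definition (real monitoring tools may interpolate differently):

```python
# Compute average, median, and tail-percentile response times from raw samples.
import random
import statistics

# Simulated response times in milliseconds (illustrative data only).
samples = [random.gauss(120, 30) for _ in range(10_000)]

def percentile(data: list[float], pct: float) -> float:
    """Nearest-rank percentile: the value below which pct% of samples fall."""
    ordered = sorted(data)
    index = int(len(ordered) * pct / 100)
    return ordered[min(index, len(ordered) - 1)]

print(f"Average: {statistics.mean(samples):.1f} ms")
print(f"Median:  {statistics.median(samples):.1f} ms")
print(f"p95:     {percentile(samples, 95):.1f} ms")
print(f"p99:     {percentile(samples, 99):.1f} ms")
```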

Example:

If an API receives a request to fetch user details, the response time will include the time taken for:

  • The request to reach the server (network latency).
  • The server to parse the request and validate input.
  • Retrieving user details from the database.
  • Formatting the response.
  • Sending the response back to the client (network latency).
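From the client's point of view, all of these components collapse into one observable number. A minimal client-side sketch (the endpoint URL is a placeholder, not a real API):

```python
# Measure the end-to-end response time of a single API call as the client sees it.
import time
import urllib.request

url = "https://api.example.com/users/42"  # hypothetical endpoint

start = time.perf_counter()
with urllib.request.urlopen(url, timeout=10) as response:
    body = response.read()  # network latency + server time + serialization, combined
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Fetched {len(body)} bytes in {elapsed_ms:.1f} ms")
```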


Throughput:

Throughput is a key performance metric used to measure the amount of data processed by a system or network over a given period of time. It is particularly relevant in the context of APIs, networks, databases, and other data processing systems. Throughput is typically expressed in units such as requests per second (RPS), transactions per second (TPS), or bits per second (bps).

Key Aspects of Throughput:

  1. Measurement: Throughput is quantified in units such as requests per second (RPS), transactions per second (TPS), or bits per second (bps).
  2. Importance: Throughput determines how much load a system can sustain, which drives capacity planning and SLA commitments.
  3. Factors Affecting Throughput: Hardware resources, network bandwidth, concurrency limits, and contention for shared resources such as databases.
  4. Improving Throughput: Common techniques include caching, load balancing, asynchronous processing, and horizontal scaling.

Examples:

  • API Throughput: An API service might measure throughput as 5000 requests per second, indicating that it can handle 5000 API calls every second.
  • Database Throughput: A database might achieve a throughput of 1000 transactions per second, meaning it can process 1000 database transactions every second.
  • Network Throughput: A network connection might have a throughput of 100 Mbps, indicating it can transfer 100 megabits of data per second.
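A simple way to estimate throughput is to time repeated calls. The sketch below uses a stand-in handler rather than a real API:

```python
# Estimate throughput by timing repeated calls to a request handler.
import time

def handle_request() -> None:
    """Stand-in for real request handling (illustrative only)."""
    sum(range(1_000))  # simulate a small, fixed amount of work

N = 10_000
start = time.perf_counter()
for _ in range(N):
    handle_request()
elapsed = time.perf_counter() - start

print(f"Throughput: {N / elapsed:,.0f} requests/second")
```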

Relationship with Other Metrics:

  • Throughput vs. Latency: While throughput measures the volume of work processed over time, latency measures the time taken to process a single request. High throughput and low latency are often both desirable but can sometimes be in tension.
  • Throughput vs. Response Time: Response time covers the entire duration from a request being made to the response being received. A system that sustains high throughput by processing requests in parallel reduces queuing, which helps keep response times low under load.
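One standard result that ties these metrics together is Little's Law: at steady state, the number of in-flight requests equals throughput multiplied by average response time. A quick illustration with assumed numbers:

```python
# Little's Law: concurrency = throughput * average response time.
# Both inputs below are assumptions chosen for illustration.

throughput_rps = 5_000        # requests per second
avg_response_time_s = 0.050   # 50 ms average response time

in_flight = throughput_rps * avg_response_time_s
print(f"Requests in flight at steady state: {in_flight:.0f}")  # -> 250
# Handy for sizing worker pools, thread counts, and connection limits.
```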

