Shoutout to the amazing NAVER Corp Engineering Team (김영진 and 이모원 (Moweon Lee)) for sharing an in-depth breakdown of how NAVER, a dominant force in South Korea’s internet industry, solved its analytics challenges with StarRocks!
NAVER manages 20+ PB of data in its Apache Iceberg Lakehouse, powering real-time insights across 200+ interconnected services in search, e-commerce, AI, and more.
The Challenge
NAVER’s ClickHouse-based analytics system initially provided fast aggregation but struggled as the company’s needs evolved:
⚠️ Fixed Dimensions – Lack of JOINs forced rigid denormalized tables, limiting analytical flexibility.
⚠️ Scalability Bottlenecks – Balancing data across nodes required manual intervention, making scaling inefficient and resource-intensive.
⚠️ Limited Data Upserts & Deletes – Merge-on-read severely degraded performance, restricting support for real-time mutable data.
Why StarRocks?
To overcome these limitations, NAVER benchmarked Trino, Pinot, Druid, and StarRocks based on key criteria like multi-table JOINs, upsert performance, federated analytics, and scalability. After extensive testing, StarRocks emerged as the best solution due to:
✅ Native multi-table JOINs – Eliminates the need for denormalized tables, enabling flexible, real-time analytics.
✅ Federated Analytics – Seamless integration with Apache Iceberg, Hive, and other data sources.
✅ Superior Aggregation Performance – Matched or exceeded ClickHouse’s speed while handling dynamic workloads.
✅ Cloud-Native Scalability – Decoupled storage-compute architecture simplifies horizontal scaling in Kubernetes.
✅ Efficient Real-Time Upserts – Enables fast updates without impacting query performance, a key requirement for NAVER.
Further Optimizations
NAVER optimized query performance using StarRocks’ materialized views, achieving:
🔹 6x faster performance on multi-table and aggregated queries.
🔹 Automated Query Rewrite & Refresh – No manual SQL modifications required for optimizations.
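For readers new to materialized views, here is a minimal sketch of the idea behind them, not NAVER’s setup and not StarRocks syntax. It uses Python’s built-in sqlite3 as a stand-in engine, and the table names (orders, services, mv_category_totals) are hypothetical. The JOIN-plus-aggregation result is precomputed into a table, and reading that table returns the same answer as the original query. In StarRocks, per the post, the rewrite and refresh happen automatically; in this sketch both are done by hand.

```python
import sqlite3

# In-memory stand-in database; all table names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, service_id INTEGER, amount REAL);
CREATE TABLE services (service_id INTEGER, category TEXT);
INSERT INTO orders VALUES (1, 10, 5.0), (2, 10, 7.5), (3, 20, 2.0), (4, 30, 4.0);
INSERT INTO services VALUES (10, 'search'), (20, 'commerce'), (30, 'commerce');
""")

# The multi-table aggregation an analyst would write against raw data.
QUERY = """
SELECT s.category AS category, SUM(o.amount) AS total
FROM orders o
JOIN services s ON o.service_id = s.service_id
GROUP BY s.category
ORDER BY s.category
"""

def refresh_mv(connection):
    """Rebuild the precomputed result table (a manual 'refresh')."""
    connection.execute("DROP TABLE IF EXISTS mv_category_totals")
    connection.execute("CREATE TABLE mv_category_totals AS " + QUERY)

# Run the query directly, then answer it from the precomputed table instead.
direct = conn.execute(QUERY).fetchall()
refresh_mv(conn)
from_mv = conn.execute(
    "SELECT category, total FROM mv_category_totals ORDER BY category"
).fetchall()

print(direct == from_mv)
```

The precomputed table trades freshness for speed: it only reflects new rows after refresh_mv runs again, which is why automated refresh matters in a real-time setting.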
The Impact
✅ Real-Time Interactive Queries – Engineers can now directly query raw data using SQL, improving analytics flexibility.
✅ Superior JOIN Performance – StarRocks’ native JOIN engine enables multi-table analytics at scale.
✅ Unified Query Platform – Supports a hybrid analytics ecosystem, integrating Iceberg, Hive, and real-time data.
✅ Seamless Scalability – Kubernetes-native design ensures linear scaling, optimizing resource efficiency and cost.
👉 Read the full story here: https://lnkd.in/gGt_g4Yq
#DataAnalytics #DataEngineering #RealTimeAnalytics #DataLakeAnalytics #DataLake