🚀 Key Lessons for Kubernetes Newcomers: From CKA/S Exam to Production!
When you're prepping for the CKA/CKS exam, you'll notice a common pattern: most workloads are simple Deployments with just one replica. Sure, that's great for learning, but the real world? It's a whole different ball game. Let me share some must-know tips for those stepping into production environments! 🌟
---
1️⃣ Stateless vs. Stateful Apps
In the exam, everything feels like a breeze: stateless apps, one replica, no headaches. 🌀 But in production, things get real:
- Databases? Yep, they need storage.
- Logs? They’re not vanishing into thin air.
- App state? It’s not going to manage itself.
💡 Pro Tip: Master stateful workloads before going live. Your future self will thank you. 😉
---
2️⃣ Local Disks: The Good, the Bad, and What You Need to Know
Sometimes, local disks are the best choice for your workloads, and here’s why:
🔥 Speed: Local storage delivers high IOPS and low latency, making it a top pick for performance-critical applications. Think of databases or high-throughput systems that can’t tolerate slow storage.
📉 Network Load: If your app generates a mountain of logs 🏔️, keeping logs on local disks ensures your network stays smooth. No one wants logs to hog bandwidth!
🔒 Isolation: With local disks, each pod gets its own dedicated space. This prevents the risk of data from one pod interfering with another—critical for scenarios requiring strict separation of logs or state.
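If you haven't worked with local volumes before, it helps to see one: a local PV is pinned to a specific node through nodeAffinity. Here's a minimal sketch (the PV name, node name, path, and size are placeholders, so adapt them to your cluster):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv          # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage # assumption: a StorageClass you created for local disks
  local:
    path: /mnt/disks/ssd1         # a disk or directory that exists on that node
  nodeAffinity:                   # the PV is only usable by pods scheduled on this node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1        # placeholder node name
```

That nodeAffinity block is exactly what causes the scheduling surprise described in the next section.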
---
🛑 But Wait… There’s a Catch!
Scaling apps with local disks isn’t as straightforward as it seems. Imagine this:
- You’ve deployed 3 replicas of your app.
- You manually create PVs and PVCs for your deployment.
What happens? 🤔
🚩 Only one pod comes up. The others are stuck in a Pending state. Why?
> A local PV is pinned to a single node, so the PVC bound to it can only be used by pods scheduled on that node. Pods on the other nodes in the cluster cannot access it.
If this is your first time working with local disks in Kubernetes, this behavior might leave you scratching your head, or at least it did for me. 😵
That's one of the reasons we sometimes have to use a StatefulSet instead of a Deployment.
The StatefulSet controller creates a PVC for each replica, and the provider's (CSI) driver provisions the matching PV.
You just need to declare a volumeClaimTemplates section in the StatefulSet, as shown below.
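Here's a minimal sketch of such a StatefulSet (the names, image, and the local-storage StorageClass are placeholders for illustration, not a recommendation):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo
spec:
  clusterIP: None              # headless Service, required by the StatefulSet
  selector:
    app: demo
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo
spec:
  serviceName: demo
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: nginx:1.27
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:        # the controller creates one PVC per pod: data-demo-0, data-demo-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-storage  # assumption: a StorageClass backed by your local-disk provisioner
        resources:
          requests:
            storage: 5Gi
```

Because each pod gets its own PVC, the replicas can land on different nodes and each keeps its own volume.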
---
🛠️ Local Disks + CSI Drivers = A Winning Combo
If you're set on using local disks (and for good reason!), you'll still want a CSI driver to get the most out of Kubernetes. Why?
It's not just about managing snapshots and replication: dynamic provisioning isn't enabled automatically either. Whether you're using local disks or cloud-based storage, Kubernetes requires a provider's driver (like a CSI driver) to handle dynamic provisioning of volumes. Without it, Kubernetes won't know how to automatically create or manage the storage resources you need.
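Dynamic provisioning is wired up through a StorageClass that points at the driver. Here's a rough sketch using Longhorn's driver name and parameters as an example; double-check the exact provisioner name and parameter list in your driver's documentation:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io   # the CSI driver that will create PVs on demand
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"           # Longhorn-specific: keep 3 copies of each volume on different nodes
```

Any PVC that references this StorageClass gets its PV created automatically, and a parameter like numberOfReplicas is also what buys you the cross-node replication discussed below.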
For someone like me who works primarily on-prem 🏢, choosing the right CSI driver is essential. I recommend these two:
1️⃣ OpenEBS:
- Great for complex setups where you need fine-grained control.
- Perfect for scenarios like multi-tier storage or different volume backends (e.g., local disks, cloud storage, etc.).
2️⃣ Longhorn:
- It’s lightweight, integrates well with Kubernetes, and is ideal for scenarios where you want to get started quickly without diving into deep configurations.
---
The advantage of using one of the CSI drivers mentioned earlier is that if one of your nodes goes down, you won't lose your data: volumes are replicated (or backed up) across nodes, ensuring availability even if one node fails.
Let's try it and see what happens. My pod and its volumes are currently assigned to the second master node.
My volumes are spread across three nodes.
I'm going to take down my second machine, the one my app dmc is running on.
And voilà!
---
3️⃣ Shared Storage ≠ All Problems Solved
Using shared storage like NFS can make life easier. But don’t get too comfortable—it’s not all rainbows 🌈 and butterflies 🦋.
Imagine deploying a canary version of your app where every pod writes to the same shared log file, such as money_transactions.log. At first glance, this seems fine, but when multiple pods simultaneously write to the same log file, you run into concurrency issues. Without proper synchronization, logs from different pods can get mixed up, making it unclear which log line came from which pod. This could lead to corrupted logs, with data from one pod overwriting or jumbled with data from another pod. 📉
If you're thinking of adding pod names to the filenames to solve this (e.g., pod1_money_transactions.log), it might sound like a good workaround. But here’s the problem: pods aren’t static. As they restart or scale, they might get new names or identifiers, causing the log filenames to change unexpectedly. This can quickly lead to log management chaos, with files accumulating in unexpected ways.
So while NFS might seem like an easy solution for shared storage, it introduces complexities that need to be addressed—especially when you’re dealing with dynamic environments like Kubernetes.
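For context, here's roughly what a shared NFS volume looks like; the server address, export path, and sizes are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-logs
spec:
  capacity:
    storage: 25Gi
  accessModes:
    - ReadWriteMany            # many pods, possibly on different nodes, can mount it at once
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10          # placeholder: your NFS server
    path: /exports/logs        # placeholder: the exported directory
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-logs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""         # empty string: bind to the statically created PV above
  volumeName: shared-logs
  resources:
    requests:
      storage: 25Gi
```

The ReadWriteMany access mode is what makes the concurrency problems above possible in the first place: every replica mounts the exact same directory.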
Oh, and since we are talking about logs, here's a shocker for newcomers:
> A PV’s size is not a hard limit.
Yep, you read that right. 🤯 If your physical disk is 500TB and you give a client an 80GB PV, they could use the entire 500TB unless you enforce quotas. CSI drivers won’t save you here—plan accordingly! 🛡️
For example, the nfs-drb-resources volume is supposed to be 25G.
Let's exec into the container and check the size of the mount.
As you can see, it reports 48G, which is the total size of the exposed NFS directory.
Be careful and consider using a different mechanism to limit space usage.
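A namespace-level ResourceQuota is one such mechanism, but know what it actually limits: it caps what PVCs may *request*, not how many bytes really land on the NFS export, so true enforcement usually also needs quotas on the storage backend itself. A minimal sketch (the namespace and numbers are placeholders):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a               # placeholder namespace
spec:
  hard:
    persistentvolumeclaims: "5"   # at most 5 PVCs in this namespace
    requests.storage: 100Gi       # total storage all PVCs in the namespace may request
```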
---
4️⃣ Use StatefulSets for Stateful Workloads
When dealing with persistent storage or node-affinity needs (hello, local disks 👋), StatefulSets > Deployments. Why?
- 🧩 Unique Volumes: Each pod gets its own PVC automatically.
- 📦 Volume Stability: Pods can restart and still access their data.
- 🤝 Better Fit: Perfect for databases, message queues, or any app needing its own space.
Avoid the common pitfall of using Deployments with local storage. StatefulSets are your friend! 🌟
---
For your sanity, always back up and take snapshots—because you never know when things might go south! 🚨
For this example, I'll focus on installing the Kubernetes Snapshot API, which allows us to work with snapshots without being tied to a specific CSI driver. While I'll be using Longhorn under the hood for my setup, the concepts here are broadly applicable to other CSI drivers too.
With the Snapshot API, you can:
- Standardize snapshot management across different storage backends.
- Use Kubernetes-native VolumeSnapshot resources for creating and managing snapshots.
- Maintain flexibility in your choice of storage provider.
Even though Longhorn offers powerful features like volume replication and an intuitive UI, leveraging the Kubernetes Snapshot API ensures a consistent workflow, regardless of the underlying storage system. As far as I remember, the concept of Kubernetes snapshots was not included in the CKA exam, so I’m going to discuss it in detail.💡
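Before taking snapshots, you typically need a VolumeSnapshotClass that points at your CSI driver. A minimal sketch, assuming Longhorn's driver name driver.longhorn.io (the VolumeSnapshot below refers to this class as longhorn):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
driver: driver.longhorn.io   # the CSI driver that actually takes the snapshots
deletionPolicy: Delete       # remove the backing snapshot when the VolumeSnapshotContent is deleted
```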
Let's see how the snapshot works.
VolumeSnapshot CRD:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: showroom-drs-log-snapshot
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: showroom-drs-log
```
Tada!!
And the backup is created.
Let's look at the VolumeSnapshot details.
As you can see, the VolumeSnapshotContent was created.
VolumeSnapshot CRD (ReadyToUse: True)
The `snapshotHandle` is a field in the VolumeSnapshotContent resource that uniquely identifies the snapshot on the underlying storage system. It acts as a reference to the actual snapshot created by the CSI driver (e.g., Longhorn). This handle is crucial for managing the lifecycle of the snapshot, such as restoring volumes or deleting the snapshot.
The CSI snapshotter will call the CreateSnapshot gRPC method on the provider's driver, in this case, Longhorn. Longhorn will handle the creation of volume snapshots and reconcile their state.
Cluster state once the snapshot is ready to use
**Restoring**
Create a new PVC:
cat pvc_restore.yaml
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: showroom-drs-log-restored
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi # Adjust the size based on the original snapshot size
  volumeMode: Filesystem
  storageClassName: longhorn # Use the appropriate StorageClass
  dataSource:
    apiGroup: snapshot.storage.k8s.io # Required when restoring from a VolumeSnapshot
    kind: VolumeSnapshot
    name: showroom-drs-log-snapshot # Name of the snapshot you want to restore from
```
This is the state before creating the new PVC (screenshot).
The new restored PVC is created successfully.
CSI Snapshot CRD Relationships (Restore)
Restoring and reconciling state
Note that VolumeSnapshots are namespaced, but VolumeSnapshotContents are not (they are cluster-scoped), so if you delete the namespace, be ready to create another VolumeSnapshot.
This is Just the Beginning...
What we've covered here is just the tip of the iceberg. From mastering stateful applications to leveraging CSI drivers for storage, these are foundational skills, but they're only scratching the surface of what you need to know in a real-world Kubernetes environment.
As you continue your journey, networking becomes just as crucial. For example, in a production setup, companies often implement different VLANs for isolating traffic based on type—whether it's for management, application, or storage traffic. And that’s just one of the many networking complexities you'll encounter. From service meshes to multi-cluster communication, networking in Kubernetes can get highly intricate.
Beyond networking and storage, there are also considerations around security, resource management, and high availability that are key to running scalable and reliable applications in production.
So, while this article focused on storage, remember that the world of Kubernetes is vast, and there's always more to learn. Stay curious, and keep exploring the challenges that real-world environments bring. The more you know, the better prepared you'll be to tackle whatever comes your way in production. The journey has just begun!