Containerization is revolutionizing how applications are planned, developed, and deployed. While Kubernetes excels at scalability, portability, and management, it provides no durable storage on its own. This means that when a container crashes, the kubelet (the node agent that runs on each node) will restart it, but the files will be lost: the container starts with a clean state. In addition, containers running together in a Pod often need to share files. For data to survive container restarts and terminations, you need a storage mechanism that manages data outside of the container.
This article surveys some of the tools that solve these storage problems inherent in Kubernetes and Docker.
“You can cut all the flowers but you cannot keep Spring from coming.”
― Pablo Neruda
1. OpenEBS

OpenEBS is the leading open-source project for container-attached and container-native storage on Kubernetes. It adopts the Container Attached Storage (CAS) approach, in which each workload is provided with a dedicated storage controller. It implements granular storage policies and isolation that let users optimize storage for each specific workload. OpenEBS runs in userspace and has no Linux kernel module dependencies (docs.openebs.io, 2019).
OpenEBS allows you to treat your persistent workload containers, such as databases in containers, just like any other container. OpenEBS itself is deployed as just another container on your host and enables storage services that can be designated at the per-pod, application, cluster, or container level.
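As a sketch of how this looks in practice: once OpenEBS is installed, an application claims storage through an ordinary PersistentVolumeClaim. The StorageClass name below (`openebs-jiva-default`) is an assumption based on the default classes a 2019-era OpenEBS install ships; check `kubectl get sc` on your cluster for the actual names.

```yaml
# Hypothetical PVC requesting an OpenEBS-backed volume.
# The StorageClass name is an assumption - verify with `kubectl get sc`.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-db-claim
spec:
  storageClassName: openebs-jiva-default
  accessModes:
    - ReadWriteOnce   # a single node mounts the volume read-write
  resources:
    requests:
      storage: 5Gi
```

A database Pod then references `demo-db-claim` in its volume spec like any other claim, which is exactly the "treat storage like other containers" point above.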
Features of OpenEBS
Containerized Storage for Containers
As mentioned above, OpenEBS follows a Container Attached Storage architecture, so volumes provisioned through OpenEBS are always containerized. As a result, each volume has a dedicated storage controller, which increases the agility and granularity of persistent storage operations for stateful applications.
Backup and Restore
Backup and restore of OpenEBS volumes works with recent Kubernetes backup and restore solutions such as Heptio Ark. The incremental snapshot capability of OpenEBS saves a lot of bandwidth and storage space, since only the incremental data is backed up to object storage targets such as S3. This makes OpenEBS such a powerful tool.
Prometheus Metrics for Workload Tuning
Monitoring of Container Attached Storage metrics is possible because OpenEBS volumes are instrumented for granular data metrics such as volume IOPS, throughput, latency, and data patterns. With this in place, stateful applications can be tuned for better performance by observing traffic patterns in Prometheus and adjusting the storage policy parameters, without worrying about neighbouring workloads that also use OpenEBS.
Snapshots and Clones
Copy-on-write snapshots are a key feature of OpenEBS. Even better, snapshot and clone operations are performed in a completely Kubernetes-native way, using the standard kubectl command; no separate tooling is needed. Moreover, snapshots are created instantaneously, and there is no limit on the number of snapshots that can be created.
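For illustration, a snapshot of the hypothetical claim `demo-db-claim` could look like the manifest below. The `apiVersion` follows the external-storage snapshot CRDs that OpenEBS used around 2019; treat the exact group and kind as an assumption and verify against the CRDs installed in your cluster.

```yaml
# Hypothetical VolumeSnapshot of an existing PVC.
# apiVersion/kind are assumptions - confirm with `kubectl get crd`.
apiVersion: volumesnapshot.external-storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: demo-db-snapshot
spec:
  persistentVolumeClaimName: demo-db-claim   # the PVC to snapshot
```

Applying it is the usual `kubectl apply -f snapshot.yaml`, which is the Kubernetes-native workflow the paragraph above describes.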
Synchronous Replication

Synchronous replication implements high availability: OpenEBS synchronously replicates the data volume replicas. This feature becomes especially useful for building highly available stateful applications using local disks on cloud provider services such as Google Kubernetes Engine.
Avoid Cloud Lock-in
When dealing with stateful applications, data is written to the cloud provider's storage infrastructure. Since each cloud provider implements its storage infrastructure differently, this results in cloud lock-in for stateful applications. OpenEBS instead writes data to its own layer, which acts as a data abstraction layer. With this abstraction in place, data can be moved across Kubernetes clusters, eliminating the cloud lock-in issue.
High Availability

Unlike traditional storage systems, volume metadata is not centralized; it is kept local to the volume. Volume data is synchronously replicated on at least two other nodes, so losing any node loses only the volume replicas present on that node. In the event of a node failure, the data on the other nodes remains available at the same performance levels.
More on OpenEBS can be found on the OpenEBS site.
2. Portworx

According to the Portworx site, Portworx is a software-defined storage overlay that allows you to:
- Run containerized stateful applications that are highly-available (HA) across multiple nodes, cloud instances, regions, data centers or even clouds
- Migrate workloads between multiple clusters running across the same or hybrid clouds
- Run hyperconverged workloads where the data resides on the same host as the applications
- Have programmatic control on your storage resources
Portworx is a company that specializes in container-based products ranging from storage to monitoring, disaster recovery, and security. We shall focus on their enterprise container storage solution, PX-Store. This product provides cloud-native storage for applications running in the cloud, on-prem, and in hybrid/multi-cloud environments.
It provides the reliability, performance and data protection you’d expect from an enterprise storage company, but delivered as a container, and managed 100% via Kubernetes and other leading container platforms. Source: (https://portworx.com/products/introduction, 2019).
Features of Portworx/PX-Store
- Unlike most of the other tools covered here, this solution is enterprise software, not open source
- Container-optimized volumes with elastic scaling (no application downtime)
- Storage aware class-of-service (COS) and application aware I/O tuning
- Multi-writer shared volumes (across multiple containers)
- Failover across nodes/racks/AZ
- Aggregated volumes for storage pooling across hosts
- Local Snapshots – On-demand and schedule based
- OpenStorage SDK – plugs into CSI, Kubernetes and Docker volumes
- Local data center synchronous replication for HA. This feature is only available for server hosts (bare metal).
- Scanning drives for media errors, alerting and repairing
- Support for Hyperconverged and Dedicated storage architectures
- Volume consistency groups
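To give a flavour of how replication and I/O tuning surface to the user, here is a sketch of a StorageClass using the in-tree `kubernetes.io/portworx-volume` provisioner. The parameter names (`repl`, `io_profile`) follow Portworx's documented conventions, but the exact set of supported values depends on your PX-Store version, so treat this as illustrative.

```yaml
# Sketch of a Portworx StorageClass; parameter values are examples.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-repl2
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"          # keep two synchronous replicas of each volume
  io_profile: "db"   # application-aware I/O tuning for database workloads
```

PVCs that name `px-repl2` as their storage class then get Portworx volumes with those characteristics automatically.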
Check out more on the Portworx website
3. Rook

Publicly released in November 2016, Rook is an open-source cloud-native storage orchestrator for Kubernetes, providing the platform, framework, and support for a diverse set of storage solutions to integrate natively with cloud-native environments. It offers production-ready file, block, and object storage. Rook turns storage software into self-managing, self-scaling, and self-healing storage services. Essentially, Rook puts Ceph into containers and provides the cluster management logic for running Ceph reliably on Kubernetes.
Features of Rook
- Starting Rook in your cluster is as simple as two kubectl commands
- Rook automates deployment, bootstrapping, configuration, scaling, rebalancing. This makes the work of a cluster admin easier.
- Rook allows deploying a Ceph cluster from a YAML manifest, just like any other Kubernetes resource.
- Rook integrates deeply into cloud native environments
- It provides a seamless experience for scheduling, lifecycle management, resource management, security, monitoring, and user experience.
- Rook uses the facilities provided by the underlying cloud-native container management, scheduling and orchestration platform to perform its duties.
- The latest major version of Ceph is supported by the Rook operator. This plainly means that all the latest features, functionality, and stability available in Ceph Nautilus are also available in Rook.
- Automation and Upgrades: This removes the difficult and error-prone manual scenarios from the hands of the user. When the user adds new nodes and devices to the cluster, Rook will automatically detect them and include them in the Ceph cluster if needed, as well as rebalance the data around the newly expanded cluster.
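To make the "cluster from a YAML" point concrete, a minimal CephCluster manifest could look like the sketch below. The `ceph.rook.io/v1` group and the field names follow Rook's v1 CRDs; the image tag and filenames are examples only, so consult the Rook docs for current values.

```yaml
# Applied after the Rook operator itself, e.g. (filenames are examples):
#   kubectl apply -f operator.yaml
#   kubectl apply -f cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.4   # a Nautilus release; tag is an example
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3                   # three monitors for quorum
  storage:
    useAllNodes: true          # let Rook discover nodes...
    useAllDevices: true        # ...and their raw devices automatically
```

The `useAllNodes`/`useAllDevices` settings are what drive the automatic detection and rebalancing described in the last bullet above.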
Rook is a wonderful beast, and you can learn more about it on the Rook site. Rook on!
4. StorageOS

StorageOS is cloud-native, persistent storage for containers. You benefit from application-centric container storage that moves with your application. The platform delivers an enterprise-class, cloud-native storage solution that you can deploy anywhere, with the performance, data services, and policy management you'd expect of any enterprise storage solution.
According to the Kubernetes website, StorageOS runs as a container within your Kubernetes environment, making local or attached storage accessible from any node within the Kubernetes cluster. Data can be replicated to protect against node failure. Thin provisioning and compression can improve utilization and reduce cost. At its core, StorageOS provides block storage to containers, accessible via a file system.
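Consuming that block storage again goes through a standard PVC. The class name `fast` below is the StorageClass a default StorageOS install is documented to create, but verify the name on your cluster before relying on it.

```yaml
# Hypothetical PVC against the default StorageOS StorageClass.
# "fast" is an assumption - check `kubectl get sc` on your cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: storageos-data
spec:
  storageClassName: fast
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Replication and other policies are then controlled per volume via StorageOS labels and StorageClass parameters rather than baked into the application.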
Features of StorageOS
- StorageOS makes it easy to build stateful containerized apps with fast, highly available persistent storage
- Built to run with any stateful application, on any infrastructure with any orchestrator and as a container
- StorageOS easily integrates with your favorite platforms such as Kubernetes, OpenShift or Docker. Apart from that, it fully implements the Container Storage Interface (CSI).
- Natively Secure: StorageOS does not depend on secondary products to secure application data.
- StorageOS is agile: deployed as a container, it runs alongside other applications.
- Consistently available: Synchronous replication ensures a highly consistent data model
- Platform Agnostic: Supports Docker, Kubernetes and others (through CSI). Moreover, it integrates with both distributions (such as OpenShift) as well as cloud based K8S as a service.
- Application-centric: Volumes are provisioned and defined for use by containers allowing enterprises to containerize a broader set of applications that include databases and other stateful workloads.
- StorageOS can scale horizontally or vertically
- Data Security: Improve data security with automated encryption – enterprises control the keys.
- StorageOS breaks cloud lock-in by abstracting storage from the underlying platform, giving applications greater portability.
There is much more about StorageOS. Please visit their site and explore more.
5. Rancher Longhorn
Longhorn is a 100% open-source project and a platform providing persistent storage implementation for any Kubernetes cluster. At its core, Longhorn is a distributed block storage system for Kubernetes using containers and microservices. It can be installed on an existing Kubernetes cluster with one kubectl apply command or using Helm charts. Once it is running, Longhorn adds persistent volume support to the Kubernetes cluster. What is amazing about Longhorn is that it creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes.
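Once installed (the project README gives the exact `kubectl apply -f` manifest URL and Helm instructions), consuming Longhorn storage is again an ordinary PVC. The class name `longhorn` is the default a stock install creates, but confirm it on your cluster.

```yaml
# PVC against Longhorn's default StorageClass ("longhorn" in a
# stock install; verify with `kubectl get sc`). Longhorn will spin
# up a dedicated controller and replicas for this volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volume
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```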
Outstanding Features of Longhorn
- Enterprise-grade distributed storage with no single point of failure
- Incremental snapshot of block storage
- Backup to secondary storage (NFS or S3-compatible object storage) built on efficient change block detection
- Recurring snapshot and backup
- Automated non-disruptive upgrade. You can upgrade the entire Longhorn software stack without disrupting running volumes!
- Intuitive GUI dashboard
- Inherently simple design: distributed block storage is simpler than other forms of distributed storage, such as distributed file systems
- Allows you to pool local disks or network storage mounted in compute or dedicated storage hosts
- It assigns multiple storage “frontends” to each volume.
There is more about Longhorn. Find it all in their GitHub repository.
6. GlusterFS Heketi
gluster-kubernetes is a project to provide Kubernetes administrators a mechanism to easily deploy GlusterFS as a native storage service onto an existing Kubernetes cluster. Here, GlusterFS is managed and orchestrated like any other app in Kubernetes. This is a convenient way to unlock the power of dynamically provisioned, persistent GlusterFS volumes in Kubernetes.
Heketi provides a RESTful management interface which can be used to manage the life cycle of GlusterFS volumes. With Heketi, cloud services like Kubernetes can dynamically provision GlusterFS volumes with any of the supported durability types.
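The dynamic provisioning described above is wired up through a StorageClass that points Kubernetes at Heketi's REST endpoint. The in-tree `kubernetes.io/glusterfs` provisioner and its `resturl`/`volumetype` parameters are documented in Kubernetes; the service hostname below is a hypothetical in-cluster Heketi service, so substitute your own endpoint.

```yaml
# Sketch of dynamic GlusterFS provisioning through Heketi.
# The resturl is a placeholder - point it at your Heketi service.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.default.svc.cluster.local:8080"
  volumetype: "replicate:3"   # one of Heketi's supported durability types
```

PVCs referencing `glusterfs-storage` then cause Heketi to carve out and manage a GlusterFS volume automatically.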
Features of GlusterFS
- It is simple to interact with because GlusterFS is managed and orchestrated like any other app in Kubernetes
- Provides persistent GlusterFS volumes in Kubernetes.
- Managing GlusterFS Heketi in your cluster is as simple as two kubectl commands
More about GlusterFS Heketi can be found at the links below
You can find other useful guides on our blog, depending on what you wish to explore and learn. The following are just a few of them. Cruise around and get more on your plate. Thank you for reading through.