Big Data

Demystifying persistent storage: As containers die, data lives on

Monday, November 21, 2016

According to a recent survey from Portworx, IT managers depend on containers to help improve their agility, reduce their costs and improve system performance. Yet their number two concern with container storage is loss of data.

Containers were developed to be stateless, ephemeral, lightweight tools, only megabytes in size, to enable quick application launch. However, designing containers this way creates a problem: what do we do with data that needs to persist after the container goes away?

To compound the problem, the scale of deployments managed with tools like Docker, Mesosphere and Kubernetes has been growing at a rapid rate. People are now running hundreds of nodes in their clusters (in some cases even thousands!), making it even more difficult to manage data under the unpredictable life cycles of containers.

This challenge has not gone unaddressed. To make persisting data easier, containers have started supporting directory mounts, data volume containers and, most recently, storage plug-ins. These new solutions enable persistent data. However, as the number of container deployments grows, managing and moving their associated data can still be tricky.

Building persistent storage

As a best practice, separate data management from containers. Doing so ensures your data lives well beyond a container's lifecycle.

There are three methods to ensure container data persistence:

1.) Storage plug-ins

2.) Data volume containers

3.) Directory mounts (a local host directory mounted into your container as a data directory at launch)

Storage plug-ins are the most reliable and manageable option for persistent storage. A standard volume plug-in allows you to create, delete and mount persistent volumes, and it supports commands from container management applications. In fact, Docker, Mesosphere and Kubernetes all offer storage volume plug-ins. Many storage companies have also built incremental capabilities into their container APIs to further simplify the container management process. These plug-ins offer capabilities such as managing and consuming volumes from any management host, consuming existing storage, or differentiating storage offerings with multiple instances.
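The create, mount and delete operations described above can be sketched as the Docker CLI commands a script might issue. This is only a minimal illustration, not any vendor's plug-in API: the driver name example-driver, the size option, the volume name appdata and the image example/app are hypothetical placeholders, and passing the volume name as a positional argument assumes a current Docker CLI. The functions only build argument lists, so nothing runs until a list is handed to something like subprocess.run.

```python
import shlex


def volume_create_cmd(name, driver, opts=None):
    """Build argv for creating a named volume through a volume plug-in (driver)."""
    cmd = ["docker", "volume", "create", "--driver", driver]
    # Driver-specific options are passed as repeated --opt key=value pairs.
    for key, value in sorted((opts or {}).items()):
        cmd += ["--opt", f"{key}={value}"]
    cmd.append(name)
    return cmd


def run_with_volume_cmd(image, volume, mount_point):
    """Build argv for starting a container with the named volume mounted."""
    return ["docker", "run", "-d", "-v", f"{volume}:{mount_point}", image]


def volume_remove_cmd(name):
    """Build argv for deleting the volume once no container uses it."""
    return ["docker", "volume", "rm", name]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    steps = [
        volume_create_cmd("appdata", "example-driver", {"size": "10GiB"}),
        run_with_volume_cmd("example/app", "appdata", "/var/lib/app"),
        volume_remove_cmd("appdata"),
    ]
    for step in steps:
        print(shlex.join(step))
```

Because the volume is created and removed independently of any container, its lifetime is decoupled from the container's, which is the separation recommended earlier.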

Data volume containers allow users to manage data inside and between various containers. A data volume container doesn’t run an application but instead serves as an entry point from which other containers can access the data volume. Data volumes can be shared among containers and can persist even after the container itself is deleted. While setup of this method is relatively simple, ongoing management becomes complex. As containers are deleted, the data they leave behind can become orphaned, and orphaned volumes are often not cleaned up by the container manager's garbage collection. Data volume containers can be accessed directly by the host, so orphaned data can be collected as needed. During this process, however, data access privileges can become corrupted, leaving potentially sensitive data vulnerable.
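As a rough sketch of the pattern described above, the following builds the Docker CLI commands for a data volume container, an application container that borrows its volumes, and a listing of orphaned ("dangling") volumes. The container name datastore, the path /data and the image example/app are hypothetical placeholders; busybox is used here only as a small image that never needs to run.

```python
def data_container_cmd(name, data_path, image="busybox"):
    """Build argv for a container that is created (not run) just to own a volume."""
    return ["docker", "create", "-v", data_path, "--name", name, image]


def app_container_cmd(image, data_container):
    """Build argv for an application container that mounts the data container's volumes."""
    return ["docker", "run", "-d", "--volumes-from", data_container, image]


def list_orphaned_volumes_cmd():
    """Build argv that lists only the names of volumes no container references."""
    return ["docker", "volume", "ls", "-q", "-f", "dangling=true"]


def parse_volume_names(output):
    """Turn the one-name-per-line output of the listing into a list of names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical names; print the commands rather than executing them.
    print(" ".join(data_container_cmd("datastore", "/data")))
    print(" ".join(app_container_cmd("example/app", "datastore")))
    print(" ".join(list_orphaned_volumes_cmd()))
```

Reviewing the orphaned list before removing anything is a deliberate choice in this sketch: a volume that no container references may still hold the only copy of the data.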

The final option, directory mounts, ties the host itself to the container. This carries the data structure from the host into the container, allowing for a persistent and reusable format. Directory mounts can then be accessed for both reading and writing, which also leaves gaps for security threats. Because the directory mount can be given access to a host system’s directory, the container also holds the ability to delete or change content. This vulnerability means that someone with malicious intent could not only delete an entire data volume but also manipulate data through these access points.
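One way to narrow the exposure described above is to mount the host directory read-only whenever the container only needs to read from it. The sketch below builds such a docker run command; the paths /srv/appdata and /data and the image example/app are hypothetical placeholders, and defaulting to read-only is an assumption of this example, not a Docker default.

```python
import posixpath


def bind_mount_cmd(image, host_dir, container_dir, read_only=True):
    """Build argv for running a container with a host directory mounted into it."""
    # Require absolute paths so the -v value is treated as a host directory
    # rather than being interpreted as a named volume.
    for path in (host_dir, container_dir):
        if not posixpath.isabs(path):
            raise ValueError(f"path must be absolute: {path!r}")
    spec = f"{host_dir}:{container_dir}"
    if read_only:
        # The :ro suffix prevents the container from changing or deleting host content.
        spec += ":ro"
    return ["docker", "run", "-d", "-v", spec, image]


if __name__ == "__main__":
    # Hypothetical paths and image name; print the command rather than executing it.
    print(" ".join(bind_mount_cmd("example/app", "/srv/appdata", "/data")))
```

A container that genuinely needs to write can be started with read_only=False, which makes the wider access an explicit decision at the call site.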

Containers vs. Virtual Machines

Although container use is exploding, another common tool used by app developers is the virtual machine (VM). Server virtualization provides a variety of benefits, among them the fact that VMs are persistent by default. Each VM bundles virtualized hardware, firmware and software, including its own instance of an operating system, which makes a VM multiple gigabytes in size. As a result, VMs do not deploy quickly and are not easy to move through development pipelines.

However, VMs remain relevant because they enable the consolidation of applications onto a single system, ushering in cost savings through reduced footprint, faster server provisioning and improved disaster recovery. Development also benefits from this physical consolidation, because greater utilization of larger, faster servers frees up otherwise unused servers to be repurposed for QA, development or lab gear.

VMs and containers differ on quite a few dimensions, but the primary difference is that containers virtualize the OS so that multiple workloads run on a single OS instance, whereas VMs virtualize the hardware to run multiple OS instances. Containers’ speed, agility and portability make them yet another tool to help streamline software development.

“Virtual machines are not dead. They will still thrive as applications continue to need that architecture. Providing persistent storage certainly narrows the gap and enables even more applications to adopt a container architecture,” said Josh Atwell, Developer Advocate at NetApp SolidFire.

Conclusion

The jury is still out on which option, or options, will prove to be the most widely adopted. For app developers the benefits of containers are clear: lightweight, agile containers allow applications to be packaged with their dependencies as “sharable” content.

Container storage plug-ins certainly provide a more reliable and consistent path to data persistence, especially as companies like NetApp build their APIs with incremental capabilities to manage and consume data volumes. While it is still early in their adoption (Docker announced the first plug-in in mid-2015), initial feedback suggests storage plug-ins are the simple method for persistent storage we’ve been looking for.

In choosing your method to achieve persistent storage, consider each of these options along with its advantages and limitations. By fully understanding them, organizations can prepare to address any limitations and enable best practices for data persistence and performance.

This content is made possible by a guest author, or sponsor; it is not written by and does not necessarily reflect the views of App Developer Magazine's editorial staff.