Testing software updates with production traffic

Thursday, July 18, 2019

Testing software updates with production traffic can help developers reduce downtime and increase productivity.

Test and development cycles have significantly changed under the DevOps model. To remain competitive, software developers must continually release new application features. They’re sometimes pushing out code updates as fast as they are writing them. This is a significant change from how software and dev teams traditionally operated. It used to be that teams could test for months, but these sped-up development cycles require testing in days or even hours.

This shortened timeframe means that bugs and problems are sometimes pushed through without the testing that’s required, potentially leading to network downtime. Adding to these challenges, a variety of third-party components must be maintained in a way that balances two opposing forces: changes to a software component may introduce unexplained changes in the behavior of a network service, but failing to update components regularly can expose the software to flaws that could impact security or availability.

The Perils of Poor Testing

On average, it costs four to five times as much to fix a software bug after release as it does to fix it during the design process. The average cost of network downtime is around $5,600 per minute, according to Gartner analysts. And IDC has estimated that for the Fortune 1000, the average total cost of unplanned application downtime per year is as much as $1.25 to $2.25 billion. In short, downtime is expensive.

There is more than money at stake here. There’s also the loss of productivity that can result when your employees are unable to do their work due to an outage. There are the recovery costs of determining what caused the outage and then fixing it. And on top of all of that, there’s also the risk of brand damage wreaked by irate customers who expect your service to be up and working for them at all times. And why shouldn’t they be irate? You promised them a certain level of service, and this downtime has broken their trust.

Bugs in software can create immediate problems, but they can also cause longer-term security issues. These flaws can be exploited later, particularly if they weren’t detected early on. The massive Equifax breach, in which the personal data of more than 140 million Americans was compromised, and the Heartbleed bug are just two examples. In the case of the Heartbleed bug, a vulnerability in the OpenSSL library created significant potential for exploitation by bad actors.

Now that continuous integration and delivery are the order of the day, developers make changes to the code that trigger a pipeline of automated tests. The code then gets approved and pushed into production. A staged rollout begins, which allows new changes to be pushed out quickly.

This process, too, is beholden to automated test infrastructure. That’s risky; automated tests are looking for specific issues, but they can’t know everything that could possibly go wrong. So then, things go wrong in production. The recent Microsoft Azure outage and Cloudflare’s Cloudbleed vulnerability are examples of how this process can go astray and lead to availability and security consequences.

The Production Traffic Difference

What developers need today is a method to spot possible flaws and security concerns prior to release, with speed and precision and without the need to roll back or stage. One such method is to run live user traffic simultaneously against the current software version and the proposed upgrade. Users would see only the results generated by the current production software, unaffected by any flaws in the proposed upgrade.

Meanwhile, administrators would be able to see how the old and new configurations respond to actual usage. This would allow teams to keep costs down while ensuring quality and security and still meeting delivery deadlines, which ultimately helps boost return on investment. For the development community, building and migrating application stacks to container and virtual environments would become more transparent during development, and those stacks would be more secure and available in production as new software is tested and phased in.
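
To make the idea concrete, here is a minimal sketch of running the same request against both versions while returning only the production result. It is an illustration under stated assumptions, not a description of any particular product: the names Response, ShadowRecord, mirror, current_version and proposed_upgrade are all hypothetical, and the two "versions" are plain Python functions standing in for real services.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Response:
    status: int
    body: str


@dataclass
class ShadowRecord:
    """What administrators see: both outcomes for one live request."""
    request: str
    production: Response
    candidate: Optional[Response]    # None if the candidate crashed
    candidate_error: Optional[str]   # repr of the exception, if any


Handler = Callable[[str], Response]


def mirror(request: str, production: Handler, candidate: Handler,
           log: List[ShadowRecord]) -> Response:
    """Send one request to both versions; the user only gets production's answer."""
    prod_response = production(request)
    try:
        cand_response, error = candidate(request), None
    except Exception as exc:
        # A flaw in the proposed upgrade is recorded, never shown to the user.
        cand_response, error = None, repr(exc)
    log.append(ShadowRecord(request, prod_response, cand_response, error))
    return prod_response


# Hypothetical stand-ins for the current release and the proposed upgrade.
def current_version(name: str) -> Response:
    return Response(200, f"Hello, {name}")


def proposed_upgrade(name: str) -> Response:
    if not name:
        raise ValueError("empty name")  # a bug the test suite did not anticipate
    return Response(200, f"Hello, {name}!")


if __name__ == "__main__":
    log: List[ShadowRecord] = []
    for req in ["Ada", ""]:
        print(mirror(req, current_version, proposed_upgrade, log))
    for record in log:
        print(record)
```

In this sketch the candidate is called inline for simplicity. A real deployment would more likely copy traffic asynchronously so that the proposed upgrade cannot add latency to the production path.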

Software teams that test updates with production traffic are empowered to:

Verify upgrades and patches in real-world use scenarios
Inspect and correct flaws more quickly using packet capture and logging
Recommend upgrades of commercial software by lowering risk and measuring performance benefits
Report quickly on differences in software versions, including metadata, content and application behavior and performance
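
As a rough illustration of the last item in the list above, the sketch below compares what two software versions returned for the same request and reports differences in status, metadata (headers), content and latency. The Observed type, the diff_responses function and the 50 ms latency tolerance are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observed:
    """One version's observed answer to a request (hypothetical shape)."""
    status: int
    headers: Dict[str, str]
    body: str
    latency_ms: float


def diff_responses(old: Observed, new: Observed,
                   latency_tolerance_ms: float = 50.0) -> List[str]:
    """Return human-readable differences between two versions' responses."""
    findings: List[str] = []
    if old.status != new.status:
        findings.append(f"status: {old.status} -> {new.status}")
    # Metadata: report headers that were added, removed or changed.
    for name in sorted(set(old.headers) | set(new.headers)):
        before, after = old.headers.get(name), new.headers.get(name)
        if before != after:
            findings.append(f"header {name}: {before!r} -> {after!r}")
    if old.body != new.body:
        findings.append("body differs")
    # Performance: only flag slowdowns beyond the assumed tolerance.
    if new.latency_ms - old.latency_ms > latency_tolerance_ms:
        findings.append(
            f"latency: {old.latency_ms:.1f} ms -> {new.latency_ms:.1f} ms")
    return findings


if __name__ == "__main__":
    old = Observed(200, {"Content-Type": "text/html"}, "<p>hi</p>", 20.0)
    new = Observed(200, {"Content-Type": "text/html", "X-Cache": "HIT"},
                   "<p>hi</p>", 95.0)
    for line in diff_responses(old, new):
        print(line)
```

An empty result means the two versions behaved identically for that request within the chosen tolerance; anything else is a candidate for closer inspection before release.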

Testing Software in Production

The DevOps approach helps push through new versions of software faster, but this can backfire if unseen bugs slip through. Standard testing methods only search for known issues; a better way to test is to compare differences between software versions before they are released. Testing software updates with production traffic makes this possible. Doing so saves money and rework, and it protects reputation and customer trust.

This content is made possible by a guest author, or sponsor; it is not written by and does not necessarily reflect the views of App Developer Magazine's editorial staff.