Databases are critical to business success in e-commerce websites and mobile apps. Database bottlenecks, outages, and performance problems can put your digital business at risk.
When we talk about database performance we tend to think of indexes, SQL design, lock contention, and the like. But a lot of the most serious bottlenecks - the ones that make you miss release deadlines, crash the site, and lose your best team members after repeated all-nighters - are actually cultural, not technical. They’re driven by the siloed responsibilities often created around the database. And they have far-reaching effects that are more costly than outages alone.
This is because your database's ultimate performance, and its impact on your engineering productivity overall, goes hand-in-hand with the way you structure your engineering team. One way to think about this is that teams are systems too. Not in the de-humanizing, “automate everything” kind of way, but in the way you think about how dependencies and variability in your processes can limit team performance.
It’s Not Just About Downtime
Any time you have a system performance discussion, the cost of downtime inevitably comes up - mostly by vendors making claims about lost revenue that may or may not be supportable. The issue is not whether reducing downtime per year by five minutes is essential to the business. It’s more about expanding the conversation beyond avoiding a bad outcome, to achieving good outcomes. It’s about the upside of a better performing engineering team. After all, fixing the bad outcome is limited to the size of the potential downside - how much better than five-nines can you get? The real value you can create comes from focusing on things like making developers more productive, shortening time-to-delivery of major IT initiatives, or reducing the cycle time of continuous-delivery ship-measure-iterate loops.
By understanding the flow of work and communications in your organizations and teams, you can have this kind of impact. It’s not about tools, it’s about people and how they work together toward a common goal. It’s not about just making the DBA more productive, it’s about making 80 developers more productive because the DBA isn’t in the critical path for all of them. What gets in the way are organizational cultures that create silos and more specialization within teams, which stifles knowledge sharing, and ultimately leads to more bottlenecks.
Siloed Cultures and Specialization
A team can be more productive when it creates and cultivates specialization, as tasks are grouped and handled by those best equipped to execute them flawlessly. DBAs are a great example, as they have specific knowledge and experience that others on the team don’t have: designing, building, and maintaining scalable database servers. However, the problem is that by encouraging specialization, you have now created a dependency on that specialist that, as you scale, can create bottlenecks and drive down productivity.
DBAs - as most organizations' single go-to specialist - often become increasingly reactive. Developers constantly interrupt them with requests for help with, say, optimizing a query. If the DBA helps, a potentially bad query doesn’t get shipped; but if the DBA is busy fixing an already-bad query in production, the developer might go ahead and deploy, potentially introducing the next production problem and guaranteeing the DBA will be more reactive tomorrow than today.
As a result, DBAs spend more time resolving today’s problems than planning for (and avoiding) issues in the future. Not only does this impact productivity across the development team, it introduces risk of system failure in the future as the DBAs’ more forward-looking tasks (e.g. planning storage, monitoring performance, creating disaster recovery plans, etc.) are usurped by the tyrannical immediate issue. As the ratio of developers to DBAs increases, the problem becomes worse, as developer requests queue up and DBA backlog grows. At this point, DBAs have become one of the critical bottlenecks in the process.
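To make the ratio effect concrete, here is a minimal back-of-the-envelope sketch. It assumes a single DBA can be modeled as a simple single-server queue (the textbook M/M/1 model, where the mean time a request spends in the system is 1 / (service rate - arrival rate)). The function name and the numbers (one request per developer per week, a DBA who can clear 100 requests per week) are hypothetical, chosen only for illustration.

```python
def mean_turnaround_weeks(developers,
                          requests_per_dev_per_week=1.0,
                          dba_capacity_per_week=100.0):
    """Mean time (in weeks) a request waits plus gets handled.

    Illustrative M/M/1 assumption: requests arrive at random and a single
    DBA serves them one at a time. All rates here are made-up examples.
    """
    arrival_rate = developers * requests_per_dev_per_week
    if arrival_rate >= dba_capacity_per_week:
        # Demand meets or exceeds capacity: the backlog grows without bound.
        return float("inf")
    return 1.0 / (dba_capacity_per_week - arrival_rate)


if __name__ == "__main__":
    for devs in (20, 50, 80, 95, 99, 100):
        print(devs, "developers ->", mean_turnaround_weeks(devs), "weeks")
```

Under these assumptions the turnaround does not grow in a straight line: it stays small while the DBA has slack and climbs steeply as the number of developers approaches what one DBA can absorb.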
But you can avoid this scenario with better knowledge sharing and a clearer understanding of your other constraints.
Tight Feedback Loops Shatter Silos
Knowledge sharing needs to move beyond simply sharing best practices or after-action reviews, to include access to the same systems, tools, and data across the team. It needs to involve real-time collaboration using detailed data available to everyone, such that there is a clear understanding of the nature and scope of an issue. And when a team shares information and successfully collaborates, a database issue can reach a quick, painless resolution.
Because DBAs occupy multiple positions of hand-off, interaction, and information sharing between different teams, they present a nice test case for how improved knowledge sharing can make a difference. Their duties and knowledge are specialized, which makes it tempting to centralize the burden on them instead of sharing it with, or offloading it to, developers. Take the earlier example of query optimization: if you give developers access to production databases equipped with performance monitoring, they can test and fix their own queries, which removes the dependency on the DBA.
This doesn’t absolve the DBA of the responsibility for database server performance, but simply creates joint accountability between the developer and the DBA. It’s no longer just a database issue, it’s a system performance issue, for which the team is collectively responsible. The key is making sure all parties are clear on the process to follow, and that they have access to the tools and data they need when they need it. By doing this across your engineering team, you shrink feedback loop time and accelerate your delivery cycles.
Identify Your Constraints
Eliyahu Goldratt’s seminal work on the Theory of Constraints provides a good framework for thinking about engineering team performance. His novel about business process optimization, The Goal, centers on a factory that’s always losing money and missing deadlines, a situation analogous to how the development process often works. His discussion of dependent events and statistical variations (in start times, stop times, and durations) offers a helpful way to identify the constraints within your process that are causing bottlenecks. A siloed culture, specialization, and lack of knowledge sharing are just a few examples.
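The interplay of dependent events and statistical variation can be sketched with a small simulation, loosely modeled on the dice game described in The Goal. This is an illustrative sketch, not Goldratt's own formulation: the function name, the number of stations, and the fixed random seed are all assumptions. Each station can handle between one and six items per round, but it can only pass along work that upstream stations have actually delivered.

```python
import random


def simulate_line(stations=5, rounds=1000, seed=0):
    """Simulate a chain of dependent stations with variable capacity.

    Returns (finished, stuck): items that left the last station, and
    items still waiting between stations when the simulation ends.
    """
    rng = random.Random(seed)
    # waiting[i] is the work queued in front of station i.
    # Station 0 draws from an unlimited supply, so waiting[0] stays 0.
    waiting = [0] * stations
    finished = 0
    for _ in range(rounds):
        for i in range(stations):
            capacity = rng.randint(1, 6)  # this round's variable capacity
            if i == 0:
                moved = capacity
            else:
                # A station cannot move more than it has received.
                moved = min(capacity, waiting[i])
                waiting[i] -= moved
            if i + 1 < stations:
                waiting[i + 1] += moved
            else:
                finished += moved
    return finished, sum(waiting)


if __name__ == "__main__":
    finished, stuck = simulate_line()
    print("finished:", finished, "stuck between stations:", stuck)
```

Every station has the same average capacity, yet a low roll upstream starves everyone downstream, while a high roll downstream cannot make up for work that never arrived. Hand-offs between developers and a DBA behave the same way: variation at one step limits what every later step can deliver.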
Other common constraints include the lack of democratized access to production performance data, and the way you react to outages and other system problems. All of these reduce the effectiveness of every single person on your team and hamper overall business performance. Spotting and breaking down silos in your team or organization is a great starting point for eliminating your bottlenecks.