Posted 9/21/2016 8:05:43 AM by RICHARD HARRIS, Executive Editor
Slow application performance, errors and outages are the bane of a development team’s day. When such issues occur, developers have traditionally combed through their code, often with the aid of an Application Performance Monitoring (APM) tool, to identify the bug causing the problem. Yet for applications running on cloud platforms like Amazon Web Services (AWS), debugging the code is not enough to find the root cause of many issues. Developers need to go beyond traditional code-level troubleshooting with APM tools and also investigate the underlying infrastructure, where an issue’s root cause may reside.
Cloud platforms are highly dynamic in nature: they adjust to changes in workload demand by creating and destroying hosts, and distribute related tasks across these nodes, which work together as a “service”. This means that the hosts running the code, as well as the other middleware components the code relies on, such as web servers and databases, are in constant flux. Further, a variety of systems, such as automated configuration management tools like Puppet or Chef, or distributed process scheduling like Apache Kafka, are often employed to facilitate these dynamic processes. Lastly, many cloud-based applications outsource tasks or processing via API to external third-party services, which can also be the cause of issues. In short, to identify the root cause of an issue, all of these infrastructure areas need to be examined in addition to the code that runs the application.
Datadog just released new capabilities that link application code to the performance of all the infrastructure components the code connects to. Importantly, these linkages are made contextually, so code performance is always correlated to the exact hosts, databases, APIs and other components that an application used, even as these hosts continuously cycle in and out of a cloud environment.
The end result is a monitoring system that allows developers not only to troubleshoot their code’s performance, but also to confirm or rule out another component as the cause of an issue. As Albert Wang, Product Manager for Datadog, puts it, “A developer may look over every line of code umpteen times trying to find a problem only to later realize that an issue was caused by a configuration error in a database host that sprung into existence that morning. Making code performance tooling ‘infrastructure-aware’ is critical for a developer working on a cloud platform where an issue’s root cause may lie in any number of areas.”
Read More https://www.datadoghq.com...