Why Just Monitoring Your Server Is Not Enough

Monday, February 16, 2015

Server monitoring tools have traditionally been one of the first building blocks in any IT infrastructure. Viewed as the control panel for infrastructure effectiveness, much like the speedometer and temperature gauge in your car, these tools capture resource usage information (CPU, disk, memory and network) from the operating system's performance API or performance counters, at both the system and per-process level. They provide visibility into which resources are being consumed across the infrastructure, and by which applications and services. This certainly seems like a critical component. So, how could server monitoring be dead?

Quite simply, times have changed. Collecting server monitoring data is just as valuable today as ever, but the tools people use to collect it have evolved significantly. Solutions focused solely on server monitoring in isolation are a dying breed.

Looking Back

In the past, server monitoring tools were standalone applications or services focused exclusively on collecting server resource usage metrics. They would then visualize these data points in isolation and give visibility into how infrastructure was being utilized.

Now, don’t get me wrong. There are still plenty of tools out there that primarily focus on server monitoring, such as Server Density, Scout, and even that old dinosaur Nagios, which is still (surprisingly) fairly widely used. And many successful server monitoring innovators have already been gobbled up in the land grab. Recent examples include Cloudkick (acquired by Rackspace in 2010) and Stackdriver (acquired by Google in 2014).

Charging Ahead

However, monitoring system infrastructure resources in isolation is not particularly useful anymore. A more valuable approach is to look at server metrics in the context of application performance metrics, usage trends and application load. This provides a better understanding of actual end-user experiences with applications and whether a spike in CPU is having an adverse effect on response time. For example, an online retailer certainly would not want customers waiting for the “complete payment” page to load.
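To make the idea of reading CPU in the context of response time concrete, here is a minimal Python sketch. It assumes you already have per-minute CPU readings and average response times; the sample values, the `join_by_minute` helper and the variable names are hypothetical and not taken from any particular monitoring tool. It pairs up readings that share a timestamp and computes a Pearson correlation coefficient as one rough signal of whether the two move together.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def join_by_minute(cpu_samples, response_samples):
    """Pair CPU and response-time readings that share a timestamp."""
    responses = dict(response_samples)
    return [(cpu, responses[ts]) for ts, cpu in cpu_samples if ts in responses]


# Hypothetical per-minute data as (minute, value) tuples.
cpu_percent = [(0, 20.0), (1, 25.0), (2, 90.0), (3, 95.0), (4, 30.0)]
response_ms = [(0, 110.0), (1, 120.0), (2, 480.0), (3, 510.0), (5, 100.0)]

# Only minutes present in both series are compared.
pairs = join_by_minute(cpu_percent, response_ms)
r = pearson([cpu for cpu, _ in pairs], [ms for _, ms in pairs])
print(f"{len(pairs)} matched samples, correlation = {r:.2f}")
```

A coefficient close to 1 suggests that CPU spikes and slow responses tend to occur together. It does not prove that one causes the other, which is why the log and application data discussed in this article still matter.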

In addition, the proliferation of a DevOps culture that removes silos changes the game. A DevOps dynamic requires the entire team to become more focused on how different parts of the environment ultimately affect the user experience. The days of pure Ops teams solely concerned with how much server capacity is available are coming to an end.

As a result, forward-thinking organizations are rolling up server monitoring into other services – correlating the server data with application performance metrics, system-level log data, app usage trends, and data on application load. A primary example of this is application performance monitoring (APM). Many APM tools now include server monitoring as part of their performance management offerings.

Similarly, cloud providers often offer some basic, out-of-the-box monitoring for their infrastructure, which is likely why Rackspace and Google made the acquisitions mentioned earlier. Amazon CloudWatch on AWS, which combines server monitoring, stats on AWS services, and log data, is another prime example. Many cloud providers are looking to roll up server monitoring data with service performance stats and a logging interface, providing a single view across any service run within their infrastructure so that users can easily correlate multiple data points.

Serving Value

The value in server monitoring data comes from correlating it in real time with application performance metrics and log data (which in turn can capture usage, load and further metrics from across an infrastructure). This provides an accessible way to get the full picture and figure out the impact of server resources on the end-user experience. Correlation is king!

Now is the time to connect server monitoring information with application performance data, system logs, activity logs and similar sources to more easily troubleshoot and diagnose issues. The data found in logs can be used for more than just troubleshooting development issues. Logs contain important statistics that can help determine whether a server is losing CPU cycles to a runaway program or an errant process is chewing up available memory. Using logs to collect and analyze server metrics provides a much more complete picture of your system capacity alongside application performance.
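As a rough illustration of pulling server statistics out of logs, the Python sketch below assumes a hypothetical key=value log format in which each resource sample records a process name, CPU percentage and memory in megabytes. The field names (`proc`, `cpu`, `mem_mb`), the thresholds and the sample lines are all assumptions made for this example, not the format of any specific logging product.

```python
import re
from collections import defaultdict

# Matches key=value tokens such as "cpu=97.2" (assumed log format).
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Turn one log line into a dict of its key=value fields."""
    return dict(KEY_VALUE.findall(line))


def peak_usage(lines):
    """Track the highest CPU and memory reading seen for each process."""
    peaks = defaultdict(lambda: {"cpu": 0.0, "mem_mb": 0.0})
    for line in lines:
        fields = parse_line(line)
        if not {"proc", "cpu", "mem_mb"} <= fields.keys():
            continue  # skip lines that are not resource samples
        entry = peaks[fields["proc"]]
        entry["cpu"] = max(entry["cpu"], float(fields["cpu"]))
        entry["mem_mb"] = max(entry["mem_mb"], float(fields["mem_mb"]))
    return dict(peaks)


def flag_offenders(peaks, cpu_limit=80.0, mem_limit_mb=1024.0):
    """Return the processes whose peak CPU or memory exceeds a limit."""
    return sorted(
        proc
        for proc, usage in peaks.items()
        if usage["cpu"] > cpu_limit or usage["mem_mb"] > mem_limit_mb
    )


# Hypothetical log lines for illustration only.
sample_log = [
    "2015-02-16T10:00:00Z host=web1 proc=nginx cpu=3.5 mem_mb=120",
    "2015-02-16T10:00:00Z host=web1 proc=report_job cpu=97.2 mem_mb=310",
    "2015-02-16T10:01:00Z host=web1 proc=cache cpu=5.0 mem_mb=2048",
    "2015-02-16T10:01:00Z host=web1 msg=heartbeat",
]

offenders = flag_offenders(peak_usage(sample_log))
print("processes over limit:", offenders)
```

In this sample, the runaway `report_job` is flagged for CPU and `cache` for memory, while the heartbeat line is ignored because it carries no resource fields.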

Read more: https://logentries.com/

This content is made possible by a guest author, or sponsor; it is not written by and does not necessarily reflect the views of App Developer Magazine's editorial staff.