By Brandon Knitter

Author’s Note: This is a 9 part Monitoring Series. I recommend reviewing them in the order they are published for cohesiveness.

  1. What Makes Us Monitor?
  2. Principles of Monitoring

No article or blog post series would be complete without a few complex graphics, so let’s try to overcomplicate things and make a few two- and three-dimensional charts. Grab your protractor, compass and slide rule!

When considering monitoring, start from two different axes. One axis represents the architectural layers of modern systems and applications, from the underlying hardware all the way through to the end-user experience. The other axis represents the flow of monitoring data, from generation all the way through consumption. These axes help us define how we will design the start of our monitoring.

While there is no “sweet spot” on this graph, as we move from the lower left to the upper right the monitoring data transforms from collection to consumption. Likewise, the architectural layers evolve from underlying raw power to end-user behavior and usage. It is not uncommon to focus more on the lower left, as this is where most systems have the most control. The upper right is nearly uncontrollable, either because the end-user is unpredictable or because the consumer of monitoring data can derive unexpected results.

While the actual layers and data flow will be different from implementation to implementation, let’s take a quick look at the most common architectural layers and monitoring data pipelines. Note that these are not perfect layers and one layer does not necessarily require the layer before it.

The architectural layers of monitoring can commonly be expressed as follows:

  • Facilities — The physical building, racks, electrical power, cooling, and physical security. This is becoming much less relevant with public cloud-based deployments, as it is managed by the cloud provider. In private cloud-based deployments, this is still very relevant.
  • Hardware — The physical hardware such as computers, storage arrays, and network equipment. This is common in traditional data centers as well as in private clouds.
  • Virtualization — The virtualized versions of the Hardware components across compute, storage, and network. All of these components are commonly virtualized in today’s modern deployments.
  • Services — The shared and commonly used components of a modern system or application. This includes datastores, network features, and in some cases compute services. If these services are managed for you by a cloud provider, then they are not commonly monitored by the consumer; regardless of provider, the metrics that come from services are almost always very relevant and should be integrated into your monitoring solution.
  • Application/Services — These can be anything from individual microservices to front-end web applications. The type of component and the function it serves will define the monitoring metrics you gather.
  • User Experience (server-side) — This defines the user experience from the server’s perspective. For instance, monitoring at this layer can provide tracking of the number of queries per second aggregated by all users, or the amount of time it took to process a given user’s request.

  • User Experience (client-side) — Like the above User Experience layer (metrics from the server’s point of view), the experience can also be measured from the client’s point of view. Metrics such as time to first byte, response time, and total bytes transferred help determine the end-user’s experience from the user’s perspective.
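The server-side User Experience metrics described above (queries per second aggregated across all users, and time spent processing a given user's requests) can be sketched from a simple request log. This is a minimal illustration, not a production collector; the log format and field names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical request log: (timestamp_sec, user_id, processing_time_ms)
request_log = [
    (100, "alice", 42), (100, "bob", 120),
    (100, "alice", 35), (101, "bob", 250),
    (101, "carol", 18), (101, "alice", 60),
]

def queries_per_second(log):
    """Aggregate request counts across all users, per second."""
    qps = defaultdict(int)
    for ts, _user, _ms in log:
        qps[ts] += 1
    return dict(qps)

def user_latency_ms(log, user):
    """Total processing time spent on a given user's requests."""
    return sum(ms for _ts, u, ms in log if u == user)

print(queries_per_second(request_log))        # {100: 3, 101: 3}
print(user_latency_ms(request_log, "alice"))  # 137
```

In a real system these aggregations would happen in the monitoring pipeline rather than in application code, but the shape of the questions is the same.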

The monitoring data pipeline can commonly be expressed as follows:

  • Gathering — The collection of monitoring details either from a centralized point or, as a much-preferred method, from distributed points of collection.
  • Aggregation — The combination of collected monitoring data points from many points of generation. Aggregation is required both to centralize the data and to combine certain data points into a representative set such as summaries or rollups.
  • Reporting — Displays the aggregated data for preformatted or ad-hoc consumption. In most cases, this is where data will be explored in order to investigate an outage, user behavior or some form of user impact.
  • Analysis — The automated or manual analysis of aggregated monitoring data. In many cases, this will provide insights into variations from a baseline, such as a standard deviation.
  • Alerting — This presents an anomaly or update to an operator or interested party. Common alerts are failures of a system or application and can be delivered via email, text/SMS or old-school pager.
  • Archiving — Taking all of the data gathered and storing it offline for both accountability and future research. In most cases the offline storage of this data is coupled with an online summarization, which provides some amount of real-time ad-hoc access.
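The Analysis and Alerting stages above can be sketched with the simplest common technique: flag a data point that deviates from a baseline by more than a few standard deviations. This is a toy example under that assumption; real pipelines use far more robust methods, and the threshold and metric names here are illustrative.

```python
import statistics

def is_anomalous(baseline, latest, threshold=3.0):
    """Flag the latest data point if it deviates from the baseline
    mean by more than `threshold` standard deviations."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return False
    return abs(latest - mean) / stdev > threshold

def alert(metric_name, value):
    """Stand-in for delivery via email, text/SMS, or pager."""
    print(f"ALERT: {metric_name} anomalous at {value}")

# Baseline of aggregated data points (e.g. per-minute request counts)
baseline = [100, 102, 98, 101, 99, 103, 97, 100]
latest = 160

if is_anomalous(baseline, latest):
    alert("requests_per_minute", latest)  # fires: 160 is 30 stdevs out
```

Note that the analysis runs on *aggregated* data, not raw collection output; that ordering is exactly why the pipeline stages are sequenced the way they are.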

But this only tells us part of the picture: the infrastructure picture. There is still an entire application architecture to consider. Take a simplified view of a modern componentized system, perhaps one built on the principles of a microservice architecture, as depicted below.

What we end up with is a rather complicated set of relationships and systems. Indeed, the nodes in the graph above represent components of the architecture layer in the previously described monitoring layers, but the edges have significant meaning both individually and as a whole.

For this purpose, we define the components that make up a modern system by breaking down the monitoring across them as follows:

  • Usage — Measured with metrics such as queries per second, throughput, latency, and many others.
  • Transaction Flow — Represents anything from a user’s real-time request to batch processing of multiple-step processing such as order fulfillment or a manufacturing process. This is typically expressed at runtime as a data flow and ties the components together for that single transaction.
  • Business Process — If a process has multiple states (in addition to multiple steps), this represents those valid states and verifies that the flow of the process continues properly. This is easily confused with Transaction Flow, where a sequence of dependencies is measured; Business Process instead focuses on the state of a transaction rather than the flow.
  • Dependencies — Any relationship either coming into or going out of a node. The dependency can reflect real-time changes or functional requirements. These dependencies can be within the application’s ownership or external, such as a third-party provider. Regardless of the dependency, the direction matters, whether it be forward, reverse, both, or non-directional. In many cases, dependencies are classified as critical or non-critical.
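The Dependencies component above, with its directed edges and critical/non-critical classification, can be sketched as a small graph walk. The service names and topology here are invented for illustration; they are not from the original post.

```python
# Hypothetical microservice dependency graph: each node maps to a list
# of (downstream_service, is_critical) directed edges.
dependencies = {
    "web-frontend": [("order-service", True), ("recommendations", False)],
    "order-service": [("payment-gateway", True), ("inventory", True)],
    "recommendations": [("inventory", False)],
}

def critical_downstream(graph, node):
    """Walk only the critical edges to find every service whose
    failure could take `node` down with it."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for dep, critical in graph.get(current, []):
            if critical and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

print(sorted(critical_downstream(dependencies, "web-frontend")))
# ['inventory', 'order-service', 'payment-gateway']
```

Monitoring the edges this way is what lets an alert on `payment-gateway` be correlated back to a user-visible failure in `web-frontend`, rather than appearing as an isolated event.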

These higher levels of monitoring represent the most mature in the industry and are most commonly found in large, complex systems that require them out of need more than desire. With encapsulated technologies such as containers, PaaS, and now serverless, these monitoring components become the primary focus, because many of the other components are outsourced and there is simply no visibility into them. Someone does have to monitor those lower architecture layers, but it’s not you.

With all of this complexity, it’s no wonder it’s hard to consider where to start.

If you take the two-dimensional graph shown at the beginning of this post and add a third axis, you may start to see the complexity of monitoring as we represent all of the dimensions.

But is that really going to be helpful? Instead, it may be easier to go back to the Principles of Monitoring, define the types of metrics and the tools that collect and provide those metrics, and let the users of those tools define what will be monitored and where. Let’s look at that.