By Brandon Knitter

Author’s Note: Far too often in my career I’ve run into systems where monitoring is an afterthought. At best, monitoring is bolted on after the system has been released to the consumer. Instead, the focus on monitoring must begin much sooner in the development cycle, providing insights and integration during the software engineering effort.

This blog series is designed to take the reader through the background and methodologies I employ when implementing a monitoring solution. Starting with the impetus and foundation for monitoring, moving through design and implementation, and ending with consumption, this series is a guidebook for your next monitoring implementation.

I encourage the reader to consume this material sequentially, resisting the urge to skip ahead. Many of the early foundational components, such as the principles and dimensions, are built upon in later posts and explain why monitoring is implemented the way I propose. Enjoy the series!

You’ve written the best app in the world, you’ve tested the hell out of it, and now you’ve deployed it. It’s an amazing success!! Or at least you hope it is. But how do you know whether it’s the best thing since sliced bread or a burnt piece of toast?

Monitoring is nothing new. We’ve done it since we were cavemen tending a fire, poking at the flames to make sure it was as hot as we thought it could be. Millions of years later (and after some impressive brain growth), we still monitor for the same thing: are things as hot as they can be?

It’s no secret that the most expensive part of any high-tech solution is the engineering component, so it’s no wonder you would want to ensure that your investment is paying off. Monitoring does just that.

Whether you are monitoring the engineering process itself, the people who use the application, or the systems that support it, monitoring has become ubiquitous and synonymous with success. The primary reason: you can’t declare success for something you can’t measure.

In this modern day of KPIs and OKRs, monitoring has become an integral part of everything we do in high tech. In some cases the monitoring solution itself will exceed the engineering investment, but more often we strive for something smaller, lighter-weight, and lower-impact.

Monitoring, as defined by the Merriam-Webster dictionary, is “to watch, keep track of, or check usually for a special purpose,” and I think this is quite appropriate. In the systems world, you want to watch your application, you want to keep track of changes, you want to base your checks on some expected result, and all of this is for the “special” purpose of ensuring your application is as hot as it can be. If it isn’t, you may need to poke at it a little.

We must be careful, though, that monitoring doesn’t incur undue overhead. As early as 1927, Werner Heisenberg observed that measuring something can affect the outcome, and therefore that no observed event is left unaltered by the observation itself. He was referring to physics and quantum mechanics, of course…so perhaps that’s a little deep for a blog post! Still, the observer effect is a well-known cautionary tale in our industry.

Simply put, if you are going to watch something, you need to make sure you don’t create an unexpected or undesired impact. For example, monitoring shouldn’t affect processing by altering data or the outcome. This was formalized in 2008 by a few folks at the University of Colorado Boulder and has since become an industry-standard guard rail. While the prohibition on altering data during monitoring is strictly adhered to, in many cases a slight or unnoticeable slowdown is acceptable if the monitoring delivers a valuable outcome. It’s a tradeoff and a design consideration.
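To make that tradeoff concrete, here is a minimal sketch, assuming Python and a hypothetical in-memory metrics store, of a non-invasive probe: a decorator that times a function and records the sample on the side while passing the function’s result (or exception) through untouched. The names `observed`, `LATENCIES_MS`, and `apply_discount` are hypothetical and exist only for illustration; a real system would ship samples to whatever monitoring backend it uses.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory metrics store: function name -> latency samples (ms).
# A real system would send these to its monitoring backend instead.
LATENCIES_MS = defaultdict(list)


def observed(func):
    """Record how long func takes without touching its inputs or output."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            # The call is made exactly as the caller intended; its return
            # value (or exception) passes through unmodified.
            return func(*args, **kwargs)
        finally:
            # The only side effect is appending a sample to our own store:
            # a small timing cost paid in exchange for visibility.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            LATENCIES_MS[func.__qualname__].append(elapsed_ms)

    return wrapper


@observed
def apply_discount(order_total, rate):
    # Hypothetical business function used only for illustration.
    return round(order_total * (1 - rate), 2)


if __name__ == "__main__":
    print(apply_discount(100.0, 0.15))
    print(len(LATENCIES_MS["apply_discount"]))
```

The timer adds a little work to every call, which is the “slight or unnoticeable slowdown” being traded for insight, while the data flowing through `apply_discount` is never altered.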

In this series, we will explore the different aspects of monitoring. We will dive into what a modern system requires of monitoring and when monitoring may or may not be appropriate. We will cover where we should monitor and where it is simply not of value, and how the data that is gathered is used. We will explore integration and tooling, but we will stop short of technology or product recommendations. Finally, we will bring it all together in a case study and reference architecture, and offer one process that may work for a smooth rollout.


References:

Definition: Monitor
One formal definition of monitoring
https://www.merriam-webster.com/dictionary/monitor

Observer Effect
A good description of the expected impact of observing something, anything.
https://en.wikipedia.org/wiki/Observer_effect_(information_technology)

Heisenbug
The observer effect as applied to debugging software
https://en.wikipedia.org/wiki/Heisenbug

Observer Effect and Measurement Bias in Performance Analysis
A formal explanation of the observer effect, including the tongue-in-cheek reference to bugs introduced by watching something
http://scholar.colorado.edu/cgi/viewcontent.cgi?article=1971&context=csci_techreports