By Mark McCullough
Do you know what time it is? Does your computer? Do you care? No. Probably not. You need to.
Time is hard. Doing it reliably is even harder. Fortunately, there is a stable program to maintain time, but a lot of myths persist because people think that time is like any other service.
First, time is hard. 150 years ago, a difference of several minutes might have been barely noticeable. Today, even a one-second difference can have a critical impact. Time servers supposedly serving accurate time to the world (stratum-1) have been shown to be wildly wrong.¹ Even seemingly harmless hardware driver decisions can introduce problems that degrade the quality of time sources.²
It’s easy to check what time your computer thinks it is. It’s much harder to check whether your computer is actively synchronizing time. Most people treat the computer time daemon (typically ntpd) as a system with a primary and a backup, as if two nodes were the most you would ever need. Nothing could be further from the truth. Instead, time is a quorum cluster setup: with only two servers the client can almost never synchronize time, because when the two disagree it has no way to tell which one is wrong, and the more servers in the cluster that are actively syncing, the more accurate your time.
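The quorum idea can be illustrated with a small sketch. It treats each source as an interval (reported offset plus or minus an error bound) and asks whether a strict majority of those intervals share a common point. This is a simplified illustration, not ntpd's actual selection algorithm; the function name and the sample numbers are made up for the example.

```python
def majority_agrees(sources):
    """Return True if a strict majority of sources overlap at some point.

    `sources` is a list of (offset_seconds, error_seconds) tuples; each one
    describes the interval [offset - error, offset + error].
    """
    events = []
    for offset, error in sources:
        # The middle field makes interval starts sort before interval ends
        # at the same point, so touching intervals count as overlapping.
        events.append((offset - error, 0, +1))
        events.append((offset + error, 1, -1))
    events.sort()

    best = current = 0
    for _, _, delta in events:
        current += delta          # number of intervals covering this point
        best = max(best, current)
    return best > len(sources) / 2


# Hypothetical measurements: two sources that disagree by five seconds.
two_sources = [(0.0, 0.1), (5.0, 0.1)]
# Adding a third source that sides with the first breaks the tie.
three_sources = two_sources + [(0.05, 0.1)]

print(majority_agrees(two_sources))    # no majority: which one is wrong?
print(majority_agrees(three_sources))  # two of three agree
```

With two disagreeing sources neither has a majority, so the client has nothing to trust. A third source gives it a way to outvote the bad one.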
Let’s look at that for a moment. Every time source has a certain amount of jitter, meaning that even if you sync to only one system, you aren’t going to be able to get to exactly the same time as that system. A jitter of over a second is not uncommon, especially when queried over more complex networks. Each additional time source beyond the second server creates a smaller and smaller window where all the computed times overlap, meaning that the client has a more precise understanding of what time it is. There is a cost to using too many servers, but time experts typically recommend five to seven time sources per server.
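To make the shrinking window concrete, here is a minimal sketch that intersects the intervals reported by several sources. It assumes every source is honest within its error bound, which real daemons do not assume, so treat it as an illustration of the overlap idea only. The names and numbers are hypothetical.

```python
def overlap_window(sources):
    """Return (low, high) where all source intervals overlap, or None.

    `sources` is a list of (offset_seconds, error_seconds) tuples.
    """
    low = max(offset - error for offset, error in sources)
    high = min(offset + error for offset, error in sources)
    return (low, high) if low <= high else None


def window_width(sources):
    """Width in seconds of the common window, or None if there is none."""
    window = overlap_window(sources)
    return None if window is None else window[1] - window[0]


# Made-up measurements: (offset, error bound) in seconds.
SOURCES = [(0.00, 0.50), (0.20, 0.40), (-0.10, 0.45)]

# Add one source at a time and watch the common window narrow.
for count in range(1, len(SOURCES) + 1):
    print(count, "source(s): width", window_width(SOURCES[:count]))
```

Each added source can only keep the window the same or make it narrower, which is why more agreeing sources give a tighter estimate.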
To make life even more complex, just because a server is listed as a potential source doesn’t mean that your computer will use it as a source. If the time servers don’t have enough overlap of agreed-upon time, the client can’t trust the server and rejects it as a time source. You need to check that you have enough servers that agree closely enough with each other to use for synchronization. With ntpd, the command ntpq -np prints the status of each peer in the first column. No, a space doesn’t mean that things are fine; it means that the server has been discarded as not valid for some reason. Look for symbols like a plus (“+”), an asterisk (“*”), or the letter O (“o”); those are the good server lines.
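The check described above can be sketched in a few lines. The code reads the first column of ntpq -np style output and lists the peers marked with one of the good symbols. The sample text is illustrative, not captured from a real system, and the function names are invented for this example; column layout can vary between ntpq versions, so only the first character and the first field are used.

```python
GOOD_TALLY_CODES = {"+", "*", "o"}  # the "good server" symbols named above

# Illustrative output only; addresses are from documentation ranges.
SAMPLE = "\n".join([
    "     remote           refid      st t when poll reach   delay   offset  jitter",
    "==============================================================================",
    "*192.0.2.10      .GPS.            1 u   33   64  377    1.204    0.113   0.052",
    "+192.0.2.11      198.51.100.1     2 u   12   64  377    2.310   -0.241   0.101",
    " 203.0.113.5     .INIT.          16 u    -   64    0    0.000    0.000   0.000",
    "x198.51.100.7    .GPS.            1 u   40   64  377   15.020  812.400   3.200",
])


def parse_peers(text):
    """Return a list of (tally_code, remote) pairs from ntpq -np style text."""
    peers = []
    seen_separator = False
    for line in text.splitlines():
        if not seen_separator:
            # Skip the column header and the row of '=' beneath it.
            seen_separator = line.startswith("=")
            continue
        if not line.strip():
            continue
        # Do not strip the line: a leading space is itself a tally code.
        peers.append((line[0], line[1:].split()[0]))
    return peers


def usable_peers(text):
    """Return the remotes whose tally code marks them as good."""
    return [remote for tally, remote in parse_peers(text)
            if tally in GOOD_TALLY_CODES]


print(usable_peers(SAMPLE))
```

On a live system the text could come from running ntpq -np through Python's subprocess module. A count of usable peers below three or so is a hint that the client may not have enough agreeing sources.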
So, why should you care about accurate time? Isn’t it enough to just have all systems in sync? Do you not use any cloud resources? Do you never communicate with anyone else? Today, differences of even a couple of seconds can matter, complicating troubleshooting or even calling into question the veracity of your logs. A difference of a few tenths of a second today makes it hard to know which of two events happened first, and it may grow to many seconds in just a couple of months if you aren’t careful. Time is an important part of your infrastructure, and you can’t safely ignore it.