What makes an effective microservices logging strategy?
An effective microservices logging strategy can hinge on the size and scale of the system in question. For example, a microservices-oriented architecture composed of 20 microservices is less of a logging burden when compared to one composed of 200 microservices.
Developers who hope to introduce a successful microservices logging strategy need to craft a plan that lays out where the logging takes place and how it affects other areas of the system. Typical logging can stress a system in three ways: I/O, storage and analytic computation on the CPU.
Before a team deploys a microservices logging strategy, it must consider the potential stresses on the system, and what it might mean to additional development. Let’s examine ways to alleviate system stresses and some alternatives to traditional microservices logging strategies.
Log data on the machine
The easiest way to introduce logging on a microservices-oriented architecture is to have each microservice collect and store its logging data on the machine where it runs. This is probably the easiest and simplest approach to logging, but it’s also one fraught with danger.
Storage on a local machine significantly reduces I/O latency because all the activity takes place at a singular location. There are few, if any, trips out to the network. While logging data on the local machine helps improve performance, there is a tradeoff.
Increased storage places a significantly higher burden on the host machine’s CPU. Higher levels of activity result in more logging, which in turn creates more log data stored in the system and raises CPU utilization levels. A host machine can be maxed out in no time under this scenario.
Luckily, there are alternatives to this microservices logging strategy.
A logging service can help alleviate concerns of CPU utilization and reduced I/O latency.
A logging system’s main benefit is that the storage and work is moved off the system and onto third-party resources. All the microservice needs to do is take a trip out to the network to send log entries.
While this doesn’t seem like a big deal on smaller architectures, it can be problematic if there are 200 microservices that run in a high-availability, multi-replica environment. In a situation like this, many trips to the network from many origins can cause a bottleneck and bring other network communication to a grinding halt.
In this case, there is another alternative that developer teams should consider.
The collector strategy
A collector strategy essentially shifts how log entries are sent in and out of the network.
Instead of sending each log entry out to a logging service, they are sent to a central collector that resides on a machine elsewhere. In most cases this machine is at least in the same data center of the microservices-oriented architecture. In a best-case scenario, the machine is located on the same data center rack that hosts the other microservices components.
Cloud-hosted microservices users will have to consult with their provider on identifying the best place to host the collector.
The collector does as its name implies. It collects all the log entries emitted from the architecture and then forwards the entries onto a logging service at a prescribed interval. Once the entries arrive, the collector flushes the old log entries from the system to backup storage.
One major benefit to this microservices logging strategy is that the collector absorbs the network latency incurred when it sends the log entries onto the logging service. Also, because the collector is close to the other components of the microservices-oriented architecture, it reduces latency between the architecture and the collector.
However, there are still risks associated with a log collector. For example, if the central log collector fails, all logging activity comes to a standstill.
So, how can developers avoid this risk?
Developers can create a cluster of collectors that resides behind a common load-balancer to alleviate failed central log collector concerns.
A benefit of the load-balanced log collector strategy is that if one collector fails, the others will remain operational and allow logging to continue. But there is a tradeoff.
This strategy requires a team to support a set of collectors on their network, and in turn, adds more expenses related to the increase of virtual machines. Also, the logging environment becomes more complex and requires more legwork from other microservices in the environment.
Overall, the main crux of the microservices logging conundrum is scale. If you run a smaller architecture without a lot of logging, keep the logging activity on the same microservices elements. However, if you run an architecture with a lot of microservices, a more sophisticated logging strategy makes more sense, despite some potential drawbacks with cost and storage.