Observability and OpenTelemetry in Microservices Architecture
Hello, I'm Wakatchi ( @wakatchi_tech )
Microservices architecture is hard to troubleshoot when problems occur.
I recently received a question along these lines.
Microservices architecture is increasingly being adopted not only by companies whose core business relies on cutting-edge technology, but also for core systems across all industries. At the same time, many operational issues have emerged, such as poorly drawn service boundaries, insufficient planning when migrating from a monolith, and the use of distributed transactions.
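For more on working with distributed transactions, please refer to this article.
Summarizing microservice transactions with the TCC and Saga patterns (TCCパターンとSagaパターンでマイクロサービスのトランザクションをまとめてみた)
A summary of the TCC and Saga patterns for microservice transaction design. It classifies them by orchestration and choreography, and details the characteristics of each approach and the points to watch when implementing them. A design guide to the approaches that support stable operation of distributed systems.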
One of the most frequently cited challenges is that distributed architectures are harder to troubleshoot. In serverless and container-based microservices architectures, many microservices work together to serve users, so when a failure or performance problem occurs, you need to be able to trace what each microservice did and in what order. This is why observability, the ability to track and observe the behavior of microservices, becomes important.
Distributed tracing is a central means of achieving observability. Based on the requests that are actually being executed, it lets you visualize how services call each other and where the bottlenecks are.
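To make the idea of finding a bottleneck from a trace concrete, here is a minimal sketch in plain Python. It does not use OpenTelemetry; the Span class, the service names, and the timings are all hypothetical, and it assumes that child spans run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """One timed operation inside a single service (hypothetical, simplified)."""
    trace_id: str             # shared by every span that belongs to one request
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, spans: List[Span]) -> int:
    """Time spent in the span itself, excluding its direct children.

    Assumes the children do not overlap in time.
    """
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def bottleneck(spans: List[Span]) -> Span:
    """Return the span with the largest self time."""
    return max(spans, key=lambda s: self_time_ms(s, spans))


def render(spans: List[Span], parent_id: Optional[str] = None, depth: int = 0) -> List[str]:
    """Render the call tree as indented lines, ordered by start time."""
    lines = []
    children = sorted(
        (s for s in spans if s.parent_id == parent_id),
        key=lambda s: s.start_ms,
    )
    for s in children:
        lines.append(f"{'  ' * depth}{s.service} ({s.duration_ms} ms)")
        lines.extend(render(spans, s.span_id, depth + 1))
    return lines


# Made-up spans for a single request that passes through four services.
SPANS = [
    Span("t1", "a", None, "api-gateway", 0, 480),
    Span("t1", "b", "a", "order-service", 10, 470),
    Span("t1", "c", "b", "inventory-service", 20, 60),
    Span("t1", "d", "b", "payment-service", 70, 450),
]

if __name__ == "__main__":
    print("\n".join(render(SPANS)))
    print("bottleneck:", bottleneck(SPANS).service)
```

Because every span carries the same trace ID and a pointer to its parent, the call tree can be rebuilt after the fact, and comparing self times points at the service where the request actually spent its time.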
In this article, we introduce OpenTelemetry, which is being standardized under the CNCF as a way to achieve observability in a microservices architecture.
Microservice architecture and observability
The CNCF Cloud Native Definition lists observability among the properties of a cloud-native system.
Cloud-native technologies give organizations the ability to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Common examples of this approach include containers, service meshes, microservices, immutable infrastructure, and declarative APIs.
These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, these enable engineers to make impactful changes frequently and predictably with minimal effort.
https://github.com/cncf/toc/blob/master/DEFINITION.md
Observability rests on three pillars: logs, metrics, and traces. Collecting and measuring these three is called telemetry, and the resulting data is called "telemetry data". The mechanism for collecting and measuring telemetry data is being standardized at the CNCF.
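As a rough illustration of how the three kinds of telemetry data relate to each other, the sketch below records a log entry, a metric, and a span for one operation and ties the log and the span together with a shared trace ID. This is plain Python, not the OpenTelemetry API; the Telemetry class and its method names are hypothetical, and the sketch is not thread-safe.

```python
import time
import uuid
from collections import Counter
from contextlib import contextmanager


class Telemetry:
    """In-memory collector for logs, metrics, and spans (hypothetical sketch)."""

    def __init__(self):
        self.logs = []            # list of {"trace_id", "message"}
        self.metrics = Counter()  # counter name -> value
        self.spans = []           # list of {"trace_id", "name", "duration_s"}
        self._trace_id = None     # ID of the trace currently in progress

    @contextmanager
    def span(self, name):
        """Time a block of work; the outermost span starts a new trace."""
        is_root = self._trace_id is None
        if is_root:
            self._trace_id = uuid.uuid4().hex
        start = time.perf_counter()
        try:
            yield
        finally:
            self.spans.append({
                "trace_id": self._trace_id,
                "name": name,
                "duration_s": time.perf_counter() - start,
            })
            if is_root:
                self._trace_id = None

    def log(self, message):
        """Record a log line tagged with the current trace ID, if any."""
        self.logs.append({"trace_id": self._trace_id, "message": message})

    def count(self, name, value=1):
        """Increment a counter metric."""
        self.metrics[name] += value


telemetry = Telemetry()
with telemetry.span("checkout"):
    telemetry.log("order received")
    telemetry.count("orders_total")
```

The shared trace ID is what lets you jump from a suspicious log line to the trace of the request that produced it, while the counter gives the aggregate view across all requests.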
In a microservices architecture, it is recommended to build in observability mechanisms from the beginning of system development. As an agile development practice, CI/CD is said to be best introduced early (in Scrum, from the first sprint). Observability should be introduced at the same time, so that CI/CD/CM (continuous monitoring) is in place from the early stages of development.
CI/CD is well known, but CM (continuous monitoring) is not so well known.
OpenTelemetry to gain observability
OpenTelemetry is a CNCF project created by merging two projects, OpenTracing and OpenCensus, and provides an observability framework for cloud-native applications (a collection of tools, APIs, and SDKs). It is used to instrument applications and to generate, collect, and export telemetry data (logs, metrics, traces) for analysis, so that application performance and behavior can be visualized. OpenTelemetry is also cross-platform and supports a variety of languages.
The data is a little old, but OpenTelemetry was placed in the ASSESS ring of the observability Technology Radar published by the CNCF in September 2020, so it looks likely to gain adoption going forward.
CNCF End User Technology Radar: Observability, September 2020 | CNCF
Today, CNCF is publishing the second of our quarterly CNCF End User Technology Radars; the topic for this Technology Radar is observability. In June, we launched the CNCF End User Technology Radar…
As mentioned in the Observability Technology Radar, once an observability tool is in place, it tends to be difficult to change it or integrate it with another tool. This is because observability is not a core business for many companies, which makes it hard to justify the cost-benefit of further investment. For this reason, the mechanism for collecting telemetry data should be as standardized as possible.
Products that provide observability need telemetry data (logs, metrics, traces) to analyze in order to visualize information. Traditionally, the means of producing that telemetry data have been provided separately by individual open source projects or commercial vendors. The resulting lack of standardization costs data portability and increases the burden on users of observability products.
The OpenTelemetry project solves these problems by providing a single solution that is independent of commercial vendors. The OpenTelemetry project has broad industry support and adoption from cloud vendors, commercial observability vendors, and end users.
Reduced dependencies with OpenTelemetry
Many cloud and commercial product vendors have announced support for OpenTelemetry, and it is expected to become a standard tool for collecting telemetry data. For example, AWS X-Ray and commercial products such as New Relic support OpenTelemetry. We hope that more vendors will follow.
Consider adopting OpenTelemetry to reduce dependence on cloud and commercial vendors when gaining observability for your microservices architecture.
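The way a standardized collection layer reduces vendor dependence can be sketched as follows. The application code talks only to a small exporter interface, so the backend can be swapped without touching the instrumentation. This is a simplified illustration in plain Python; the class and function names are hypothetical and are not the actual OpenTelemetry SDK signatures.

```python
import json
from typing import Protocol


class SpanExporter(Protocol):
    """Anything that can receive a finished span (hypothetical interface)."""

    def export(self, span: dict) -> None: ...


class ConsoleExporter:
    """Writes spans to standard output as JSON, e.g. during local development."""

    def export(self, span: dict) -> None:
        print(json.dumps(span, sort_keys=True))


class InMemoryExporter:
    """Keeps spans in a list; stands in for a vendor-specific backend."""

    def __init__(self) -> None:
        self.spans = []

    def export(self, span: dict) -> None:
        self.spans.append(span)


def handle_order(order_id: str, exporter: SpanExporter) -> str:
    """Business logic that depends only on the interface, not on a vendor."""
    exporter.export({"name": "handle_order", "order_id": order_id})
    return f"order {order_id} accepted"


# Swapping the backend is a one-line change at the call site.
memory = InMemoryExporter()
result = handle_order("42", memory)
handle_order("43", ConsoleExporter())
```

The design choice this is meant to illustrate is that instrumentation is written once against a neutral interface, and only the exporter configuration changes when the observability product changes.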
Summary
This was only a brief overview, but I wanted to touch on observability, a somewhat less glamorous topic than buzzwords such as cloud native and microservices. What did you think?
There is still little information available on OpenTelemetry, and it may take some time for it to become widespread.
When operating a microservices architecture, we recommend evaluating log monitoring, metrics monitoring, and distributed tracing from the beginning of system development, since these tend to be postponed. Even when we recognize that observability is important, we tend to put it off because it does not directly affect application development. Since microservices cannot be operated without observability, why not consider OpenTelemetry as a tool for this purpose?
Observability is essential for implementing microservices architecture!
We hope that this article will be of some help to you towards the introduction of microservices.
In this blog, we will actively share concrete construction examples and use cases for observability.
Thank you for reading to the end!