In this episode, we’re talking all about OpenTelemetry. Also, Allen lays down some knowledge, Joe plays director and Outlaw stumps the chumps.
See the full show notes at https://www.codingblocks.net/episode216
News
- Thanks for the reviews Lanjunnn and scott339!
- Allen made a video about generating a baseball lineup application just by chatting with ChatGPT (youtube)
What is OpenTelemetry?
- An incubating project in the CNCF – Cloud Native Computing Foundation (cncf.io)
- What does incubating mean?
- Projects used in production by a small number of users with a good pool of contributors
- Basically you shouldn’t be left out to dry here
- So what is OpenTelemetry? A collection of APIs, SDKs and tools used to instrument, generate, collect and export telemetry data
- This helps you analyze your software’s performance and behavior
- It’s available across multiple languages and frameworks
It’s all about Observability
- Understanding a system “from the outside”
- Doesn’t require you to understand the inner workings of the system
- The goal is to be able to troubleshoot difficult problems and answer the “Why is this happening?” question
- To answer those questions, the application must be properly “Instrumented”
- This means the application must emit signals like metrics, traces, and logs
- The application is properly instrumented when you can completely troubleshoot an issue with the instrumentation available
- That is the job of OpenTelemetry – to be the mechanism to instrument applications so they become observable
- List of vendors that support OpenTelemetry: https://opentelemetry.io/ecosystem/vendors/
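As a rough illustration of what “emitting signals” means, here’s a stdlib-only Python sketch. It is not the OpenTelemetry API: `handle_request`, `request_counter` and `durations_ms` are hypothetical names, and with OpenTelemetry you would create instruments, spans and log records through its SDK and ship them to a backend via an exporter. The sketch records a request count and a latency (metrics) and writes a log line; traces are left out to keep it short.

```python
import logging
import time
from collections import Counter

# Hypothetical in-memory stand-ins for telemetry. A real setup would hand these
# signals to an OpenTelemetry SDK and exporter instead of keeping them locally.
request_counter = Counter()            # metric: request count per (route, status)
durations_ms = []                      # metric: raw latencies, aggregated later
logger = logging.getLogger("shop")     # logs


def handle_request(route: str, fail: bool = False) -> int:
    """Serve a fake request and emit a count, a latency, and a log line."""
    start = time.perf_counter()
    status = 500 if fail else 200      # ...the real work would happen here...
    elapsed_ms = (time.perf_counter() - start) * 1000

    request_counter[(route, status)] += 1
    durations_ms.append(elapsed_ms)
    logger.info("handled %s -> %d in %.2f ms", route, status, elapsed_ms)
    return status


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    handle_request("/checkout")
    handle_request("/checkout", fail=True)
    print(dict(request_counter))
```

Whether this counts as “properly instrumented” depends on the questions you need to answer: if a failure can’t be explained from these signals alone, more instrumentation is needed.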
Reliability and Metrics
- Telemetry – refers to the data emitted from a system about its behavior in the form of metrics, traces and logs
- Reliability – is the system behaving the way it’s supposed to? Not just “is it up and running?” but also “is it doing what it’s expected to do?”
- Metrics – numeric aggregations over a period of time about your application or infrastructure
- CPU Utilization
- Application error rates
- Number of requests per second
- SLI – Service Level Indicator – a measurement of a service’s behavior, which should be taken from the perspective of a user / customer
- Example – how fast a webpage loads
- SLO – Service Level Objective – the means of communicating reliability to an organization or team
- Accomplished by attaching SLIs to business value
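To connect metrics, SLIs and SLOs, here’s a minimal Python sketch under made-up assumptions: the `Request` record, the 2-second window, the 300 ms “fast enough” threshold and the 99% objective are illustrative only, not OpenTelemetry APIs or recommended values. It aggregates raw requests into metrics (request rate, error rate), turns a user-facing measurement into an SLI (the share of page loads that were both successful and fast), and checks that SLI against an SLO target.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    timestamp_s: float   # when the request arrived, in seconds
    latency_ms: float    # how long the page took to load
    is_error: bool       # whether the request failed


def requests_per_second(reqs: List[Request], window_s: float) -> float:
    """Metric: request rate over a fixed time window."""
    return len(reqs) / window_s


def error_rate(reqs: List[Request]) -> float:
    """Metric: fraction of requests that failed."""
    return sum(r.is_error for r in reqs) / len(reqs) if reqs else 0.0


def latency_sli(reqs: List[Request], threshold_ms: float) -> float:
    """SLI: fraction of page loads that succeeded within the threshold."""
    if not reqs:
        return 1.0
    good = sum(1 for r in reqs if not r.is_error and r.latency_ms <= threshold_ms)
    return good / len(reqs)


def meets_slo(sli: float, objective: float) -> bool:
    """SLO check: the SLI must stay at or above the agreed target."""
    return sli >= objective


if __name__ == "__main__":
    window = [
        Request(0.1, 120, False),
        Request(0.5, 340, False),
        Request(1.2, 90, True),
        Request(1.9, 180, False),
    ]
    print(requests_per_second(window, window_s=2.0))  # 2.0
    print(error_rate(window))                         # 0.25
    sli = latency_sli(window, threshold_ms=300)
    print(sli, meets_slo(sli, objective=0.99))        # 0.5 False
```

Counting failed requests as “not good” in the latency SLI is a design choice: a fast error page isn’t a good experience for the user, so it shouldn’t help the SLI.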
Distributed Tracing
To truly understand what distributed tracing is, there are a few parts we have to put together first
- Logs – a timestamped message emitted by applications
- Different from a trace – a log isn’t necessarily tied to a particular request or transaction, whereas a trace is
- Heavily used in all applications to help people observe the behavior of a system
- Unfortunately, as you probably know, they aren’t completely helpful in understanding the full context of the message – for instance, where was that particular code called from?
- Logs become much more useful when they become part of a span or when they are correlated with a trace and a span
- Span – represents a unit of work or operation
- Tracks the operations that a request makes – meaning it helps to paint a picture of everything that happened during the “span” of that request/operation
- Contains a name, time-related data, structured log messages, and other metadata/attributes to provide information about the operation it’s tracking
- Some example metadata/attributes are: http.method=GET, http.target=/urlpath, http.server_name=codingblocks.net
- A distributed trace, also known simply as a trace, records the paths taken by a user or system request as it passes through the various services in a distributed, multi-service architecture, like microservices or serverless applications (AWS Lambdas, Azure Functions, etc.)
- Tracing is ESSENTIAL for distributed systems because of their non-deterministic nature and because many problems are incredibly difficult to reproduce in a local environment
- Tracing makes problems easier to understand and troubleshoot because traces break down what happens in a request as it flows through the distributed system
- A trace is made of one or more spans
- The first span is the “root span” – this will represent a request from start to finish
- The child spans will just add more context to what happened during different steps of the request
- Some observability backends visualize traces as waterfall diagrams where the root span is at the top and branching steps show as separate chains below – see the diagram (opentelemetry.io)
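To tie spans, traces and correlated logs together, here’s a simplified, stdlib-only Python sketch. It is not the OpenTelemetry SDK (where you’d get spans from a tracer): `Span`, `start_span`, `correlated_log` and `waterfall` are hypothetical names. What it borrows from OpenTelemetry is the shape of the data: random IDs sized like OpenTelemetry’s 16-byte trace ID and 8-byte span ID, every span in a trace sharing the trace ID, each child pointing at its parent, and the root span having no parent. The attributes reuse the examples from the notes.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """A hypothetical, simplified span: one unit of work inside a trace."""
    name: str
    trace_id: str              # shared by every span in the same trace
    span_id: str               # unique to this span
    parent_id: Optional[str]   # None marks the root span
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)  # timestamped, structured log messages
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None

    def add_event(self, message: str, **attrs) -> None:
        self.events.append({"time_ns": time.time_ns(), "message": message, **attrs})

    def end(self) -> None:
        self.end_ns = time.time_ns()


def start_span(name: str, parent: Optional[Span] = None,
               attributes: Optional[dict] = None) -> Span:
    # A child inherits its parent's trace ID; only the root span mints a new one.
    trace_id = parent.trace_id if parent else secrets.token_hex(16)  # 32 hex chars
    parent_id = parent.span_id if parent else None
    return Span(name, trace_id, secrets.token_hex(8), parent_id, dict(attributes or {}))


def correlated_log(span: Span, message: str) -> dict:
    """A log record that carries the trace and span IDs it belongs to."""
    return {"time_ns": time.time_ns(), "message": message,
            "trace_id": span.trace_id, "span_id": span.span_id}


def waterfall(spans: list) -> list:
    """Render spans as an indented tree: root first, children by start time."""
    children: dict = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    lines: list = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for child in sorted(children.get(parent_id, []), key=lambda c: c.start_ns):
            lines.append("  " * depth + child.name)
            walk(child.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    root = start_span("GET /episode216", attributes={
        "http.method": "GET", "http.target": "/episode216",
        "http.server_name": "codingblocks.net"})
    auth = start_span("check-auth", parent=root)
    auth.end()
    notes = start_span("load-show-notes", parent=root)
    notes.add_event("cache miss", key="episode216")
    print(correlated_log(notes, "falling back to the database"))
    notes.end()
    root.end()
    print("\n".join(waterfall([root, auth, notes])))
```

Running it prints one correlated log record and then a text “waterfall”: the root span on the first line with its two child spans indented beneath it. Because the log record carries the same trace and span IDs, a backend could show that message in the context of the `load-show-notes` step of that specific request.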
To be continued…
Resources We Like
- OpenTelemetry Website (opentelemetry.io)
Tip of the Week
- Attention Windows users: did you know you can hold the Ctrl key to keep the tasks from moving around in Task Manager? It makes it much easier to shut down those misbehaving key loggers! (verge.com)
- Does your JetBrains IDE feel sluggish? You can adjust the heap space to give it more juice! (blogs.jetbrains.com)
- Beware of string interpolation in logging statements in Kotlin: the interpolated string gets built even if your logger isn’t configured to output that log level! IntelliJ will show you some squiggles to warn you. Use your logging library’s parameterized message templates (placeholders) instead. Kotlin lambdas can also let a logger defer building the message, so that work only happens when it’s necessary. (discuss.kotlinlang.org)
- Thanks to Tom for the tip on tldr pages: they’re a community effort to simplify the beloved man pages with practical examples. (tldr.sh)
- Looking for some new coding music? Check out these albums from popular guitar heroes!
- Portals from Kirk Hammett (music.apple.com)
- Terminal Velocity from John Petrucci (music.apple.com)
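The logging pitfall from the Kotlin tip isn’t unique to Kotlin. To keep this page’s examples in one language, here’s a sketch of the same effect with Python’s standard `logging` module. `ExpensiveValue` is a hypothetical stand-in for costly work. An f-string is built eagerly even when DEBUG is off, while `%s`-style arguments are only formatted if the record is actually emitted.

```python
import logging


class ExpensiveValue:
    """Hypothetical stand-in for data that is costly to turn into a string."""

    def __init__(self) -> None:
        self.renders = 0  # how many times __str__ has actually run

    def __str__(self) -> str:
        self.renders += 1
        return "expensive!"


logger = logging.getLogger("lazy-logging-demo")
logger.setLevel(logging.INFO)  # DEBUG output is switched off

value = ExpensiveValue()

# Eager: the f-string is built before logger.debug() is even called,
# so the work happens and the result is thrown away.
logger.debug(f"value is {value}")
print(value.renders)  # 1

# Deferred: logging only applies "%s" formatting to records it will emit,
# so __str__ never runs while DEBUG is disabled.
logger.debug("value is %s", value)
print(value.renders)  # still 1

# Arguments themselves are still evaluated eagerly, so guard genuinely
# expensive computations explicitly.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("value is %s", value)
print(value.renders)  # still 1
```

Deferred formatting only skips turning the arguments into text. Anything computed to produce those arguments still runs, which is why the explicit `isEnabledFor` guard is there.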