github链接:https://github.com/jaegertracing/jaeger

官网:https://www.jaegertracing.io/

Jaeger: open source, end-to-end distributed tracing

Monitor and troubleshoot transactions in complex distributed systems

Cloud Native Computing Foundation incubating project.

Uber published a blog post, Evolving Distributed Tracing at Uber, where they explain the history and reasons for the architectural choices made in Jaeger. Yuri Shkuro, creator of Jaeger, also published a book Mastering Distributed Tracing that covers in-depth many aspects of Jaeger design and operation, as well as distributed tracing in general.

Why Jaeger?

As on-the-ground microservice practitioners are quickly realizing, the majority of operational problems that arise when moving to a distributed architecture are ultimately grounded in two areas: networking and observability. It is simply an orders of magnitude larger problem to network and debug a set of intertwined distributed services versus a single monolithic application.

Problems that Jaeger addresses

It is used for monitoring and troubleshooting microservices-based distributed systems, including:

  • Distributed context propagation
  • Distributed transaction monitoring
  • Root cause analysis
  • Service dependency analysis
  • Performance / latency optimization

Kubernetes and OpenShift

Features

  • Discover architecture of the whole system via data-driven dependency diagram.
  • View request timeline and errors; understand how the app works.
  • Find sources of latency and lack of concurrency.
  • Highly contextualized logging.
  • Use baggage propagation to:

    • Diagnose inter-request contention (queueing).
    • Attribute time spent in a service.
  • Use open source libraries with OpenTracing integration to get vendor-neutral instrumentation for free.

Features

  • OpenTracing compatible data model and instrumentation libraries

  • Uses consistent upfront sampling with individual per service/endpoint probabilities
  • Multiple storage backends: Cassandra, Elasticsearch, memory.
  • Adaptive sampling (coming soon)
  • Post-collection data processing pipeline (coming soon)

Technical Specs

Span

A span represents a logical unit of work in Jaeger that has an operation name, the start time of the operation, and the duration. Spans may be nested and ordered to model causal relationships.

Trace

A trace is a data/execution path through the system, and can be thought of as a directed acyclic graph of spans

Query

Query is a service that retrieves traces from storage and hosts a UI to display them

参考:

the OpenTracing standard

Components

Jaeger can be deployed either as all-in-one binary, where all Jaeger backend components run in a single process, or as a scalable distributed system, discussed below. There two main deployment options:

  1. Collectors are writing directly to storage.
  2. Collectors are writing to Kafka as a preliminary buffer.

Illustration of direct-to-storage architecture

Illustration of architecture with Kafka as intermediate buffer

This section details the constituent parts of Jaeger and how they relate to each other. It is arranged by the order in which spans from your application interact with them.

Jaeger client libraries

Jaeger clients are language specific implementations of the OpenTracing API. They can be used to instrument applications for distributed tracing either manually or with a variety of existing open source frameworks, such as Flask, Dropwizard, gRPC, and many more, that are already integrated with OpenTracing.

An instrumented service creates spans when receiving new requests and attaches context information (trace id, span id, and baggage) to outgoing requests. Only ids and baggage are propagated with requests; all other information that compose a span like operation name, logs, etc. are not propagated. Instead sampled spans are transmitted out of process asynchronously, in the background, to Jaeger Agents.

The instrumentation has very little overhead, and is designed to be always enabled in production.

Note that while all traces are generated, only a few are sampled. Sampling a trace marks the trace for further processing and storage. By default, Jaeger client samples 0.1% of traces (1 in 1000), and has the ability to retrieve sampling strategies from the agent.

Agent

The Jaeger agent is a network daemon that listens for spans sent over UDP, which it batches and sends to the collector. It is designed to be deployed to all hosts as an infrastructure component. The agent abstracts the routing and discovery of the collectors away from the client.

Collector

The Jaeger collector receives traces from Jaeger agents and runs them through a processing pipeline. Currently our pipeline validates traces, indexes them, performs any transformations, and finally stores them.

Jaeger’s storage is a pluggable component which currently supports CassandraElasticsearch and Kafka

Ingester

Ingester is a service that reads from Kafka topic and writes to another storage backend (Cassandra, Elasticsearch)

Monitoring Jaeger

Jaeger itself is a distributed, microservices based system. If you run it in production, you will likely want to setup adequate monitoring for different components, e.g. to ensure that the backend is not saturated by too much tracing data

Metrics

By default Jaeger microservices expose metrics in Prometheus format. It is controlled by the following command line options:

  • --metrics-backend controls how the measurements are exposed. The default value is prometheus, another option is expvar, the Go standard mechanism for exposing process level statistics.
  • --metrics-http-route specifies the name of the HTTP endpoint used to scrape the metrics (/metrics by default).

Each Jaeger component exposes the metrics scraping endpoint on one of the HTTP ports they already serve:

Component Port
jaeger-agent 14271
jaeger-collector 14269
jaeger-query 16687
jaeger-ingester 14270

Logging

Jaeger components only log to standard out, using structured logging library go.uber.org/zap configured to write log lines as JSON encoded strings, for example:

{"level":"info","ts":1517621222.261759,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}

The log level can be adjusted via --log-level command line switch; default level is info.  

Build Telemetry for Distributed Services之Jaeger的更多相关文章

  1. Build Telemetry for Distributed Services之Open Telemetry简介

    官网链接:https://opentelemetry.io/about/ OpenTelemetry is the next major version of the OpenTracing and  ...

  2. Build Telemetry for Distributed Services之OpenCensus:C#

    OpenCensus Easily collect telemetry like metrics and distributed traces from your services OpenCensu ...

  3. Build Telemetry for Distributed Services之OpenTracing实践

    官网:https://opentracing.io/docs/best-practices/ Best Practices This page aims to illustrate common us ...

  4. Build Telemetry for Distributed Services之Open Telemetry来历

    官网:https://opentelemetry.io/ github:https://github.com/open-telemetry/ Effective observability requi ...

  5. Build Telemetry for Distributed Services之OpenTracing简介

    官网地址:https://opentracing.io/ What is Distributed Tracing? Who Uses Distributed Tracing? What is Open ...

  6. Build Telemetry for Distributed Services之OpenTracing项目

    中文文档地址:https://wu-sheng.gitbooks.io/opentracing-io/content/pages/quick-start.html 中文github地址:https:/ ...

  7. Build Telemetry for Distributed Services之Elastic APM

    官网地址:https://www.elastic.co/guide/en/apm/get-started/current/index.html Overview Elastic APM is an a ...

  8. Build Telemetry for Distributed Services之OpenCensus:Tracing2(待续)

    part 1:Tracing1 Sampling Sampling Samplers Global sampler Per span sampler Rules References

  9. Build Telemetry for Distributed Services之OpenTracing指导:C#

    官网链接:https://opentracing.io/guides/ 官方微博:https://medium.com/opentracing Welcome to the OpenTracing G ...

随机推荐

  1. 大海航行靠舵手 华为云靠什么征服K8S?

    Kubernetes 是Google开源的容器集群管理系统或者称为分布式操作系统.它构建在Docker技术之上,为容器化的应用提供资源调度.部署运行.服务发现.扩容缩容等整一套功能,本质上可看作是基于 ...

  2. mybatise 设置全局变量实例

    前言 在平时的工作中有时候是需要在配置文件中配置全局变量的,因为这些东西是不会变的,并且每个mapper都传参的话也显得有点繁琐,还好mybatis本身是支持全局变量的,今天工作中用到了,记录一下. ...

  3. 【python】获取目录下的最新文件夹/文件

    直接上代码 def new_report(test_report): lists = os.listdir(test_report) #列出目录的下所有文件和文件夹保存到lists print(lis ...

  4. JS基本数据类型和引用数据类型区别

    1.栈(stack)和堆(heap) stack为自动分配的内存空间,它由系统自动释放:而heap则是动态分配的内存,大小也不一定会自动释放 2.数据类型 JS分两种数据类型: 基本数据类型:Numb ...

  5. Nginx中ngx_http_fastcgi_module

    转发请求到 FastCGI 服务器器,不不⽀支持 php 模块⽅方式指令:15.1 fastcgi_pass设置 fastcgi 服务器器的地址.地址可以指定为域名或 IP 地址,以及端⼝口Synta ...

  6. 从客户端中检测到有潜在危险的 request.form值

    这里只说ASP.NET MVC的解决方法,ASP.NET已经不碰了. 给控制器加上[ValidateInput(false)]特性即可忽略含有HTML标记的内容. [HttpPost] [Valida ...

  7. 018_linux驱动之_阻塞和非阻塞

    阻塞操作     是指在执行设备操作时若不能获得资源则挂起进程,直到满足可操作的条件后再进行操作. 被挂起的进程进入休眠状态,被从调度器的运行队列移走,直到等待的条件被满足.   非阻塞操作   进程 ...

  8. scapy2 爬取全站,以及使用post请求

    前情提要: 一:scrapy 爬取妹子网 全站  知识点: scrapy回调函数的使用 二: scrapy的各个组件之间的关系解析 Scrapy 框架 Scrapy是用纯Python实现一个为了爬取网 ...

  9. CSP-S模拟测试 88 题解

    T1 queue: 考场写出dp柿子后觉得很斜率优化,然后因为理解错了题觉得斜率优化完全不可做,只打了暴力. 实际上他是可以乱序的,所以直接sort,正确性比较显然,贪心可证,然后就是个sb斜率优化d ...

  10. c实现循环链表

    解决约瑟夫环问题核心步骤: 1.建立具有n个节点.无头的循环链表 2.确定第一个报数人的位置 3.不断从链表中删除链节点,直到链表为空 #include <iostream> #inclu ...