Build Telemetry for Distributed Services: The OpenTracing Project
Chinese documentation: https://wu-sheng.gitbooks.io/opentracing-io/content/pages/quick-start.html
Chinese GitHub repository: https://github.com/opentracing-contrib/opentracing-specification-zh
References:
The relationship between metrics, tracing, and logging
The OpenTracing Semantic Specification
Version: 1.1
Document Overview
This is the “formal” OpenTracing semantic specification. Since OpenTracing must work across many languages, this document takes care to avoid language-specific concepts. That said, there is an understanding throughout that all languages have some concept of an “interface” which encapsulates a set of related capabilities.
Versioning policy
The OpenTracing specification uses a Major.Minor version number but has no .Patch component. The major version increments when backwards-incompatible changes are made to the specification. The minor version increments for non-breaking changes like the introduction of new standard tags, log fields, or SpanContext reference types. (You can read more about the motivation for this versioning scheme at Issue specification#2.)
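One way to read this policy is as a compatibility rule: code written against one version can rely on any later minor version within the same major version. The sketch below illustrates that reading. The helper names `parse_version` and `is_compatible` are hypothetical and not part of OpenTracing.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Parse a 'Major.Minor' string; anything with a '.Patch' part is rejected."""
    parts = text.split(".")
    if len(parts) != 2:
        raise ValueError(f"expected Major.Minor, got {text!r}")
    return int(parts[0]), int(parts[1])


def is_compatible(required: str, provided: str) -> bool:
    """True if code written against `required` can rely on `provided`."""
    req_major, req_minor = parse_version(required)
    prov_major, prov_minor = parse_version(provided)
    # A major bump is breaking; a minor bump only adds tags, fields, or reference types.
    return req_major == prov_major and prov_minor >= req_minor
```

For instance, `is_compatible("1.0", "1.1")` holds under this rule, while `is_compatible("1.1", "2.0")` does not.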
The OpenTracing Data Model
Traces in OpenTracing are defined implicitly by their Spans. In particular, a Trace can be thought of as a directed acyclic graph (DAG) of Spans, where the edges between Spans are called References.
For example, the following Trace is made up of 8 Spans:
Causal relationships between Spans in a single Trace

        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is a `ChildOf` Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                       ↑
                                       ↑
                                       ↑
                         (Span G `FollowsFrom` Span F)
Sometimes it’s easier to visualize Traces with a time axis as in the diagram below:
Temporal relationships between Spans in a single Trace

––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··] [Span G··] [Span H··]
Each Span encapsulates the following state:
- An operation name
- A start timestamp
- A finish timestamp
- A set of zero or more key:value Span Tags. The keys must be strings. The values may be strings, bools, or numeric types.
- A set of zero or more Span Logs, each of which is itself a key:value map paired with a timestamp. The keys must be strings, though the values may be of any type. Not all OpenTracing implementations must support every value type.
- A SpanContext (see below)
- References to zero or more causally-related Spans (via the SpanContext of those related Spans)
Each SpanContext encapsulates the following state:
- Any OpenTracing-implementation-dependent state (for example, trace and span ids) needed to refer to a distinct Span across a process boundary
- Baggage Items, which are just key:value pairs that cross process boundaries
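The state listed above maps naturally onto a few record types. The following is a minimal sketch only: the field names, the `trace_id`/`span_id` pair, and the `"child_of"` string are assumptions for illustration, since the specification leaves the implementation-dependent state to each Tracer.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple, Union

TagValue = Union[str, bool, int, float]  # strings, bools, or numeric types


@dataclass(frozen=True)
class SpanContext:
    trace_id: int  # implementation-dependent state (assumed field)
    span_id: int   # implementation-dependent state (assumed field)
    baggage: Tuple[Tuple[str, str], ...] = ()  # key:value pairs that cross process boundaries


@dataclass
class LogRecord:
    timestamp: float
    fields: Dict[str, Any]  # string keys, values of any type


@dataclass
class Span:
    operation_name: str
    context: SpanContext
    start_time: float = field(default_factory=time.time)
    finish_time: Optional[float] = None
    tags: Dict[str, TagValue] = field(default_factory=dict)
    logs: List[LogRecord] = field(default_factory=list)
    # Each reference pairs a reference type with the SpanContext of the related Span.
    references: List[Tuple[str, SpanContext]] = field(default_factory=list)


root = Span("get_account", SpanContext(trace_id=1, span_id=1))
child = Span("db_query", SpanContext(trace_id=1, span_id=2),
             references=[("child_of", root.context)])
```

Note that the child refers to its parent only through the parent's SpanContext, never through the parent Span object itself.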
References between Spans
A Span may reference zero or more other SpanContexts that are causally related. OpenTracing presently defines two types of references: ChildOf and FollowsFrom. Both reference types specifically model direct causal relationships between a child Span and a parent Span. In the future, OpenTracing may also support reference types for Spans with non-causal relationships (e.g., Spans that are batched together, Spans that are stuck in the same queue, etc).
ChildOf references: A Span may be the ChildOf a parent Span. In a ChildOf reference, the parent Span depends on the child Span in some capacity. All of the following would constitute ChildOf relationships:
- A Span representing the server side of an RPC may be the ChildOf a Span representing the client side of that RPC
- A Span representing a SQL insert may be the ChildOf a Span representing an ORM save method
- Many Spans doing concurrent (perhaps distributed) work may all individually be the ChildOf a single parent Span that merges the results for all children that return within a deadline
These could all be valid timing diagrams for children that are the ChildOf a parent.
    [-Parent Span---------]
         [-Child Span----]

    [-Parent Span--------------]
         [-Child Span A----]
          [-Child Span B----]
           [-Child Span C----]
            [-Child Span D---------------]
            [-Child Span E----]
FollowsFrom references: Some parent Spans do not depend in any way on the result of their child Spans. In these cases, we say merely that the child Span FollowsFrom the parent Span in a causal sense. There are many distinct FollowsFrom reference sub-categories, and in future versions of OpenTracing they may be distinguished more formally.
These can all be valid timing diagrams for children that FollowsFrom a parent.
    [-Parent Span-]  [-Child Span-]

    [-Parent Span--]
     [-Child Span-]

    [-Parent Span-]
                [-Child Span-]
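To make the DAG view concrete, the 8-Span example trace can be written down as a table of references and walked upward. This is only a sketch: the dictionary layout and helper names are hypothetical, and reading the second `>>>` arrow (Span G to Span H) as a FollowsFrom reference is an assumption, since the diagram labels only the F-to-G edge.

```python
from typing import Dict, List, Tuple

CHILD_OF = "child_of"
FOLLOWS_FROM = "follows_from"

# child -> list of (reference type, parent), mirroring the example trace
REFERENCES: Dict[str, List[Tuple[str, str]]] = {
    "A": [],
    "B": [(CHILD_OF, "A")],
    "C": [(CHILD_OF, "A")],
    "D": [(CHILD_OF, "B")],
    "E": [(CHILD_OF, "C")],
    "F": [(CHILD_OF, "C")],
    "G": [(FOLLOWS_FROM, "F")],
    "H": [(FOLLOWS_FROM, "G")],  # assumed from the ">>>" arrow
}


def roots(refs: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Spans that reference no other Span."""
    return [span for span, parents in refs.items() if not parents]


def ancestors(span: str, refs: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """All Spans reachable by following references upward from `span`."""
    seen: List[str] = []
    stack = [parent for _, parent in refs[span]]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parent for _, parent in refs[current])
    return seen
```

Because a Trace is defined implicitly by its Spans, nothing beyond these per-Span references is needed to recover the whole graph.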
The OpenTracing API
There are three critical and inter-related types in the OpenTracing specification: Tracer, Span, and SpanContext. Below, we go through the behaviors of each type; roughly speaking, each behavior becomes a “method” in a typical programming language, though it may actually be a set of related sibling methods due to type overloading and so on.
When we discuss “optional” parameters, it is understood that different languages have different ways to construe such concepts. For example, in Go we might use the “functional Options” idiom, whereas in Java we might use a builder pattern.
Tracer
The Tracer interface creates Spans and understands how to Inject (serialize) and Extract (deserialize) them across process boundaries. Formally, it has the following capabilities:
Start a new Span
Required parameters
- An operation name, a human-readable string which concisely represents the work done by the Span (for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation). The operation name should be the most general string that identifies a (statistically) interesting class of Span instances. That is, "get_user" is better than "get_user/314159".
For example, here are potential operation names for a Span that gets hypothetical account information:
Operation Name | Guidance |
---|---|
get | Too general |
get_account/792 | Too specific |
get_account | Good, and account_id=792 would make a nice Span tag |
Optional parameters
- Zero or more references to related SpanContexts, including a shorthand for ChildOf and FollowsFrom reference types if possible.
- An optional explicit start timestamp; if omitted, the current walltime is used by default
- Zero or more tags
Returns a Span instance that’s already started (but not Finished)
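The required and optional parameters for starting a Span might look like this in Python. This is a self-contained sketch, not the API of any real OpenTracing library: `SketchTracer`, `SketchSpan`, `SketchContext`, and the `child_of` shorthand keyword are illustrative assumptions.

```python
import itertools
import time
from collections import namedtuple

SketchContext = namedtuple("SketchContext", ["trace_id", "span_id"])
_ids = itertools.count(1)  # toy id generator, for illustration only


class SketchSpan:
    def __init__(self, operation_name, trace_id, references, start_time, tags):
        self.operation_name = operation_name
        self.context = SketchContext(trace_id, next(_ids))
        self.references = references
        self.start_time = start_time
        self.tags = tags
        self.finish_time = None  # started, but not yet finished


class SketchTracer:
    def start_span(self, operation_name, child_of=None, references=None,
                   start_time=None, tags=None):
        refs = list(references or [])
        if child_of is not None:
            # Shorthand for the common ChildOf case.
            refs.insert(0, ("child_of", child_of))
        # Join the trace of the first referenced context, or start a new trace.
        trace_id = refs[0][1].trace_id if refs else next(_ids)
        return SketchSpan(
            operation_name,
            trace_id,
            refs,
            time.time() if start_time is None else start_time,  # default: current walltime
            dict(tags or {}),
        )


tracer = SketchTracer()
# "get_account" is the general operation name; the specific id goes into a tag.
parent = tracer.start_span("get_account", tags={"account_id": 792})
child = tracer.start_span("db_query", child_of=parent.context,
                          start_time=1700000000.0)
```

Keeping `account_id` in a tag rather than in the operation name follows the guidance in the table above.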
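Inject a SpanContext into a carrier
Required parameters
- A SpanContext instance
- A format descriptor (typically but not necessarily a string constant) which tells the Tracer implementation how to encode the SpanContext in the carrier parameter
- A carrier, whose type is dictated by the format. The Tracer implementation will encode the SpanContext in this carrier object according to the format.
Extract a SpanContext from a carrier
Required parameters
- A format descriptor (typically but not necessarily a string constant) which tells the Tracer implementation how to decode the SpanContext from the carrier parameter
- A carrier, whose type is dictated by the format. The Tracer implementation will decode the SpanContext from this carrier object according to the format.
Returns a SpanContext instance suitable for use as a reference when starting a new Span via the Tracer.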
Note: required formats for injection and extraction
Both injection and extraction rely on an extensible format parameter that dictates the type of the associated “carrier” as well as how a SpanContext is encoded in that carrier. All of the following formats must be supported by all Tracer implementations.
- Text Map: an arbitrary string-to-string map with an unrestricted character set for both keys and values
- HTTP Headers: a string-to-string map with keys and values that are suitable for use in HTTP headers (a la RFC 7230). In practice, since there is such “diversity” in the way that HTTP headers are treated in the wild, it is strongly recommended that Tracer implementations use a limited HTTP header key space and escape values conservatively.
- Binary: a (single) arbitrary binary blob representing a SpanContext
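As a rough illustration of injection and extraction with the Text Map format, the sketch below round-trips a context through a plain string-to-string dictionary. The key prefixes, the hexadecimal id encoding, and the `SketchContext` type are all assumptions; each real Tracer chooses its own wire encoding.

```python
from collections import namedtuple

SketchContext = namedtuple("SketchContext", ["trace_id", "span_id", "baggage"])

TEXT_MAP = "text_map"               # format descriptor (assumed constant)
PREFIX = "sketch-"                  # hypothetical key prefix
BAGGAGE_PREFIX = "sketch-baggage-"  # hypothetical baggage key prefix


def inject(context, fmt, carrier):
    """Encode `context` into `carrier` according to `fmt`."""
    if fmt != TEXT_MAP:
        raise ValueError(f"unsupported format: {fmt}")
    carrier[PREFIX + "trace-id"] = format(context.trace_id, "x")
    carrier[PREFIX + "span-id"] = format(context.span_id, "x")
    for key, value in context.baggage.items():
        carrier[BAGGAGE_PREFIX + key] = value


def extract(fmt, carrier):
    """Decode a context from `carrier`; usable as a reference for a new Span."""
    if fmt != TEXT_MAP:
        raise ValueError(f"unsupported format: {fmt}")
    baggage = {
        key[len(BAGGAGE_PREFIX):]: value
        for key, value in carrier.items()
        if key.startswith(BAGGAGE_PREFIX)
    }
    return SketchContext(
        int(carrier[PREFIX + "trace-id"], 16),
        int(carrier[PREFIX + "span-id"], 16),
        baggage,
    )


outgoing = SketchContext(trace_id=0xABC, span_id=0x1, baggage={"user": "42"})
carrier = {}
inject(outgoing, TEXT_MAP, carrier)       # client side, before sending
incoming = extract(TEXT_MAP, carrier)     # server side, after receiving
```

An HTTP Headers variant would follow the same shape but restrict keys and escape values so they are safe as header fields.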
Span
With the exception of the method to retrieve the Span’s SpanContext, none of the below may be called after the Span is finished.
Retrieve the Span’s SpanContext
There should be no parameters.
Returns the SpanContext for the given Span. The returned value may be used even after the Span is finished.
Overwrite the operation name
Required parameters
- The new operation name, which supersedes whatever was passed in when the Span was started
Finish the Span
Optional parameters
- An explicit finish timestamp for the Span; if omitted, the current walltime is used implicitly.
With the exception of the method to retrieve a Span’s SpanContext, no method may be called on a Span instance after it’s finished.
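The finish rules can be sketched as a small guard inside a Span type. Raising an exception on late calls is just one possible way to enforce the rule and is an assumption here; the class and method names are hypothetical.

```python
import time


class SpanFinishedError(RuntimeError):
    """Raised by this sketch when a finished Span is modified (hypothetical)."""


class SketchSpan:
    def __init__(self, operation_name, start_time=None):
        self.operation_name = operation_name
        self.start_time = time.time() if start_time is None else start_time
        self.finish_time = None
        self._context = ("trace-1", "span-1")  # placeholder SpanContext

    def _ensure_open(self):
        if self.finish_time is not None:
            raise SpanFinishedError(self.operation_name)

    def set_operation_name(self, name):
        self._ensure_open()
        self.operation_name = name  # supersedes the name given at start

    def finish(self, finish_time=None):
        self._ensure_open()
        # Default to the current walltime when no explicit timestamp is given.
        self.finish_time = time.time() if finish_time is None else finish_time

    def get_context(self):
        # Deliberately unguarded: the SpanContext stays usable after finish.
        return self._context


span = SketchSpan("get", start_time=10.0)
span.set_operation_name("get_account")
span.finish(finish_time=12.5)
```

After `finish`, only `get_context` remains callable in this sketch; every other method raises.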
Set a Span tag
Required parameters
- The tag key, which must be a string
- The tag value, which must be either a string, a boolean value, or a numeric type
Note that the OpenTracing project documents certain “standard tags” that have prescribed semantic meanings.
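The type constraints on tags are easy to check up front. The helper below is a hypothetical sketch of such a check, not part of any OpenTracing library.

```python
def validate_tag(key, value):
    """Return (key, value) if they satisfy the tag type rules, else raise TypeError."""
    if not isinstance(key, str):
        raise TypeError("tag key must be a string")
    # bool is listed for readability; in Python it is already a subclass of int.
    if not isinstance(value, (str, bool, int, float)):
        raise TypeError(
            f"tag value must be a string, bool, or number, got {type(value).__name__}"
        )
    return key, value
```

A value such as a list or dictionary is rejected here; structured data of that kind fits better in a Span log.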
Log structured data
Required parameters
- One or more key:value pairs, where the keys must be strings and the values may have any type at all. Some OpenTracing implementations may handle more (or more of) certain log values than others.
Optional parameters
- An explicit timestamp. If specified, it must fall between the local start and finish time for the span.
Note that the OpenTracing project documents certain “standard log keys” which have prescribed semantic meanings.
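A structured log call could be sketched as follows. The method name `log_kv` and the class are assumptions for illustration. Only the lower timestamp bound is checked, because the finish time is not yet known while the Span is still open.

```python
import time


class SketchSpan:
    def __init__(self, start_time):
        self.start_time = start_time
        self.logs = []  # list of (timestamp, fields) pairs

    def log_kv(self, fields, timestamp=None):
        if not fields:
            raise ValueError("at least one key:value pair is required")
        if not all(isinstance(key, str) for key in fields):
            raise TypeError("log keys must be strings")
        ts = time.time() if timestamp is None else timestamp
        if ts < self.start_time:
            raise ValueError("timestamp precedes the start of the span")
        self.logs.append((ts, dict(fields)))


span = SketchSpan(start_time=100.0)
span.log_kv({"event": "cache_miss", "key": "user:1"}, timestamp=101.0)
```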
Set a baggage item
Baggage items are key:value string pairs that apply to the given Span, its SpanContext, and all Spans which directly or transitively reference the local Span. That is, baggage items propagate in-band along with the trace itself.
Baggage items enable powerful functionality given a full-stack OpenTracing integration (for example, arbitrary application data from a mobile app can make it, transparently, all the way into the depths of a storage system), but they come with significant costs. Use this feature thoughtfully and with care: every key and value is copied into every local and remote child of the associated Span, and that can add up to a lot of network and CPU overhead.
Required parameters
- The baggage key, a string
- The baggage value, a string
Get a baggage item
Required parameters
- The baggage key, a string
Returns either the corresponding baggage value, or some indication that such a value was missing.
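The copy-to-every-child behaviour, and the cost that comes with it, can be seen in a toy model. The class below is a hypothetical sketch that ignores SpanContext and process boundaries and only shows how baggage flows downward.

```python
class SketchSpan:
    def __init__(self, name, parent=None):
        self.name = name
        # Every child starts with a full copy of its parent's baggage.
        self.baggage = dict(parent.baggage) if parent is not None else {}

    def set_baggage_item(self, key, value):
        self.baggage[key] = value

    def get_baggage_item(self, key):
        # None is this sketch's indication that the value is missing.
        return self.baggage.get(key)


root = SketchSpan("mobile_request")
root.set_baggage_item("client.version", "3.2")
child = SketchSpan("api_handler", parent=root)
grandchild = SketchSpan("storage_read", parent=child)
child.set_baggage_item("tenant", "acme")  # set after grandchild was created
```

Here `grandchild` sees `client.version` but not `tenant`, because items only reach Spans started after they were set, and nothing ever flows back up to `root`.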
SpanContext
The SpanContext is more of a “concept” than a useful piece of functionality at the generic OpenTracing layer. That said, it is of critical importance to OpenTracing implementations and does present a thin API of its own. Most OpenTracing users only interact with SpanContext via references when starting new Spans, or when injecting/extracting a trace to/from some transport protocol.
In OpenTracing we force SpanContext instances to be immutable in order to avoid complicated lifetime issues around Span finish and references.
Iterate through all baggage items
This is modeled in different ways depending on the language, but semantically the caller should be able to efficiently iterate through all baggage items in one pass given a SpanContext instance.
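In Python, immutability and single-pass iteration could be modeled with a frozen dataclass. This is a sketch under assumptions: the field names and the `with_baggage_item` and `baggage_items` methods are hypothetical, and copy-on-write is just one way to reconcile immutability with setting baggage.

```python
from dataclasses import dataclass, replace
from typing import Iterator, Tuple


@dataclass(frozen=True)
class SketchContext:
    trace_id: int
    span_id: int
    baggage: Tuple[Tuple[str, str], ...] = ()

    def with_baggage_item(self, key: str, value: str) -> "SketchContext":
        """Return a new context; the existing instance is never modified."""
        kept = tuple((k, v) for k, v in self.baggage if k != key)
        return replace(self, baggage=kept + ((key, value),))

    def baggage_items(self) -> Iterator[Tuple[str, str]]:
        """Iterate over all baggage items in one pass."""
        return iter(self.baggage)


original = SketchContext(trace_id=1, span_id=2)
updated = original.with_baggage_item("tenant", "acme")
```

Because `original` is untouched, any Span that already holds a reference to it keeps seeing the same state, even after the owning Span finishes.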
NoopTracer
All OpenTracing language APIs must also provide some sort of NoopTracer implementation which can be used to flag-control OpenTracing or inject something harmless for tests (et cetera). In some cases (for example, Java) the NoopTracer may be in its own packaging artifact.
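A no-op implementation only needs to accept every call and do nothing. The sketch below shows the idea with hypothetical method names; real language APIs ship their own NoopTracer.

```python
class NoopSpan:
    context = None  # nothing to propagate

    def set_tag(self, key, value):
        return self

    def log_kv(self, fields, timestamp=None):
        return self

    def finish(self, finish_time=None):
        pass


class NoopTracer:
    _span = NoopSpan()  # one shared instance is enough, since it holds no state

    def start_span(self, operation_name, **kwargs):
        return self._span

    def inject(self, span_context, fmt, carrier):
        pass  # leaves the carrier untouched

    def extract(self, fmt, carrier):
        return None


def handle_request(tracer, carrier):
    """Instrumented code runs unchanged whether tracing is on or off."""
    span = tracer.start_span("handle_request")
    span.set_tag("component", "sketch")
    tracer.inject(span.context, "text_map", carrier)
    span.finish()
    return "ok"
```

In a test, passing `NoopTracer()` to `handle_request` exercises the business logic without any tracing side effects.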
Optional API Elements
Some languages also provide utilities to pass an active Span and/or SpanContext around a single process. For instance, opentracing-go provides helpers to set and get the active Span in Go’s context.Context mechanism.
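A rough Python analogue of that idea can be built on the standard-library `contextvars` module. This is only an illustration of in-process propagation, with hypothetical helper names; it is not how opentracing-go or any particular library implements it.

```python
import contextvars

_active_span = contextvars.ContextVar("active_span", default=None)


def get_active_span():
    """Return the Span active in the current context, or None."""
    return _active_span.get()


class activate:
    """Context manager that makes `span` active for the enclosed block."""

    def __init__(self, span):
        self.span = span
        self._token = None

    def __enter__(self):
        self._token = _active_span.set(self.span)
        return self.span

    def __exit__(self, exc_type, exc, tb):
        # Restore whichever Span was active before this block.
        _active_span.reset(self._token)
        return False
```

Nested `with activate(...)` blocks restore the outer Span on exit, so deeply nested code can find its parent without passing it through every function signature.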
OpenTracing Project Organization
OpenTracing is a set of standard APIs that consistently model and describe the behavior of distributed systems. There are three constituencies that care about this standard:
- Tracing tool maintainers: “Instrumenting all software” is an unreasonable goal for any one particular tracing or monitoring project or vendor. OpenTracing essentially amortizes this work.
- Software developers who build and deploy applications: These developers want to use whichever tracing and observability tools work best within their organization’s infrastructure, and they want to choose those tools independently of whatever third-party (open-source) software packages they happen to have built around.
- Software developers who contribute to widely-used software: Inasmuch as this software needs to exist within a microservices deployment, it must integrate with whatever tracing tool(s) its diverse users already depend on.
OpenTracing’s project organization makes room for each of these constituencies.
Contributed OpenTracing Support
Repositories under github.com/opentracing-contrib pertain to specific open-source software packages and projects. Each may have its own owners and internal policies regarding PRs, review requirements, and committer management. You can learn more about OpenTracing contributions via the opentracing-contrib meta-repository.
Semantic Conventions
The OpenTracing Specification describes the overarching language-neutral data model and API guidelines for OpenTracing. That data model includes the related concepts of Span Tags and (structured) Log Fields; though these terms are defined in the specification, there is no guidance there about standard Span tags or logging keys.
Those semantic conventions are described by this document. The document is divided into two sections: first, tables listing all standard Span tags and logging keys; then guidance about how to combine these to model certain important semantic concepts.
Versioning
Changes to this file affect the OpenTracing specification version. Additions should bump the minor version, and backwards-incompatible changes (or perhaps very large additions) should bump the major version.
Standard Span tags and log fields
Span tags table
Span tags apply to the entire Span; as such, they apply to the entire timerange of the Span, not a particular moment with a particular timestamp: those sorts of events are best modelled as Span log fields (per the table in the next subsection of this document).
Span tag name | Type | Notes and examples |
---|---|---|
component | string | The software package, framework, library, or module that generated the associated Span. E.g., "grpc", "django", "JDBI". |
db.instance | string | Database instance name. E.g., in Java, if the jdbc.url="jdbc:mysql://127.0.0.1:3306/customers", the instance name is "customers". |
db.statement | string | A database statement for the given database type. E.g., for db.type="sql", "SELECT * FROM wuser_table"; for db.type="redis", "SET mykey 'WuValue'". |
db.type | string | Database type. For any SQL database, "sql". For others, the lower-case database category, e.g. "cassandra", "hbase", or "redis". |
db.user | string | Username for accessing database. E.g., "readonly_user" or "reporting_user" |
error | bool | true if and only if the application considers the operation represented by the Span to have failed |
http.method | string | HTTP method of the request for the associated Span. E.g., "GET", "POST" |
http.status_code | integer | HTTP response status code for the associated Span. E.g., 200, 503, 404 |
http.url | string | URL of the request being handled in this segment of the trace, in standard URI format. E.g., "https://domain.net/path/to?resource=here" |
message_bus.destination | string | An address at which messages can be exchanged. E.g., a Kafka record has an associated "topic name" that can be extracted by the instrumented producer or consumer and stored using this tag. |
peer.address | string | Remote “address”, suitable for use in a networking client library. This may be an "ip:port", a bare "hostname", an FQDN, or even a JDBC substring like "mysql://prod-db:3306" |
peer.hostname | string | Remote hostname. E.g., "opentracing.io", "internal.dns.name" |
peer.ipv4 | string | Remote IPv4 address as a .-separated tuple. E.g., "127.0.0.1" |
peer.ipv6 | string | Remote IPv6 address as a string of colon-separated 4-char hex tuples. E.g., "2001:0db8:85a3:0000:0000:8a2e:0370:7334" |
peer.port | integer | Remote port. E.g., 80 |
peer.service | string | Remote service name (for some unspecified definition of "service"). E.g., "elasticsearch", "a_custom_microservice", "memcache" |
sampling.priority | integer | If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the Tracer to not capture the trace. If absent, the Tracer should use its default sampling mechanism. |
span.kind | string | Either "client" or "server" for the appropriate roles in an RPC, and "producer" or "consumer" for the appropriate roles in a messaging scenario. |
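Among these tags, sampling.priority is the one that changes Tracer behaviour rather than just describing the Span. A Tracer might interpret it roughly as below. The function name and the `default_decision` callback are hypothetical, and since the tag is only a hint, a real Tracer is free to decide otherwise.

```python
def should_sample(tags, default_decision):
    """Interpret the sampling.priority hint for a Span's tags.

    `default_decision` is a zero-argument callable standing in for the
    Tracer's own sampling mechanism (hypothetical).
    """
    priority = tags.get("sampling.priority")
    if priority is None:
        return default_decision()  # tag absent: fall back to the default
    # Greater than 0 asks for capture; 0 asks for no capture. Treating
    # negative values like 0 is an assumption, as the table does not cover them.
    return priority > 0
```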
Log fields table
Every Span log has a specific timestamp (which must fall between the start and finish timestamps of the Span, inclusive) and one or more fields. What follows are the standard fields.
Span log field name | Type | Notes and examples |
---|---|---|
error.kind | string | The type or “kind” of an error (only for event="error" logs). E.g., "Exception", "OSError" |
error.object | object | For languages that support such a thing (e.g., Java, Python), the actual Throwable/Exception/Error object instance itself. E.g., a java.lang.UnsupportedOperationException instance, a python exceptions.NameError instance |
event | string | A stable identifier for some notable moment in the lifetime of a Span. For instance, a mutex lock acquisition or release or the sorts of lifetime events in a browser page load described in the Performance.timing specification. E.g., from Zipkin, "cs", "sr", "ss", or "cr". Or, more generally, "initialized" or "timed out". For errors, "error" |
message | string | A concise, human-readable, one-line message explaining the event. E.g., "Could not connect to backend", "Cache invalidation succeeded" |
stack | string | A stack trace in platform-conventional format; may or may not pertain to an error. E.g., "File \"example.py\", line 7, in <module>\ncaller()\nFile \"example.py\", line 5, in caller\ncallee()\nFile \"example.py\", line 2, in callee\nraise Exception(\"Yikes\")\n" |
Modelling special circumstances
RPCs
The following Span tags combine to model RPCs:
- span.kind: either "client" or "server". It is important to provide this tag at Span start time, as it may affect internal ID generation.
- peer.address, peer.hostname, peer.ipv4, peer.ipv6, peer.port, peer.service: optional tags that describe the RPC peer (often in ways it cannot assess internally)
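For an HTTP-based RPC, the client-side tags could be derived from the request itself. The helper below is a hypothetical sketch that uses only the standard library and fills in the peer tags it can infer from the URL.

```python
from urllib.parse import urlsplit


def http_client_tags(method, url):
    """Build Span tags for the client side of an HTTP call."""
    parts = urlsplit(url)
    tags = {
        "span.kind": "client",  # should be known when the Span starts
        "http.method": method,
        "http.url": url,
    }
    if parts.hostname:
        tags["peer.hostname"] = parts.hostname
    if parts.port is not None:
        tags["peer.port"] = parts.port
    return tags


tags = http_client_tags("GET", "https://domain.net:8443/path/to?resource=here")
```

These tags would be passed when starting the Span, so that span.kind is available at start time.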
Message Bus
A message bus is asynchronous, and therefore the relationship type used to link a Consumer Span and a Producer Span would be FollowsFrom (see References between Spans for more information on relationship types).
The following Span tags combine to model message bus based communications:
- message_bus.destination: as described in the table above
- span.kind: either "producer" or "consumer". It is important to provide this tag at Span start time, as it may affect internal ID generation.
- peer.address, peer.hostname, peer.ipv4, peer.ipv6, peer.port, peer.service: optional tags that describe the message bus broker (often in ways it cannot assess internally)
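Put together, a producer and a consumer might prepare their Spans along these lines. The helper names, the `"follows_from"` string, and the tuple standing in for the producer's SpanContext are assumptions for illustration.

```python
def producer_tags(destination):
    """Tags for the Span that publishes a message."""
    return {"span.kind": "producer", "message_bus.destination": destination}


def consumer_span_args(destination, producer_context):
    """Arguments a consumer might pass when starting its Span."""
    return {
        "tags": {"span.kind": "consumer", "message_bus.destination": destination},
        # The producer does not wait for the consumer, hence FollowsFrom.
        "references": [("follows_from", producer_context)],
    }


producer_context = ("trace-1", "span-7")  # stand-in for an extracted SpanContext
args = consumer_span_args("orders", producer_context)
```

In practice the producer's SpanContext would travel inside the message (for example in its headers) and be extracted on the consumer side.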
Database (client) calls
The following Span tags combine to model database calls:
- db.type, db.instance, db.user, and db.statement: as described in the table above
- peer.address, peer.hostname, peer.ipv4, peer.ipv6, peer.port, peer.service: optional tags that describe the database peer
- span.kind: "client"
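Using the JDBC URL from the tag table as input, the database tags could be assembled as below. The helper is hypothetical and assumes a SQL database reached through a `jdbc:<scheme>://host:port/instance` style URL.

```python
from urllib.parse import urlsplit


def db_client_tags(jdbc_url, user, statement):
    """Build Span tags for a database client call."""
    # Drop the "jdbc:" prefix so urlsplit sees "mysql://host:port/instance".
    plain = jdbc_url[len("jdbc:"):] if jdbc_url.startswith("jdbc:") else jdbc_url
    parts = urlsplit(plain)
    tags = {
        "span.kind": "client",
        "db.type": "sql",  # assumption: any SQL database
        "db.instance": parts.path.lstrip("/"),
        "db.user": user,
        "db.statement": statement,
        "peer.address": f"{parts.scheme}://{parts.netloc}",
    }
    if parts.port is not None:
        tags["peer.port"] = parts.port
    return tags


tags = db_client_tags(
    "jdbc:mysql://127.0.0.1:3306/customers",
    "readonly_user",
    "SELECT * FROM wuser_table",
)
```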
Captured errors
Errors may be described by OpenTracing in different ways, largely depending on the language. Some of these descriptive fields are specific to errors; others are not (e.g., the event or message fields).
For languages where an error object encapsulates a stack trace and type information, log the following fields:
- event="error"
- error.object=<error object instance>
For other languages, or when above is not feasible:
- event="error"
- message="..."
- stack="..." (optional)
- error.kind="..." (optional)
This scheme allows Tracer implementations to extract what information they need from the actual error object when it’s available.
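Both variants can be produced from a Python exception. The helper below is a hypothetical sketch: `has_error_objects` selects between logging the error object itself and logging the flattened fields.

```python
import traceback


def error_log_fields(exc, has_error_objects=True):
    """Build the log fields for a captured error."""
    if has_error_objects:
        # The Tracer can pull the type and stack trace from the object itself.
        return {"event": "error", "error.object": exc}
    return {
        "event": "error",
        "message": str(exc),
        "stack": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        "error.kind": type(exc).__name__,
    }


try:
    raise ValueError("Yikes")
except ValueError as err:
    caught = err

object_fields = error_log_fields(caught)
flat_fields = error_log_fields(caught, has_error_objects=False)
```

Instrumentation would typically also set the error Span tag to true when the application considers the operation failed.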