Introducing the Go Race Detector

26 June 2013

Introduction

Race conditions are among the most insidious and elusive programming errors. They typically cause erratic and mysterious failures, often long after the code has been deployed to production. While Go's concurrency mechanisms make it easy to write clean concurrent code, they don't prevent race conditions. Care, diligence, and testing are required. And tools can help.

We're happy to announce that Go 1.1 includes a race detector, a new tool for finding race conditions in Go code. It is currently available for Linux, OS X, and Windows systems with 64-bit x86 processors.

The race detector is based on the C/C++ ThreadSanitizer runtime library, which has been used to detect many errors in Google's internal code base and in Chromium. The technology was integrated with Go in September 2012; since then it has detected 42 races in the standard library. It is now part of our continuous build process, where it continues to catch race conditions as they arise.

How it works

The race detector is integrated with the go tool chain. When the -race command-line flag is set, the compiler instruments all memory accesses with code that records when and how the memory was accessed, while the runtime library watches for unsynchronized accesses to shared variables. When such "racy" behavior is detected, a warning is printed. (See this article for the details of the algorithm.)

Because of its design, the race detector can detect race conditions only when they are actually triggered by running code, which means it's important to run race-enabled binaries under realistic workloads. However, race-enabled binaries can use ten times the CPU and memory, so it is impractical to enable the race detector all the time. One way out of this dilemma is to run some tests with the race detector enabled. Load tests and integration tests are good candidates, since they tend to exercise concurrent parts of the code. Another approach using production workloads is to deploy a single race-enabled instance within a pool of running servers.

Using the race detector

The race detector is fully integrated with the Go tool chain. To build your code with the race detector enabled, just add the -race flag to the command line:

$ go test -race mypkg    // test the package
$ go run -race mysrc.go  // compile and run the program
$ go build -race mycmd   // build the command
$ go install -race mypkg // install the package

To try out the race detector for yourself, fetch and run this example program:

$ go get -race golang.org/x/blog/support/racy
$ racy

Examples

Here are two examples of real issues caught by the race detector.

Example 1: Timer.Reset

The first example is a simplified version of an actual bug found by the race detector. It uses a timer to print a message after a random duration between 0 and 1 second. It does so repeatedly for five seconds. It uses time.AfterFunc to create a Timer for the first message and then uses the Reset method to schedule the next message, re-using the Timer each time.

11 func main() {
12 start := time.Now()
13 var t *time.Timer
14 t = time.AfterFunc(randomDuration(), func() {
15 fmt.Println(time.Now().Sub(start))
16 t.Reset(randomDuration())
17 })
18 time.Sleep(5 * time.Second)
19 }
20
21 func randomDuration() time.Duration {
22 return time.Duration(rand.Int63n(1e9))
23 }
Run

This looks like reasonable code, but under certain circumstances it fails in a surprising way:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x8 pc=0x41e38a] goroutine 4 [running]:
time.stopTimer(0x8, 0x12fe6b35d9472d96)
src/pkg/runtime/ztime_linux_amd64.c:35 +0x25
time.(*Timer).Reset(0x0, 0x4e5904f, 0x1)
src/pkg/time/sleep.go:81 +0x42
main.func·001()
race.go:14 +0xe3
created by time.goFunc
src/pkg/time/sleep.go:122 +0x48

What's going on here? Running the program with the race detector enabled is more illuminating:

==================
WARNING: DATA RACE
Read by goroutine 5:
main.func·001()
race.go:14 +0x169 Previous write by goroutine 1:
main.main()
race.go:15 +0x174 Goroutine 5 (running) created at:
time.goFunc()
src/pkg/time/sleep.go:122 +0x56
timerproc()
src/pkg/runtime/ztime_linux_amd64.c:181 +0x189
==================

The race detector shows the problem: an unsynchronized read and write of the variable t from different goroutines. If the initial timer duration is very small, the timer function may fire before the main goroutine has assigned a value to t and so the call to t.Reset is made with a nil t.

To fix the race condition we change the code to read and write the variable t only from the main goroutine:

11 func main() {
12 start := time.Now()
13 reset := make(chan bool)
14 var t *time.Timer
15 t = time.AfterFunc(randomDuration(), func() {
16 fmt.Println(time.Now().Sub(start))
17 reset <- true
18 })
19 for time.Since(start) < 5*time.Second {
20 <-reset
21 t.Reset(randomDuration())
22 }
23 }
Run

Here the main goroutine is wholly responsible for setting and resetting the Timer t and a new reset channel communicates the need to reset the timer in a thread-safe way.

A simpler but less efficient approach is to avoid reusing timers.

Example 2: ioutil.Discard

The second example is more subtle.

The ioutil package's Discard object implements io.Writer, but discards all the data written to it. Think of it like /dev/null: a place to send data that you need to read but don't want to store. It is commonly used with io.Copy to drain a reader, like this:

io.Copy(ioutil.Discard, reader)

Back in July 2011 the Go team noticed that using Discard in this way was inefficient: the Copy function allocates an internal 32 kB buffer each time it is called, but when used with Discard the buffer is unnecessary since we're just throwing the read data away. We thought that this idiomatic use of Copy and Discard should not be so costly.

The fix was simple. If the given Writer implements a ReadFrom method, a Copy call like this:

io.Copy(writer, reader)

is delegated to this potentially more efficient call:

writer.ReadFrom(reader)

We added a ReadFrom method to Discard's underlying type, which has an internal buffer that is shared between all its users. We knew this was theoretically a race condition, but since all writes to the buffer should be thrown away we didn't think it was important.

When the race detector was implemented it immediately flagged this code as racy. Again, we considered that the code might be problematic, but decided that the race condition wasn't "real". To avoid the "false positive" in our build we implemented a non-racy version that is enabled only when the race detector is running.

But a few months later Brad encountered a frustrating and strange bug. After a few days of debugging, he narrowed it down to a real race condition caused by ioutil.Discard.

Here is the known-racy code in io/ioutil, where Discard is a devNull that shares a single buffer between all of its users.

var blackHole [4096]byte // shared buffer

func (devNull) ReadFrom(r io.Reader) (n int64, err error) {
readSize := 0
for {
readSize, err = r.Read(blackHole[:])
n += int64(readSize)
if err != nil {
if err == io.EOF {
return n, nil
}
return
}
}
}

Brad's program includes a trackDigestReader type, which wraps an io.Reader and records the hash digest of what it reads.

type trackDigestReader struct {
r io.Reader
h hash.Hash
} func (t trackDigestReader) Read(p []byte) (n int, err error) {
n, err = t.r.Read(p)
t.h.Write(p[:n])
return
}

For example, it could be used to compute the SHA-1 hash of a file while reading it:

tdr := trackDigestReader{r: file, h: sha1.New()}
io.Copy(writer, tdr)
fmt.Printf("File hash: %x", tdr.h.Sum(nil))

In some cases there would be nowhere to write the data—but still a need to hash the file—and so Discard would be used:

io.Copy(ioutil.Discard, tdr)

But in this case the blackHole buffer isn't just a black hole; it is a legitimate place to store the data between reading it from the source io.Reader and writing it to the hash.Hash. With multiple goroutines hashing files simultaneously, each sharing the same blackHole buffer, the race condition manifested itself by corrupting the data between reading and hashing. No errors or panics occurred, but the hashes were wrong. Nasty!

func (t trackDigestReader) Read(p []byte) (n int, err error) {
// the buffer p is blackHole
n, err = t.r.Read(p)
// p may be corrupted by another goroutine here,
// between the Read above and the Write below
t.h.Write(p[:n])
return
}

The bug was finally fixed by giving a unique buffer to each use of ioutil.Discard, eliminating the race condition on the shared buffer.

Conclusions

The race detector is a powerful tool for checking the correctness of concurrent programs. It will not issue false positives, so take its warnings seriously. But it is only as good as your tests; you must make sure they thoroughly exercise the concurrent properties of your code so that the race detector can do its job.

What are you waiting for? Run "go test -race" on your code today!

By Dmitry Vyukov and Andrew Gerrand

Related articles

33 Introducing the Go Race Detector的更多相关文章

  1. 28 Data Race Detector 数据种类探测器:数据种类探测器手册

    Data Race Detector 数据种类探测器:数据种类探测器手册 Introduction Usage Report Format Options Excluding Tests How To ...

  2. golang中的race检测

    golang中的race检测 由于golang中的go是非常方便的,加上函数又非常容易隐藏go. 所以很多时候,当我们写出一个程序的时候,我们并不知道这个程序在并发情况下会不会出现什么问题. 所以在本 ...

  3. 31 Godoc: documenting Go code 编写良好的文档关于godoc

    Godoc: documenting Go code  编写良好的文档关于godoc 31 March 2011 The Go project takes documentation seriousl ...

  4. 32 Profiling Go Programs 分析go语言项目

    Profiling Go Programs  分析go语言项目 24 June 2011 At Scala Days 2011, Robert Hundt presented a paper titl ...

  5. 30 C? Go? Cgo!

    C? Go? Cgo! 17 March 2011 Introduction Cgo lets Go packages call C code. Given a Go source file writ ...

  6. 25 The Go image/draw package go图片/描绘包:图片/描绘包的基本原理

    The Go image/draw package  go图片/描绘包:图片/描绘包的基本原理 29 September 2011 Introduction Package image/draw de ...

  7. 24 The Go image package go图片包:图片包的基本原理

    The Go image package  go图片包:图片包的基本原理 21 September 2011 Introduction The image and image/color packag ...

  8. 23 The Laws of Reflection 反射定律:反射包的基本原理

    The Laws of Reflection  反射定律:反射包的基本原理 6 September 2011 Introduction 介绍 Reflection in computing is th ...

  9. 22 Gobs of data 设计和使用采集数据的包

    Gobs of data 24 March 2011 Introduction To transmit a data structure across a network or to store it ...

随机推荐

  1. bzoj1683[Usaco2005 Nov]City skyline 城市地平线

    题目链接:http://www.lydsy.com/JudgeOnline/problem.php?id=1683 Input 第1行:2个用空格隔开的整数N和W. 第2到N+1行:每行包括2个用空格 ...

  2. loj2537 「PKUWC2018」Minimax 【概率 + 线段树合并】

    题目链接 loj2537 题解 观察题目的式子似乎没有什么意义,我们考虑计算出每一种权值的概率 先离散化一下权值 显然可以设一个\(dp\),设\(f[i][j]\)表示\(i\)节点权值为\(j\) ...

  3. BZOJ.4034 [HAOI2015]树上操作 ( 点权树链剖分 线段树 )

    BZOJ.4034 [HAOI2015]树上操作 ( 点权树链剖分 线段树 ) 题意分析 有一棵点数为 N 的树,以点 1 为根,且树点有边权.然后有 M 个 操作,分为三种: 操作 1 :把某个节点 ...

  4. WEB入门.八 背景特效

    学习内容 background属性 CSS Sprite 技术 滑动门技术 能力目标 使用background设置网页背景 使用Sprites制作平滑投票特效 使用滑动门技术实现Tab菜单 本章简介 ...

  5. 如何在 Android 程序中禁止屏幕旋转和重启Activity

    禁止屏幕随手机旋转变化 有时候我们希望让一个程序的界面始终保持在一个方向,不随手机方向旋转而变化:在AndroidManifest.xml的每一个需要禁止转向的Activity配置中加入android ...

  6. supervisor "unix:///var/run/supervisor/supervisor.sock no such file" 解决方法

    如果是没有开启 supervisord 服务的情况下出现这种报错,可以先 systemctl start supervisor 试试, 如果不是,那就 sudo touch /var/run/supe ...

  7. 浏览器json数据格式化

    在浏览器上作接口测试的时候看到json 格式的数据是密密麻麻的一片,眼睛都花了..  如: 设置方法:  chrome  的右上角选择,然后---  更多工具---  扩展程序  ----   JSO ...

  8. Eureka的一些注意事项

    1.心跳设置:只能在application.yml中 2. 注册到Eureka上面的服务名称 与swagger2使用的时候,需要配置此项,否则显示服务名称为unknown 3.高可用的Eureka 4 ...

  9. Java 9 新特性快速预览

    原文出处:wangwenjun69 Java 8 已经出来三年多的时间了,原本计划2016年七月份release Java 9,但是基于种种原因,Java 9 被推迟到了2017年的3月份,本人也在O ...

  10. caffe 配置文件详解

    主要是遇坑了,要记录一下. solver算是caffe的核心的核心,它协调着整个模型的运作.caffe程序运行必带的一个参数就是solver配置文件.运行代码一般为 # caffe train --s ...