https://www.python.org/dev/peps/pep-0442/

PEP 442 -- Safe object finalization

PEP: 442
Title: Safe object finalization
Author: Antoine Pitrou <solipsis at pitrou.net>
BDFL-Delegate: Benjamin Peterson <benjamin at python.org>
Status: Final
Type: Standards Track
Created: 2013-05-18
Python-Version: 3.4
Post-History: 2013-05-18
Resolution: https://mail.python.org/pipermail/python-dev/2013-June/126746.html

Abstract

This PEP proposes to deal with the current limitations of object finalization. The goal is to be able to define and run finalizers for any object, regardless of its position in the object graph.

This PEP doesn't call for any change in Python code. Objects with existing finalizers will benefit automatically.

Definitions

Reference
A directional link from an object to another. The target of the reference is kept alive by the reference, as long as the source is itself alive and the reference isn't cleared.
Weak reference
A directional link from an object to another, which doesn't keep alive its target. This PEP focusses on non-weak references.
Reference cycle
A cyclic subgraph of directional links between objects, which keeps those objects from being collected in a pure reference-counting scheme.
Cyclic isolate (CI)
A standalone subgraph of objects in which no object is referenced from the outside, containing one or several reference cycles, and whose objects are still in a usable, non-broken state: they can access each other from their respective finalizers.
Cyclic garbage collector (GC)
A device able to detect cyclic isolates and turn them into cyclic trash. Objects in cyclic trash are eventually disposed of by the natural effect of the references being cleared and their reference counts dropping to zero.
Cyclic trash (CT)
A former cyclic isolate whose objects have started being cleared by the GC. Objects in cyclic trash are potential zombies; if they are accessed by Python code, the symptoms can vary from weird AttributeErrors to crashes.
Zombie / broken object
An object that is part of cyclic trash. The term stresses that the object is not safe: its outgoing references may have been cleared, or one of the objects it references may be a zombie. Therefore, it should not be accessed by arbitrary code (such as finalizers).
Finalizer
A function or method called when an object is intended to be disposed of. The finalizer can access the object and release any resource held by the object (for example mutexes or file descriptors). An example is a __del__ method.
Resurrection
The process by which a finalizer creates a new reference to an object in a CI. This can happen as a quirky but supported side-effect of __del__ methods.
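The definitions of reference cycle and cyclic isolate can be observed from Python. The following is a minimal sketch, assuming CPython; `Node` and `demo_cycle` are illustrative names, not part of the PEP. A weak reference serves as a probe because it does not keep its target alive.

```python
import gc
import weakref


class Node:
    """Illustrative object that can take part in a reference cycle."""


def demo_cycle():
    was_enabled = gc.isenabled()
    gc.disable()  # stop automatic collections from running mid-demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a   # a -> b -> a: a reference cycle
        probe = weakref.ref(a)    # a weak reference does not keep `a` alive
        del a, b                  # no outside references remain: a cyclic isolate
        kept_by_refcounts = probe() is not None
        gc.collect()              # the cyclic GC finds and disposes of the isolate
        freed_by_gc = probe() is None
        return kept_by_refcounts, freed_by_gc
    finally:
        if was_enabled:
            gc.enable()
```

Calling `demo_cycle()` should return `(True, True)`: the cycle survives the `del` under pure reference counting and is reclaimed only by `gc.collect()`.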

Impact

While this PEP discusses CPython-specific implementation details, the change in finalization semantics is expected to affect the Python ecosystem as a whole. In particular, this PEP obsoletes the current guideline that "objects with a __del__ method should not be part of a reference cycle".

Benefits

The primary benefits of this PEP regard objects with finalizers, such as objects with a __del__ method and generators with a finally block. Those objects can now be reclaimed when they are part of a reference cycle.
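As an illustration of this benefit, the sketch below (assuming CPython 3.4 or later; `Resource` and `collect_cycle_with_finalizers` are hypothetical names) puts two objects with `__del__` methods into one cycle and checks that the GC both runs their finalizers and frees them, rather than leaving them in `gc.garbage`.

```python
import gc

finalized = []


class Resource:
    """Illustrative object with a finalizer; `name` is only a label."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        finalized.append(self.name)


def collect_cycle_with_finalizers():
    a, b = Resource("a"), Resource("b")
    a.peer, b.peer = b, a  # two finalizer-bearing objects in one cycle
    del a, b
    gc.collect()
    # Before Python 3.4 such objects ended up in gc.garbage instead.
    leaked = [obj for obj in gc.garbage if isinstance(obj, Resource)]
    return sorted(finalized), leaked
```

The finalizer names are sorted because, as discussed under Predictability, the order in which CI finalizers run is undefined.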

The PEP also paves the way for further benefits:

  • The module shutdown procedure may not need to set global variables to None anymore. This could solve a well-known class of irritating issues.

The PEP doesn't change the semantics of:

  • Weak references caught in reference cycles.
  • C extension types with a custom tp_dealloc function.

Description

Reference-counted disposal

In normal reference-counted disposal, an object's finalizer is called just before the object is deallocated. If the finalizer resurrects the object, deallocation is aborted.

However, if the object was already finalized, then the finalizer isn't called. This prevents us from finalizing zombies (see below).
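A small sketch of reference-counted disposal with resurrection, assuming CPython 3.4 or later (the language reference leaves it implementation-dependent whether `__del__` may run a second time). `Phoenix`, `calls` and the `lifeline` list are hypothetical.

```python
calls = []
lifeline = []  # hypothetical global used to resurrect objects


class Phoenix:
    def __del__(self):
        calls.append("__del__")
        lifeline.append(self)  # resurrection: a new reference to self


def demo_resurrection():
    obj = Phoenix()
    del obj                 # refcount hits zero: finalizer runs and resurrects
    first = len(calls)
    resurrected = len(lifeline) == 1
    lifeline.clear()        # drop the resurrected reference again
    second = len(calls)     # already finalized, so __del__ is not called again
    return first, resurrected, second
```

`demo_resurrection()` is expected to return `(1, True, 1)`: the second drop to zero deallocates the object without calling `__del__` again.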

Disposal of cyclic isolates

Cyclic isolates are first detected by the garbage collector, and then disposed of. The detection phase doesn't change and won't be described here. Disposal of a CI traditionally works in the following order:

  1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
  2. The CI becomes a CT as the GC systematically breaks all known references inside it (using the tp_clear function).
  3. Nothing. All CT objects should have been disposed of in step 2 (as a side-effect of clearing references); this collection is finished.

This PEP proposes to turn CI disposal into the following sequence (steps 2 and 3 are new):

  1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
  2. The finalizers of all CI objects are called.
  3. The CI is traversed again to determine if it is still isolated. If it is determined that at least one object in CI is now reachable from outside the CI, this collection is aborted and the whole CI is resurrected. Otherwise, proceed.
  4. The CI becomes a CT as the GC systematically breaks all known references inside it (using the tp_clear function).
  5. Nothing. All CT objects should have been disposed of in step 4 (as a side-effect of clearing references); this collection is finished.
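The new sequence can be modelled in a few lines. This is a toy simulation under stated assumptions, not CPython's gc module: objects are string names, references are sets of names, `roots` stands for objects outside the CI, and `dispose_ci`, `reachable` and the `finalized` set are hypothetical stand-ins for the GC internals.

```python
from typing import Callable, Dict, List, Set

# Toy model only: string names stand in for objects, and each name maps to
# the set of names it references. CPython itself works on PyObject pointers.
Graph = Dict[str, Set[str]]
Finalizer = Callable[[Graph], None]


def reachable(graph: Graph, roots: Set[str]) -> Set[str]:
    """Return every name reachable from `roots` by following references."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(graph.get(name, ()))
    return seen


def dispose_ci(graph: Graph, roots: Set[str], ci: Set[str],
               finalizers: Dict[str, Finalizer], finalized: Set[str]) -> List[str]:
    """Hypothetical helper walking the PEP 442 disposal steps for one CI."""
    log: List[str] = []
    # Step 1 (clearing weakrefs and calling their callbacks) is not modelled.
    # Step 2: call each CI finalizer while every CI object is still intact.
    for name in sorted(ci):  # sorted for determinism; CPython's order is undefined
        if name in finalizers and name not in finalized:
            finalizers[name](graph)
            finalized.add(name)  # plays the role of the "already finalized" bit
            log.append("finalize " + name)
    # Step 3: finalizers may have mutated the graph, so re-check isolation.
    if reachable(graph, roots) & ci:
        log.append("resurrected")
        return log
    # Step 4: break every reference inside the CI (the tp_clear step).
    for name in ci:
        graph[name] = set()
    log.append("cleared")
    # Step 5: nothing else to do.
    return log


if __name__ == "__main__":
    g: Graph = {"main": set(), "a": {"b"}, "b": {"a"}}
    done: Set[str] = set()
    fins: Dict[str, Finalizer] = {"a": lambda graph: graph["main"].add("a")}
    print(dispose_ci(g, {"main"}, {"a", "b"}, fins, done))  # ['finalize a', 'resurrected']
    g["main"].discard("a")  # the outside reference goes away again
    print(dispose_ci(g, {"main"}, {"a", "b"}, fins, done))  # ['cleared']
```

The second call shows the finalizer of `a` being skipped because it already ran, even though `a` was resurrected in between; with no outside reference left, the CI is then broken up.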

Note

The GC doesn't recalculate the CI after step 2 above, hence the need for step 3 to check that the whole subgraph is still isolated.
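The effect of step 3 can be seen from Python. In this sketch (assuming CPython 3.4 or later; `Node`, `saved` and `events` are hypothetical), only the finalizer of `a` stores a new reference, yet `b` survives too because it is reachable from `a`, and neither object has been cleared by the GC.

```python
import gc

events = []
saved = []


class Node:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append(self.name)
        if self.name == "a":
            saved.append(self)  # resurrect "a"; "b" stays reachable via a.peer


def demo_cyclic_resurrection():
    a, b = Node("a"), Node("b")
    a.peer, b.peer = b, a
    del a, b
    gc.collect()
    survivor = saved[0]
    # Nothing in the CI was broken: the cycle is still fully intact.
    return sorted(events), survivor.peer.name, survivor.peer.peer is survivor
```

`demo_cyclic_resurrection()` should return `(["a", "b"], "b", True)`: both finalizers ran in step 2, and the re-check in step 3 kept the cycle alive without breaking any reference.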

C-level changes

Type objects get a new tp_finalize slot to which __del__ methods are mapped (and reciprocally). Generators are modified to use this slot, rather than tp_del. A tp_finalize function is a normal C function which will be called with a valid and alive PyObject * as its only argument. It doesn't need to manipulate the object's reference count, as this will be done by the caller. However, it must ensure that the original exception state is restored before returning to the caller.
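Because generators now use tp_finalize, a suspended generator with a finally block can be reclaimed from a cycle. A minimal sketch, assuming CPython 3.4 or later; `worker`, `log` and `demo_generator_cycle` are hypothetical:

```python
import gc
import types

log = []


def worker(box):
    try:
        yield "working"
    finally:
        log.append("finally ran")  # cleanup that only runs if the generator is finalized


def demo_generator_cycle():
    box = []
    gen = worker(box)
    next(gen)        # suspend the generator inside its try block
    box.append(gen)  # gen -> its frame's local `box` -> gen: a reference cycle
    del gen, box
    gc.collect()     # the generator's finalizer closes it, running `finally`
    leaked = [obj for obj in gc.garbage if isinstance(obj, types.GeneratorType)]
    return list(log), leaked
```

The expected result is `(["finally ran"], [])`: the finally block runs during collection and the generator is freed instead of being parked in `gc.garbage`.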

For compatibility, tp_del is kept in the type structure. Handling of objects with a non-NULL tp_del is unchanged: when part of a CI, they are not finalized and end up in gc.garbage. However, a non-NULL tp_del is not encountered anymore in the CPython source tree (except for testing purposes).

Two new C API functions, PyObject_CallFinalizer() and PyObject_CallFinalizerFromDealloc(), are provided to ease calling of tp_finalize, especially from custom deallocators.

On the internal side, a bit is reserved in the GC header for GC-managed objects to signal that they were finalized. This helps avoid finalizing an object twice (and, especially, finalizing a CT object after it was broken by the GC).

Note

Objects which are not GC-enabled can also have a tp_finalize slot. They don't need the additional bit since their tp_finalize function can only be called from the deallocator: it therefore cannot be called twice, except when resurrected.

Discussion

Predictability

Following this scheme, an object's finalizer is always called exactly once, even if it was resurrected afterwards.

For CI objects, the order in which finalizers are called (step 2 above) is undefined.

Safety

It is important to explain why the proposed change is safe. There are two aspects to be discussed:

  • Can a finalizer access zombie objects (including the object being finalized)?
  • What happens if a finalizer mutates the object graph so as to impact the CI?

Let's discuss the first issue. We will divide possible cases into two categories:

  • If the object being finalized is part of the CI: by construction, no objects in the CI are zombies yet, since CI finalizers are called before any reference breaking is done. Therefore, the finalizer cannot access zombie objects, because none exist yet.
  • If the object being finalized is not part of the CI/CT: by definition, objects in the CI/CT don't have any references pointing to them from outside the CI/CT. Therefore, the finalizer cannot reach any zombie object, even if the object being finalized was itself referenced from a zombie object.
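The first case can be checked directly: each finalizer below reads an attribute of the other CI object, which is only safe because no CI object has been cleared yet. This is a sketch assuming CPython 3.4 or later; `Peer`, `seen` and `demo_safe_access` are hypothetical, and the results are sorted because the finalizer order is undefined.

```python
import gc

seen = []


class Peer:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Runs before any reference in the CI is broken, so self.peer and
        # its attributes are still intact here.
        seen.append((self.name, self.peer.name))


def demo_safe_access():
    x, y = Peer("x"), Peer("y")
    x.peer, y.peer = y, x
    del x, y
    gc.collect()
    return sorted(seen)
```

`demo_safe_access()` should return `[("x", "y"), ("y", "x")]`, with no AttributeError reported from either finalizer.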

Now for the second issue. There are three potential cases:

  • The finalizer clears an existing reference to a CI object. The CI object may be disposed of before the GC tries to break it, which is fine (the GC simply has to be aware of this possibility).
  • The finalizer creates a new reference to a CI object. This can only happen from a CI object's finalizer (see above for why). Therefore, the new reference will be detected by the GC after all CI finalizers are called (step 3 above), and collection will be aborted without any objects being broken.
  • The finalizer clears or creates a reference to a non-CI object. By construction, this is not a problem.
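For the first case of the second issue, the sketch below (assuming CPython 3.4 or later; `Holder`, `order` and `demo_clearing_finalizer` are hypothetical) has each finalizer drop its reference to the other CI object. Depending on the undefined finalizer order, one object may be freed by reference counting in the middle of step 2; each finalizer should still run exactly once.

```python
import gc

order = []


class Holder:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        order.append(self.name)
        # Clear an existing reference to a CI object: the target may now be
        # disposed of by reference counting before the GC reaches it.
        del self.peer


def demo_clearing_finalizer():
    p, q = Holder("p"), Holder("q")
    p.peer, q.peer = q, p
    del p, q
    gc.collect()
    return sorted(order)
```

`demo_clearing_finalizer()` is expected to return `["p", "q"]`: both finalizers ran once and the collection completed without touching a freed object.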

Implementation

An implementation is available in branch finalize of the repository at http://hg.python.org/features/finalize/.

Validation

Besides running the normal Python test suite, the implementation adds test cases for various finalization possibilities including reference cycles, object resurrection and legacy tp_del slots.

The implementation has also been checked to not produce any regressions on the following test suites:

References

Notes about reference cycle collection and weak reference callbacks: http://hg.python.org/cpython/file/4e687d53b645/Modules/gc_weakref.txt

Generator memory leak: http://bugs.python.org/issue17468

Allow objects to decide if they can be collected by GC: http://bugs.python.org/issue9141

Module shutdown procedure based on GC: http://bugs.python.org/issue812369

Copyright

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/master/pep-0442.txt
