TVM: Parsing TVM Operators

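In the earlier article "TVM: Compilation Flow", the model was read from ONNX and converted to Relay IR. That process calls _convert_operator to convert ONNX operators into Relay operators, but how it works was skipped at the time. This section takes the convolution operator as an example and follows how a Relay expression is turned into a TOPI operator and, together with TVM's schedule, run on a backend.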

Operator mapping table

The _get_convert_map() interface that returns the mapping table is defined as follows (excerpt):

# _convert_map defines maps of name to converter functor(callable)
# for 1 to 1 mapping, use Renamer if nothing but name is different
# use AttrCvt if attributes need to be converted
# for 1 to N mapping(composed), use custom callable functions
# for N to 1 mapping, currently not supported(?)
def _get_convert_map(opset):
    return {
        # defs/experimental
        "Identity": Renamer("copy"),
        "Affine": Affine.get_converter(opset),
        "BitShift": BitShift.get_converter(opset),
        "ThresholdedRelu": ThresholdedRelu.get_converter(opset),
        "ScaledTanh": ScaledTanh.get_converter(opset),
        "ParametricSoftplus": ParametricSoftPlus.get_converter(opset),
        "Constant": Constant.get_converter(opset),
        "ConstantOfShape": ConstantOfShape.get_converter(opset),
        # 'GivenTensorFill'
        "FC": AttrCvt("dense", ignores=["axis", "axis_w"]),
        "Scale": Scale.get_converter(opset),
        # 'GRUUnit'
        # 'ATen'
        # 'ImageScaler'
        "MeanVarianceNormalization": MeanVarianceNormalization.get_converter(opset),
        # 'Crop'
        # 'Embedding'
        "Upsample": Upsample.get_converter(opset),
        "SpatialBN": BatchNorm.get_converter(opset),
        # ... (remaining entries omitted)
    }

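According to the comments, two kinds of mapping are currently supported:

1. One-to-one mapping from an ONNX operator to a TVM operator. If the two operators differ only in name, Renamer is used and returns the corresponding TVM operator; if the attributes also need converting, AttrCvt is used to turn the ONNX attributes into TVM attributes (e.g. FC is mapped to dense while ignoring axis and axis_w).
2. An ONNX operator that must be expressed in TVM as a combination of several operators. In this case a dedicated converter function has to be written.

N-to-1 mappings are currently not supported. The entries created with get_converter in the code belong to the second category (custom converter callables).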
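To make the three kinds of entries concrete, here is a toy model of such a mapping table. It is a sketch under simplifying assumptions: Relay operators are represented as plain tuples (op_name, inputs, attrs), and ToyRenamer, ToyAttrCvt, toy_gemm_bias and the op name GemmWithBias are all hypothetical stand-ins, not TVM code.

```python
# Toy model of the mapping kinds. A "relay op" here is just a tuple
# (op_name, inputs, attrs); every name in this block is hypothetical.

def make_op(name):
    """Return a stand-in 'relay op' that records its name, inputs and attrs."""
    def op(*inputs, **attrs):
        return (name, list(inputs), attrs)
    return op


class ToyRenamer:
    """1-to-1 mapping where only the operator name differs."""
    def __init__(self, new_name):
        self.new_name = new_name

    def __call__(self, inputs, attrs, params):
        return make_op(self.new_name)(*inputs, **attrs)


class ToyAttrCvt:
    """1-to-1 mapping that also renames attributes and drops ignored ones."""
    def __init__(self, op_name, transforms=None, ignores=None):
        self.op_name = op_name
        self.transforms = transforms or {}
        self.ignores = ignores or []

    def __call__(self, inputs, attrs, params):
        new_attrs = {self.transforms.get(k, k): v
                     for k, v in attrs.items() if k not in self.ignores}
        return make_op(self.op_name)(*inputs, **new_attrs)


def toy_gemm_bias(inputs, attrs, params):
    """1-to-N mapping: one source op expands into two target ops."""
    dense = make_op("nn.dense")(inputs[0], inputs[1])
    return make_op("nn.bias_add")(dense, inputs[2])


toy_convert_map = {
    "Identity": ToyRenamer("copy"),
    "FC": ToyAttrCvt("dense", ignores=["axis", "axis_w"]),
    "GemmWithBias": toy_gemm_bias,  # hypothetical op name
}
```

For example, toy_convert_map["FC"](["x", "w"], {"axis": 1, "units": 4}, {}) drops axis and yields ("dense", ["x", "w"], {"units": 4}).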

Operator conversion

When processing ONNX nodes, _convert_operator is called to convert each ONNX node into TVM Relay IR. Its implementation is:

def _convert_operator(self, op_name, inputs, attrs, opset):

        """Convert ONNX operator into a Relay operator.

        The converter must specify conversions explicitly for incompatible name, and

        apply handlers to operator attributes.

        Parameters

        ----------

        op_name : str

            Operator name, such as Convolution, FullyConnected

        inputs : list of tvm.relay.function.Function

            List of inputs.

        attrs : dict

            Dict of operator attributes

        opset : int

            Opset version

        Returns

        -------

        sym : tvm.relay.function.Function

            Converted relay function

        """

        convert_map = _get_convert_map(opset)

        if op_name in _identity_list:

            sym = get_relay_op(op_name)(*inputs, **attrs)

        elif op_name in convert_map:

            sym = convert_map[op_name](inputs, attrs, self._params)

        else:

            raise NotImplementedError("Operator {} not implemented.".format(op_name))

        return sym

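As can be seen:

1. First, the operator mapping table is obtained;
2. If the operator is in the _identity_list table, get_relay_op is called to get the converted operator expression;
3. Otherwise, if it is in the mapping table, the mapped converter is called;
4. Otherwise, the operator is unsupported and NotImplementedError is raised;
5. The converted expression is returned.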
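The dispatch order above can be sketched in a few lines. This is a simplified model, not TVM code: toy_relay_op stands in for get_relay_op, the mapping table is passed in as an argument instead of being built from opset, and the params argument is just an empty dict.

```python
# Minimal sketch of the dispatch order in _convert_operator.
# toy_relay_op, convert_operator and the tables are hypothetical stand-ins.

def toy_relay_op(name):
    """Stand-in for get_relay_op: returns a callable that records the call."""
    return lambda *inputs, **attrs: (name, inputs, attrs)


def convert_operator(op_name, inputs, attrs, convert_map, identity_list=()):
    # ops that need no conversion are called directly by name
    if op_name in identity_list:
        return toy_relay_op(op_name)(*inputs, **attrs)
    # otherwise use the registered converter (params are empty in this sketch)
    if op_name in convert_map:
        return convert_map[op_name](inputs, attrs, {})
    # anything else is unsupported
    raise NotImplementedError("Operator {} not implemented.".format(op_name))


convert_map = {"Relu": lambda inputs, attrs, params: toy_relay_op("nn.relu")(*inputs)}
```

With the ONNX frontend's empty identity list, only the second and third branches can ever be taken.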

Operator conversion -- the _identity_list table and get_relay_op

The _identity_list table

In python/tvm/relay/frontend/onnx.py, the _identity_list table is empty:

# compatible operators that do NOT require any conversion.

_identity_list = []

So this branch of _convert_operator is never reached. Among all supported frontends, only the MXNet frontend has a non-empty table:

# Note: due to attribute conversion constraint

# ops in the identity set must be attribute free

_identity_list = [

    "abs",

    "log",

    "exp",

    "erf",

    "sqrt",

    "floor",

    "ceil",

    "round",

    "trunc",

    "sign",

    "sigmoid",

    "negative",

    "reshape_like",

    "zeros_like",

    "ones_like",

    "cos",

    "cosh",

    "sin",

    "sinh",

    "tan",

    "tanh",

    "where",

]

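According to the comment, because of an attribute-conversion constraint, operators in the identity set must be attribute-free; that is why only these operators are listed separately.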

The get_relay_op function

It is defined as follows:

def get_relay_op(op_name):

    """Get the callable function from Relay based on operator name.

    Parameters

    ----------

    op_name : str

        The Relay operator name.

    """

    if "." in op_name:

        # explicit hierarchical modules

        op = _op

        try:

            for opn in op_name.split("."):

                op = getattr(op, opn)

        except AttributeError:

            op = None

    else:

        # try search op in various modules

        for candidate in (_op, _op.nn, _op.image, _op.vision, _op.contrib):

            op = getattr(candidate, op_name, None)

            if op is not None:

                break

    if not op:

        raise tvm.error.OpNotImplemented("Unable to map op_name {} to relay".format(op_name))

    return op

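As the docstring says, it gets a callable function from Relay based on the operator name. The lookup relies on Python's built-in getattr, documented as: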

getattr(object, name[, default]) -> value

Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y. When a default argument is given, it is returned when the attribute doesn't exist; without it, an exception is raised in that case.

And _op is imported as:

from .. import op as _op

so _op is the python/tvm/relay/op package. It contains all Relay operators, grouped into submodules such as nn, image, vision, and contrib.

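The if branch of get_relay_op handles an op_name given in dotted form, such as nn.conv2d, and resolves it one segment at a time starting from _op (so the dotted path is relative to relay.op). The else branch searches _op itself and then the nn, image, vision, and contrib submodules for an attribute named op_name.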

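Whichever branch finds it, the operator's definition is returned, i.e. the function with the same name as op_name. For example, when op_name is abs, the corresponding function (python/tvm/relay/op/tensor.py) is: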

def abs(data):

    """Compute element-wise absolute of data.

    Parameters

    ----------

    data : relay.Expr

        The input data

    Returns

    -------

    result : relay.Expr

        The computed result.

    """

    return _make.abs(data)

So get_relay_op(op_name)(*inputs, **attrs) in _convert_operator calls abs, which in turn calls _make.abs(data). _make.abs() executes the lambda body registered by the following macro in src/relay/op/op_common.h:

#define RELAY_REGISTER_UNARY_OP(OpName)                                        \

  TVM_REGISTER_GLOBAL("relay.op._make." OpName).set_body_typed([](Expr data) { \

    static const Op& op = Op::Get(OpName);                                     \

    return Call(op, {data}, Attrs(), {});                                      \

  });                                                                          \

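Back to the main thread: because the ONNX _identity_list is empty, ONNX operator conversion never reaches get_relay_op through this branch (although AttrCvt, discussed later, also ends up calling get_relay_op).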

Operator conversion -- get_converter()

In the operator mapping table, get_converter is used like this:

"Conv": Conv.get_converter(opset)

get_converter() is a method of the OnnxOpConverter class, and the converter class defined for each operator in tvm/relay/frontend/onnx.py inherits from OnnxOpConverter. For example:

class Conv(OnnxOpConverter):

    """Operator converter for Conv."""

    @classmethod

    def _impl_v1(cls, inputs, attr, params):

        # Use shape of input to determine convolution type.

        data = inputs[0]

        ...

So the get_converter being called is the one inherited from OnnxOpConverter.

The implementation of OnnxOpConverter.get_converter:

    @classmethod
    def get_converter(cls, opset):
        """Get converter matches given opset.

        Parameters
        ----------
        opset: int
            opset from model.

        Returns
        -------
        converter, which should be `_impl_vx`. Number x is the biggest
            number smaller than or equal to opset belongs to all support versions.
        """
        # When get_converter is called on a subclass of OnnxOpConverter, cls is that subclass.
        # dir(cls) lists the subclass's attributes; the comprehension keeps the names that
        # contain "_impl_v", strips the "_impl_v" part and converts the rest to int.
        versions = [int(d.replace("_impl_v", "")) for d in dir(cls) if "_impl_v" in d]
        # Add the requested opset to the list and sort it in ascending order.
        versions = sorted(versions + [opset])
        # Take the largest index whose value equals opset (the list contains opset at least
        # once). The element just before it is either an implemented version equal to opset
        # or the closest implemented version below opset.
        version = versions[max([i for i, v in enumerate(versions) if v == opset]) - 1]
        # Return the _impl_v method of that version.
        if hasattr(cls, "_impl_v{}".format(version)):
            return getattr(cls, "_impl_v{}".format(version))
        raise NotImplementedError(
            "opset version {} of {} not implemented".format(version, cls.__name__)
        )

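Each converter class may define conversion functions for several versions, all named "_impl_v" + version number. get_converter finds the _impl_v method of the closest version that is not higher than opset and returns that method, i.e. it returns a function.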
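The version-selection expression can be exercised on its own. In this sketch ToyConverter and pick_version are hypothetical names; the selection expression itself is copied from OnnxOpConverter.get_converter above.

```python
def pick_version(impl_versions, opset):
    """The selection expression from get_converter, on a plain list."""
    versions = sorted(list(impl_versions) + [opset])
    return versions[max([i for i, v in enumerate(versions) if v == opset]) - 1]


class ToyConverter:
    """Hypothetical converter with implementations for opset 1 and 11."""

    @classmethod
    def _impl_v1(cls, inputs, attr, params):
        return "v1"

    @classmethod
    def _impl_v11(cls, inputs, attr, params):
        return "v11"

    @classmethod
    def get_converter(cls, opset):
        impls = [int(d.replace("_impl_v", "")) for d in dir(cls) if "_impl_v" in d]
        version = pick_version(impls, opset)
        if hasattr(cls, "_impl_v{}".format(version)):
            return getattr(cls, "_impl_v{}".format(version))
        raise NotImplementedError(
            "opset version {} of {} not implemented".format(version, cls.__name__)
        )
```

For opset 13 the sorted list is [1, 11, 13], so version 11 is chosen; for opset 9 it is [1, 9, 11], so version 1 is chosen. Note one edge case of the expression as written: if opset is lower than every implemented version (e.g. only 11 is implemented and opset is 9), the index becomes -1 and wraps around to the last element, so the highest version is returned instead of an error.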

The conversion interface _impl_vx

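Every operator that needs conversion has one or more versions of the conversion interface. Taking convolution as an example, here are the _impl_vx methods of the Conv class: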

class Conv(OnnxOpConverter):
    """Operator converter for Conv."""

    @classmethod
    def _impl_v1(cls, inputs, attr, params):
        # Use shape of input to determine convolution type.
        # Take the input data and the kernel from inputs and infer their shapes
        data = inputs[0]
        kernel = inputs[1]
        input_shape = infer_shape(data)
        ndim = len(input_shape)
        kernel_type = infer_type(inputs[1])
        kernel_shapes = [get_const_tuple(kernel_type.checked_type.shape)]
        # If the ONNX Conv attributes do not give kernel_shape, use the shape inferred from inputs
        if "kernel_shape" not in attr:
            attr["kernel_shape"] = kernel_shapes[0][2:]
        # If the ONNX Conv node sets the auto_pad attribute
        if "auto_pad" in attr:
            # Decode the ONNX auto_pad value (bytes) into a str
            attr["auto_pad"] = attr["auto_pad"].decode("utf-8")
            # Handle padding according to the auto_pad value
            if attr["auto_pad"] in ("SAME_UPPER", "SAME_LOWER"):
                # Warning: Convolution does not yet support dynamic shapes,
                # one will need to run dynamic_to_static on this model after import
                # Explicitly pad the input data and continue with the padded data
                data = autopad(
                    data,
                    attr.get("strides", [1] * (ndim - 2)),
                    attr["kernel_shape"],
                    attr.get("dilations", [1] * (ndim - 2)),
                    mode=attr["auto_pad"],
                )
            elif attr["auto_pad"] == "VALID":
                attr["pads"] = [0 for i in range(ndim - 2)]
            elif attr["auto_pad"] == "NOTSET":
                pass
            else:
                msg = 'Value {} in attribute "auto_pad" of operator Conv is invalid.'
                raise tvm.error.OpAttributeInvalid(msg.format(attr["auto_pad"]))
            attr.pop("auto_pad")

        attr["channels"] = kernel_shapes[0][0]
        out = AttrCvt(
            # op_name is a function. AttrCvt.__call__ calls it with the current attr and, based on
            # kernel_shape, gets the TVM operator name conv1d/conv2d/conv3d; that operator then
            # receives ([data, kernel], attr, params) and returns the converted TVM expression out
            op_name=dimension_picker("conv"),
            # Attribute conversion table
            transforms={
                # original name : new name
                "kernel_shape": "kernel_size",
                # original name : (new name, default value)
                "dilations": ("dilation", 1),
                # original name : (new name, default value)
                "pads": ("padding", 0),
                # original name : (new name, default value)
                "group": ("groups", 1),
            },
            custom_check=dimension_constraint(),
        )([data, kernel], attr, params)

        use_bias = len(inputs) == 3
        # If the inputs include a bias, add a bias_add to the expression
        if use_bias:
            out = _op.nn.bias_add(out, inputs[2])
        return out

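_impl_v1 does some preliminary processing of the convolution input, kernel and padding, and then creates an AttrCvt instance. The op_name argument is a function: AttrCvt.__call__ calls it with the convolution's attr and, from the kernel_shape in attr, decides whether this is a 1d/2d/3d convolution and obtains the corresponding TVM operator name conv1d/conv2d/conv3d. The transforms argument is used in AttrCvt.__call__ to convert the current attributes into the form the TVM convolution expects. The custom_check argument validates the attributes; for convolution it checks that the dimensionality is supported (1d/2d/3d).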
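The two helpers passed to AttrCvt can be approximated as follows. This is a hypothetical re-creation for illustration: the real dimension_picker and dimension_constraint live in onnx.py and may differ in details such as error messages. The only assumption it relies on is the one visible in AttrCvt.__call__: custom_check is a (function, message) pair.

```python
# Hypothetical re-creation of the helpers used by Conv._impl_v1.

def dimension_picker(prefix, suffix=""):
    """Return a function that builds the op name from the kernel rank."""
    def _impl(attr):
        kernel = attr["kernel_shape"]
        if len(kernel) in (1, 2, 3):
            return "{}{}d{}".format(prefix, len(kernel), suffix)
        raise ValueError("Non-1D/2D/3D kernel_shape is not supported")
    return _impl


def dimension_constraint():
    """Return a (check_function, message) pair, the shape AttrCvt unpacks."""
    def _dim_check(attrs):
        return len(attrs["kernel_shape"]) in (1, 2, 3)
    return _dim_check, "Only 1d, 2d and 3d kernel supported."
```

A 3x3 kernel_shape therefore selects "conv2d", and the check rejects a 4-element kernel_shape before any attribute is converted.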

Operator attribute conversion: AttrCvt

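Roughly, AttrCvt.__call__ checks and converts the attributes, calls get_relay_op to get the TVM function for the operator, and passes the operator's inputs and converted attributes to that function, producing the TVM Relay IR for the ONNX node.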

class AttrCvt(object):
    def __init__(
        self,
        op_name,
        transforms=None,
        excludes=None,
        disables=None,
        ignores=None,
        extras=None,
        custom_check=None,
    ):
        # New operator name; op_name can be a string or a function returning a string
        self._op_name = op_name
        # Attribute conversion table; entries have the form "attr_name: new_attr_name"
        # or "attr_name: (new_name, default_value, transform function)"
        self._transforms = transforms if transforms else {}
        # Attributes that must not appear; an exception is raised if they do
        self._excludes = excludes if excludes else []
        # Attributes that are disabled (dropped, with a debug log)
        self._disables = disables if disables else []
        # Attributes that are ignored during conversion
        self._ignores = ignores if ignores else []
        # Extra attributes added to the converted result
        self._extras = extras if extras else {}
        # Check function run before conversion; a False result raises an exception
        self._custom_check = custom_check

    def __call__(self, inputs, attrs, *args):
        # Always ignore these attributes of the source operator
        self._ignores.append("_output_shapes")
        self._ignores.append("_input_shapes")
        self._ignores.append("T")
        self._ignores.append("use_cudnn_on_gpu")
        self._ignores.append("_node_name")
        self._ignores.append("is_training")
        self._ignores.append("_target_layout")
        # apply custom check
        # If a check function was given, run it
        if self._custom_check:
            func, msg = self._custom_check
            if not func(attrs):
                raise RuntimeError("Check failed: {}".format(msg))
        # get new op_name
        if isinstance(self._op_name, str):
            op_name = self._op_name
        else:
            assert callable(self._op_name), "op_name can either be string or callable"
            op_name = self._op_name(attrs)
        # ignore 'tvm_custom' always
        self._ignores.append("tvm_custom")
        # convert attributes
        new_attrs = {}
        # Iterate over the attributes of the source operator
        for k in attrs.keys():
            # Excluded attribute: raise
            if k in self._excludes:
                raise NotImplementedError(
                    "Attribute %s in operator %s is not" + " supported.", k, op_name
                )
            # Disabled attribute: log at debug level and drop it
            if k in self._disables:
                logger.debug("Attribute %s is disabled in relay.sym.%s", k, op_name)
            # Ignored attribute: log at debug level and drop it
            elif k in self._ignores:
                if k != "tvm_custom":
                    logger.debug("Attribute %s is ignored in relay.sym.%s", k, op_name)
            # Attribute listed in the conversion table
            elif k in self._transforms:
                # Get the new name, default value and transform function from the table entry;
                # if no transform function is given, it defaults to lambda x: x (identity)
                new_name, defaults, transform = self._parse_default(self._transforms[k])
                # No default value given ("attr_name: new_attr_name" form)
                if defaults is None:
                    # The attribute is required: _required_attr returns its value or raises
                    new_attr = self._required_attr(attrs, k)
                else:
                    # Look up the value in the original attributes; None if absent
                    new_attr = attrs.get(k, None)
                if new_attr is None:
                    # No value: store the default from the table under the new name
                    new_attrs[new_name] = defaults
                else:
                    # Otherwise store the transformed value under the new name
                    new_attrs[new_name] = transform(new_attr)
            else:
                # copy
                # Attributes not in the table are copied unchanged
                new_attrs[k] = attrs[k]
        # add extras
        new_attrs.update(self._extras)
        # Call the Relay operator with the inputs and converted attributes; return the Relay IR
        return get_relay_op(op_name)(*inputs, **new_attrs)

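At its core, the Conv converter calls AttrCvt to convert the ONNX convolution operator into a Relay convolution operator. In general such a conversion involves converting the attributes and, depending on the layout, rearranging the weights, inputs and outputs before returning a Relay convolution operator. (The TensorFlow frontend does contain code described this way, but no similar layout-rearranging code appears in the ONNX conversion.)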

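AttrCvt is defined in python/tvm/relay/frontend/common.py. According to its class comment, the class implements operator conversion, i.e. it maps the given op_name to a Relay operator. The process is: first check the incoming attrs and raise an error on illegal attributes; convert every attribute that has a conversion rule (the transform in the code above); finally call get_relay_op and return a TVM Relay convolution operator.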
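The attribute-conversion core of AttrCvt.__call__ can be reduced to a small pure function. This is a simplified sketch: parse_default and convert_attrs are hypothetical names, logging, the custom check and the op-name step are left out, and the _parse_default rules (a plain string means no default and an identity transform; a 2-tuple adds a default; a 3-tuple adds a transform function) are assumed from the behaviour described in the comments above.

```python
# Simplified sketch of AttrCvt's attribute loop; names are hypothetical.

def parse_default(target):
    """Normalize a transforms entry to (new_name, default, transform_fn)."""
    if not isinstance(target, (list, tuple)):
        return target, None, lambda x: x
    if len(target) == 1:
        return target[0], None, lambda x: x
    if len(target) == 2:
        return target[0], target[1], lambda x: x
    if len(target) == 3:
        return target[0], target[1], target[2]
    raise ValueError("bad transform entry: {}".format(target))


def convert_attrs(attrs, transforms, ignores=(), excludes=(), extras=None):
    new_attrs = {}
    for k in attrs:
        if k in excludes:
            raise NotImplementedError("Attribute {} is not supported".format(k))
        if k in ignores:
            continue  # dropped
        if k in transforms:
            new_name, default, fn = parse_default(transforms[k])
            value = attrs.get(k)
            new_attrs[new_name] = default if value is None else fn(value)
        else:
            new_attrs[k] = attrs[k]  # copied unchanged
    new_attrs.update(extras or {})
    return new_attrs


conv_transforms = {
    "kernel_shape": "kernel_size",
    "dilations": ("dilation", 1),
    "pads": ("padding", 0),
    "group": ("groups", 1),
}
```

One consequence worth noticing: because the loop only visits attributes that are present, a default such as ("dilation", 1) is used only when the attribute exists with a None value; a missing dilations attribute is simply absent from the result, and the Relay operator's own default applies.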

Taking conv2d as an example, get_relay_op("conv2d") here returns nn.conv2d.

conv2d is implemented in nn.py (python/tvm/relay/op/nn/nn.py) as follows:

def conv2d(

    data,

    weight,

    strides=(1, 1),

    padding=(0, 0),

    dilation=(1, 1),

    groups=1,

    channels=None,

    kernel_size=None,

    data_layout="NCHW",

    kernel_layout="OIHW",

    out_layout="",

    out_dtype="",

):

    r"""2D convolution.

    This operator takes the weight as the convolution kernel

    and convolves it with data to produce an output.

    In the default case, where the data_layout is `NCHW`

    and kernel_layout is `OIHW`, conv2d takes in

    a data Tensor with shape `(batch_size, in_channels, height, width)`,

    and a weight Tensor with shape `(channels, in_channels, kernel_size[0], kernel_size[1])`

    to produce an output Tensor with the following rule:

    .. math::

        \mbox{out}[b, c, y, x] = \sum_{dy, dx, k}

           \mbox{data}[b, k, \mbox{strides}[0] * y  + dy, \mbox{strides}[1] * x + dx] *

           \mbox{weight}[c, k, dy, dx]

    Padding and dilation are applied to data and weight respectively before the computation.

    This operator accepts data layout specification.

    Semantically, the operator will convert the layout to the canonical layout

    (`NCHW` for data and `OIHW` for weight), perform the computation,

    then convert to the out_layout.

    Parameters

    ----------

    data : tvm.relay.Expr

        The input data to the operator.

    weight : tvm.relay.Expr

        The weight expressions.

    strides : Optional[int, Tuple[int]]

        The strides of convolution.

    padding : Optional[int, Tuple[int]]

        The padding of convolution on both sides of inputs before convolution.

    dilation : Optional[int, Tuple[int]]

        Specifies the dilation rate to be used for dilated convolution.

    groups : Optional[int]

        Number of groups for grouped convolution.

    channels : Optional[int]

        Number of output channels of this convolution.

    kernel_size : Optional[int, Tuple[int]]

        The spatial of the convolution kernel.

    data_layout : Optional[str]

        Layout of the input.

    kernel_layout : Optional[str]

        Layout of the weight.

    out_layout : Optional[str]

        Layout of the output, by default, out_layout is the same as data_layout

    out_dtype : Optional[str]

        Specifies the output data type for mixed precision conv2d.

    Returns

    -------

    result : tvm.relay.Expr

        The computed result.

    """

    if isinstance(kernel_size, int):

        kernel_size = (kernel_size, kernel_size)

    if isinstance(strides, int):

        strides = (strides, strides)

    if isinstance(dilation, int):

        dilation = (dilation, dilation)

    # TODO enforce 4-way padding in topi/nn/conv2d after #4644 merged

    # convert 2-way padding to 4-way padding

    padding = get_pad_tuple2d(padding)

    return _make.conv2d(

        data,

        weight,

        strides,

        padding,

        dilation,

        groups,

        channels,

        kernel_size,

        data_layout,

        kernel_layout,

        out_layout,

        out_dtype,

    )

Call chain: conv2d() -> _make.conv2d(). _make.py exposes the C++ functions to Python:

import tvm._ffi

tvm._ffi._init_api("relay.op.nn._make", __name__)

Here __name__ is a Python built-in variable holding the fully qualified name of the current module, i.e. tvm.relay.op.nn._make.

The tvm._ffi module lives in python/tvm/_ffi, and _init_api is defined in python/tvm/_ffi/registry.py:

def _init_api(namespace, target_module_name=None):

    """Initialize api for a given module name

    namespace : str

       The namespace of the source registry

    target_module_name : str

       The target module name if different from namespace

    """

    target_module_name = target_module_name if target_module_name else namespace

    if namespace.startswith("tvm."):

        _init_api_prefix(target_module_name, namespace[4:])

    else:

        _init_api_prefix(target_module_name, namespace)

Here the first argument, namespace, is relay.op.nn._make, and target_module_name is tvm.relay.op.nn._make. Since namespace does not start with "tvm.", the two arguments passed to _init_api_prefix are tvm.relay.op.nn._make and relay.op.nn._make.

def _init_api_prefix(module_name, prefix):

    module = sys.modules[module_name]

    for name in list_global_func_names():

        if not name.startswith(prefix):

            continue

        fname = name[len(prefix) + 1 :]

        target_module = module

        if fname.find(".") != -1:

            continue

        f = get_global_func(name)

        ff = _get_api(f)

        ff.__name__ = fname

        ff.__doc__ = "TVM PackedFunc %s. " % fname

        setattr(target_module, ff.__name__, ff)

module = sys.modules[module_name] obtains the tvm.relay.op.nn._make module object. list_global_func_names() is defined in python/tvm/_ffi/registry.py:

def list_global_func_names():

    """Get list of global functions registered.

    Returns

    -------

    names : list

       List of global functions names.

    """

    plist = ctypes.POINTER(ctypes.c_char_p)()

    size = ctypes.c_uint()

    check_call(_LIB.TVMFuncListGlobalNames(ctypes.byref(size), ctypes.byref(plist)))

    fnames = []

    for i in range(size.value):

        fnames.append(py_str(plist[i]))

    return fnames

This function uses ctypes to call TVMFuncListGlobalNames in the C++ library and receives plist, an array of strings holding the names of all global functions. TVMFuncListGlobalNames is defined in src/runtime/registry.cc:

int TVMFuncListGlobalNames(int* out_size, const char*** out_array) {

  API_BEGIN();

  TVMFuncThreadLocalEntry* ret = TVMFuncThreadLocalStore::Get();

  ret->ret_vec_str = tvm::runtime::Registry::ListNames();

  ret->ret_vec_charp.clear();

  for (size_t i = 0; i < ret->ret_vec_str.size(); ++i) {

    ret->ret_vec_charp.push_back(ret->ret_vec_str[i].c_str());

  }

  *out_array = dmlc::BeginPtr(ret->ret_vec_charp);

  *out_size = static_cast<int>(ret->ret_vec_str.size());

  API_END();

}

It calls tvm::runtime::Registry::ListNames() to get the list of function names:

std::vector<std::string> Registry::ListNames() {

  Manager* m = Manager::Global();

  std::lock_guard<std::mutex> lock(m->mutex);

  std::vector<std::string> keys;

  keys.reserve(m->fmap.size());

  for (const auto& kv : m->fmap) {

    keys.push_back(kv.first);

  }

  return keys;

}

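As we can see, the function names are the keys (the first element of each entry) of the fmap table of the Manager instance, and Manager is a singleton. fmap is defined as follows: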

struct Registry::Manager {

  // map storing the functions.

  // We deliberately used raw pointer.

  // This is because PackedFunc can contain callbacks into the host language (Python) and the

  // resource can become invalid because of indeterministic order of destruction and forking.

  // The resources will only be recycled during program exit.

  std::unordered_map<std::string, Registry*> fmap;

  // mutex

  std::mutex mutex;

  Manager() {}

  static Manager* Global() {

    // We deliberately leak the Manager instance, to avoid leak sanitizers

    // complaining about the entries in Manager::fmap being leaked at program

    // exit.

    static Manager* inst = new Manager();

    return inst;

  }

};

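According to the comment, fmap is a map that stores the functions: each key is a function name (std::string) and each value is a Registry pointer.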
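The Manager/fmap design can be mirrored in Python to see how registration, lookup and name listing fit together. This is a hypothetical sketch: a class-level dict and lock play the roles of the leaked static Manager instance, its fmap and its mutex, and register/get/list_names correspond loosely to Registry::Register, Registry::Get and Registry::ListNames.

```python
import threading


class Registry:
    """Python sketch of the C++ Registry/Manager pair (simplified, hypothetical)."""

    _fmap = {}                # name -> Registry, like Manager::fmap
    _lock = threading.Lock()  # like Manager::mutex

    def __init__(self, name):
        self.name = name
        self.func = None

    def set_body(self, func):
        self.func = func
        return self

    @classmethod
    def register(cls, name, can_override=False):
        with cls._lock:
            if name in cls._fmap and not can_override:
                raise RuntimeError("Global PackedFunc {} is already registered".format(name))
            r = cls(name)
            cls._fmap[name] = r
            return r

    @classmethod
    def get(cls, name):
        with cls._lock:
            r = cls._fmap.get(name)
            return None if r is None else r.func

    @classmethod
    def list_names(cls):
        with cls._lock:
            return list(cls._fmap)
```

Registry.register("demo.add").set_body(lambda a, b: a + b) plays the role of TVM_REGISTER_GLOBAL("demo.add").set_body(...), after which Registry.get("demo.add")(2, 3) returns 5.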

Back in _init_api_prefix, the next call is get_global_func:

def get_global_func(name, allow_missing=False):

    return _get_global_func(name, allow_missing)

def _get_global_func(name, allow_missing=False):

    handle = PackedFuncHandle()

    check_call(_LIB.TVMFuncGetGlobal(c_str(name), ctypes.byref(handle)))

    if handle.value:

        return _make_packed_func(handle, False)

    if allow_missing:

        return None

    raise ValueError("Cannot find global function %s" % name)

_get_global_func uses ctypes to call TVMFuncGetGlobal in the C++ library:

int TVMFuncGetGlobal(const char* name, TVMFunctionHandle* out) {

  API_BEGIN();

  const tvm::runtime::PackedFunc* fp = tvm::runtime::Registry::Get(name);

  if (fp != nullptr) {

    *out = new tvm::runtime::PackedFunc(*fp);  // NOLINT(*)

  } else {

    *out = nullptr;

  }

  API_END();

}

const PackedFunc* Registry::Get(const std::string& name) {

  Manager* m = Manager::Global();

  std::lock_guard<std::mutex> lock(m->mutex);

  auto it = m->fmap.find(name);

  if (it == m->fmap.end()) return nullptr;

  return &(it->second->func_);

}

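TVMFuncGetGlobal calls Registry::Get, which looks up the entry in Manager's fmap whose key is the function name passed in from Python and returns the PackedFunc held by the Registry stored as its value; TVMFuncGetGlobal then hands a copy of it back as a new PackedFunc. In other words, it gets a function handle by name.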
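Putting list_global_func_names, get_global_func and setattr together, _init_api_prefix effectively copies every global function whose name is directly under the prefix onto a Python module. The sketch below imitates that with a plain dict instead of the C++ registry; GLOBAL_FUNCS, _wrap, init_api_prefix and the module name toy_make are hypothetical.

```python
import sys
import types

# Hypothetical table of global functions, standing in for the C++ registry.
GLOBAL_FUNCS = {
    "relay.op.nn._make.conv2d": lambda *args: ("conv2d", args),
    "relay.op.nn._make.dense": lambda *args: ("dense", args),
    "relay.op.nn._make.sub.helper": lambda *args: None,     # nested name: skipped
    "relay.ir.Function": lambda *args: ("Function", args),  # other prefix: skipped
}


def _wrap(func, fname):
    """Name and document each bound function, like ff.__name__ / ff.__doc__."""
    def wrapper(*args):
        return func(*args)
    wrapper.__name__ = fname
    wrapper.__doc__ = "TVM PackedFunc %s. " % fname
    return wrapper


def init_api_prefix(module_name, prefix):
    module = sys.modules[module_name]
    for name, func in GLOBAL_FUNCS.items():
        if not name.startswith(prefix):
            continue
        fname = name[len(prefix) + 1:]  # strip "prefix."
        if "." in fname:                # only direct children of the prefix
            continue
        setattr(module, fname, _wrap(func, fname))


# Create an empty module, register it, and populate it like _init_api does.
toy_make = types.ModuleType("toy_make")
sys.modules["toy_make"] = toy_make
init_api_prefix("toy_make", "relay.op.nn._make")
```

After this runs, toy_make.conv2d and toy_make.dense exist, which is the same mechanism that makes _make.conv2d available in python/tvm/relay/op/nn/_make.py.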

Searching for the code that writes into fmap leads to Registry::Register:

Registry& Registry::Register(const std::string& name, bool can_override) {  // NOLINT(*)

  Manager* m = Manager::Global();

  std::lock_guard<std::mutex> lock(m->mutex);

  if (m->fmap.count(name)) {

    ICHECK(can_override) << "Global PackedFunc " << name << " is already registered";

  }

  Registry* r = new Registry();

  r->name_ = name;

  m->fmap[name] = r;

  return *r;

}

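When Registry::Register is called and name is already in fmap while can_override is false, the ICHECK fails; otherwise a new Registry instance is created, stored in Manager's fmap, and returned. Searching for the callers of Registry::Register leads to this definition in include/tvm/runtime/registry.h: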

/*!

 * \brief Register a function globally.

 * \code

 *   TVM_REGISTER_GLOBAL("MyPrint")

 *   .set_body([](TVMArgs args, TVMRetValue* rv) {

 *   });

 * \endcode

 */

#define TVM_REGISTER_GLOBAL(OpName) \

  TVM_STR_CONCAT(TVM_FUNC_REG_VAR_DEF, __COUNTER__) = ::tvm::runtime::Registry::Register(OpName)

The macro calls Registry::Register with a function name. Searching the code base turns up many uses of the TVM_REGISTER_GLOBAL macro.

Focusing on relay.op.nn._make.conv2d, the search leads to src/relay/op/nn/convolution.cc.

The registration code of conv2d is as follows:



TVM_REGISTER_GLOBAL("relay.op.nn._make.conv2d")

    .set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,

                       Array<IndexExpr> dilation, int groups, IndexExpr channels,

                       Array<IndexExpr> kernel_size, String data_layout, String kernel_layout,

                       String out_layout, DataType out_dtype) {

      return MakeConv<Conv2DAttrs>(data, weight, strides, padding, dilation, groups, channels,

                                   kernel_size, data_layout, kernel_layout, out_layout, out_dtype,

                                   "nn.conv2d");

    });

RELAY_REGISTER_OP("nn.conv2d")

    .describe(R"code(2D convolution layer (e.g. spatial convolution over images).

This layer creates a convolution kernel that is convolved

with the layer input to produce a tensor of outputs.

- **data**: This depends on the `layout` parameter. Input is 4D array of shape

            (batch_size, in_channels, height, width) if `layout` is `NCHW`.

- **weight**: (channels, in_channels, kernel_size[0], kernel_size[1])

- **out**:  This depends on the `layout` parameter. Output is 4D array of shape

            (batch_size, channels, out_height, out_width) if `layout` is `NCHW`.

)code" TVM_ADD_FILELINE)

    .set_attrs_type<Conv2DAttrs>()

    .set_num_inputs(2)

    .add_argument("data", "Tensor", "The input tensor.")

    .add_argument("weight", "Tensor", "The weight tensor.")

    .set_support_level(2)

    .add_type_rel("Conv2D", Conv2DRel)

    .set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv2DAttrs>)

    .set_attr<TOpPattern>("TOpPattern", kOutEWiseFusable);

TVM_REGISTER_GLOBAL registers the function into a global object: as the macro definition above shows, it expands to a call to Registry::Register, which stores the function in the fmap of the global Manager.

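Here set_body_typed registers a lambda that, when called, builds a conv expression through MakeConv. MakeConv first constructs an attrs object from the conv parameters (strides, kernel, layout, etc.), then looks up the conv operator among the already registered operators by name, and finally packs attrs and op together into a Call. The code is in tvm/src/relay/op/nn/convolution_make.h: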



template <typename T>

inline Expr MakeConv(Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,

                     Array<IndexExpr> dilation, int groups, IndexExpr channels,

                     Array<IndexExpr> kernel_size, std::string data_layout,

                     std::string kernel_layout, std::string out_layout, DataType out_dtype,

                     std::string op_name) {

  auto attrs = make_object<T>();

  attrs->strides = std::move(strides);

  attrs->padding = std::move(padding);

  attrs->dilation = std::move(dilation);

  attrs->groups = groups;

  attrs->channels = std::move(channels);

  attrs->kernel_size = std::move(kernel_size);

  attrs->data_layout = std::move(data_layout);

  attrs->kernel_layout = std::move(kernel_layout);

  attrs->out_layout = std::move(out_layout);

  attrs->out_dtype = std::move(out_dtype);

  const Op& op = Op::Get(op_name);

  return Call(op, {data, weight}, Attrs(attrs), {});

}

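That is, the convolution parameters are packed into attrs, the registered Op is fetched with Op::Get, and a Call instance is created and returned.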
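The three steps of MakeConv (pack attributes, fetch the registered op, wrap in a Call) can be modelled with dataclasses. This is a hypothetical Python analogue for illustration: Conv2DAttrs, Op, Call and OP_TABLE only mimic the C++ classes and the RELAY_REGISTER_OP table, and the defaults are borrowed from the Python conv2d signature shown earlier (with padding already expanded to four values).

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Conv2DAttrs:
    strides: Tuple[int, int] = (1, 1)
    padding: Tuple[int, int, int, int] = (0, 0, 0, 0)
    dilation: Tuple[int, int] = (1, 1)
    groups: int = 1
    channels: Any = None
    kernel_size: Any = None
    data_layout: str = "NCHW"
    kernel_layout: str = "OIHW"
    out_layout: str = ""
    out_dtype: str = ""


@dataclass(frozen=True)
class Op:
    name: str


@dataclass
class Call:
    op: Op
    args: List[Any]
    attrs: Any


OP_TABLE = {"nn.conv2d": Op("nn.conv2d")}  # stands in for the registered ops


def get_op(name):
    return OP_TABLE[name]  # like Op::Get: look up, never create


def make_conv(data, weight, op_name="nn.conv2d", **attr_kwargs):
    attrs = Conv2DAttrs(**attr_kwargs)      # 1. pack parameters into attrs
    op = get_op(op_name)                    # 2. fetch the registered op
    return Call(op, [data, weight], attrs)  # 3. wrap everything in a Call
```

The resulting Call holds only references and metadata; nothing is computed, which matches the point made below that a Relay op carries no computation.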

Call inherits from Expr:

class Call : public Expr {

 public:

  /*!

   * \brief The destructor

   */

  ~Call();

  /*!

   * \brief The constructor

   * \param op The operator will be invoked.

   * \param args The arguments of the call.

   * \param attrs The attributes of the call node.

   * \param type_args The type arguments passed to a polymorphic function.

   * \param span The source span of the expression.

   */

  TVM_DLL Call(Expr op, Array<Expr> args, Attrs attrs = Attrs(),

               Array<Type> type_args = Array<Type>(), Span span = Span());

  TVM_DEFINE_OBJECT_REF_METHODS(Call, RelayExpr, CallNode);

  TVM_DEFINE_OBJECT_REF_COW_METHOD(CallNode);

};

Operators (Op) are registered into a common AttrRegistry through RELAY_REGISTER_OP.

An Op does not actually contain the operator's computation; it only holds information about the operator's inputs, outputs and attributes.

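Note in particular that a Relay Op contains no concrete computation! The steps above only produce the IR of the Relay convolution Op together with its inputs and attributes. So where is the computation carried out? In TOPI.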

Defining the operator's compute function

An operator's compute function implements the operator's computation.

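The algorithmic entry point of nn.conv2d is the conv2d function defined in python/tvm/topi/nn/conv2d.py. It calls the conv function in the same file, which defines a compute function and passes it to te.compute: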

def conv(

    inp: te.Tensor,

    filt: te.Tensor,

    stride: Union[int, Sequence[int]],

    padding: Union[int, Sequence[int]],

    dilation: Union[int, Sequence[int]],

    groups: int,

    order: str,

    out_dtype: Union[str, None] = None,

):

    ...

    def compute(*args):

        nn, ff, *dim_indices = list(np.array(args)[permutation_to])

        return te.sum(

            temp.__getitem__(

                tuple(

                    np.array(

                        [nn, ff // (num_filter // groups) * (in_channel // groups) + rc]

                        + [

                            di * stride + r * dil

                            for di, stride, r, dil in zip(dim_indices, strides, rs, dilations)

                        ]

                    )[permutation_from]

                )

            ).astype(out_dtype)

            * filt.__getitem__(tuple(np.array([ff, rc] + rs)[permutation_from_kernel])).astype(

                out_dtype

            ),

            # Schedules depend on reduction axes being in the same order as the

            # layout, so we reorder here.

            axis=np.array([rc, *rs])[permutation_from_reductions].tolist(),

        )

    return te.compute(

        list(np.array([batch, out_channel] + out_dimensions)[permutation_from]),

        compute,

        # tag is expected to be lowercase

        tag=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",

        name=f"{'group_' if groups > 1 else ''}conv{dim}d_{order.lower()}",

    )

This is only the default compute of conv2d; python/tvm/topi/nn/conv2d.py defines corresponding compute functions for different parameters and data layouts.
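What such a compute function describes can be checked against a plain reference implementation of the formula from the Relay conv2d docstring, out[b, c, y, x] = sum over dy, dx, k of data[b, k, strides[0] * y + dy, strides[1] * x + dx] * weight[c, k, dy, dx], with padding applied to data and dilation to the kernel offsets. The sketch below is an illustrative direct loop nest on nested Python lists (NCHW data, OIHW weight, groups = 1 only); conv2d_nchw is a hypothetical name, not a TOPI function.

```python
def conv2d_nchw(data, weight, strides=(1, 1), padding=(0, 0), dilation=(1, 1)):
    """Direct conv2d on nested lists: data NCHW, weight OIHW, groups=1 only."""
    n_batch, in_c = len(data), len(data[0])
    in_h, in_w = len(data[0][0]), len(data[0][0][0])
    out_c, k_h, k_w = len(weight), len(weight[0][0]), len(weight[0][0][0])
    pad_h, pad_w = padding
    s_h, s_w = strides
    d_h, d_w = dilation
    # output size with symmetric zero padding and dilation
    out_h = (in_h + 2 * pad_h - d_h * (k_h - 1) - 1) // s_h + 1
    out_w = (in_w + 2 * pad_w - d_w * (k_w - 1) - 1) // s_w + 1

    def pixel(b, k, y, x):
        # (y, x) are coordinates in the padded input; padding reads as 0
        y -= pad_h
        x -= pad_w
        if 0 <= y < in_h and 0 <= x < in_w:
            return data[b][k][y][x]
        return 0

    out = [[[[0] * out_w for _ in range(out_h)] for _ in range(out_c)]
           for _ in range(n_batch)]
    for b in range(n_batch):
        for c in range(out_c):
            for y in range(out_h):
                for x in range(out_w):
                    acc = 0
                    for k in range(in_c):          # reduction over input channels
                        for dy in range(k_h):      # and over the kernel window
                            for dx in range(k_w):
                                v = pixel(b, k, s_h * y + d_h * dy, s_w * x + d_w * dx)
                                acc += v * weight[c][k][dy][dx]
                    out[b][c][y][x] = acc
    return out
```

For a 1x1x3x3 input holding 1..9 and a 2x2 all-ones kernel, the result is [[[[12, 16], [24, 28]]]]. TOPI expresses the same loop nest declaratively through te.compute and te.sum, leaving the loop order and tiling to the schedule.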

Registering the compute function and schedule

After the operator's compute function is implemented, it must be attached to the Relay operator.

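In TVM this means providing not only the computation but also a matching schedule, i.e. choosing a suitable schedule for the compute. For example, when a 2D convolution is a grouped convolution, it is assigned a suitable compute and schedule. The conv2d schedules are defined in python/tvm/topi/generic/nn.py: the functions whose names start with schedule_conv2d_ define the schedules for the various data layouts, and most of them use the default schedule.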

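The conv2d strategy function conv2d_strategy is defined in python/tvm/relay/op/strategy/generic.py. Depending on the layouts of the input data and the kernel, it provides a compute and a schedule for each layout combination. These compute/schedule pairs together make up the strategy.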
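Conceptually, a strategy is a selection step that maps the operator's configuration to a (compute, schedule) pair. The sketch below is a hypothetical toy, not TVM's OpStrategy API: the implementation names, the placeholder compute/schedule functions and the layout table are illustrative only, and grouped convolution is deliberately left out.

```python
# Hypothetical toy: a "strategy" picks (name, compute, schedule) by layout.

def compute_nchw(data, weight):
    return "compute:conv2d_nchw"


def schedule_nchw(outs):
    return "schedule:conv2d_nchw"


def compute_nhwc(data, weight):
    return "compute:conv2d_nhwc"


def schedule_nhwc(outs):
    return "schedule:conv2d_nhwc"


STRATEGIES = {
    ("NCHW", "OIHW"): ("conv2d_nchw.generic", compute_nchw, schedule_nchw),
    ("NHWC", "HWIO"): ("conv2d_nhwc.generic", compute_nhwc, schedule_nhwc),
}


def toy_conv2d_strategy(data_layout, kernel_layout, groups=1):
    if groups != 1:
        raise NotImplementedError("grouped conv2d is not covered by this sketch")
    try:
        return STRATEGIES[(data_layout, kernel_layout)]
    except KeyError:
        raise RuntimeError(
            "Unsupported layout {}/{}".format(data_layout, kernel_layout)
        ) from None
```

Keeping compute and schedule as separate callables is the point of the design: the same mathematical definition can be paired with different schedules per layout or target.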

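With that, our operator has been added to Relay and can be invoked through a Relay Call Node. In this step we write an interface that passes the arguments to the operator and returns a Relay Call Node, which can then be inserted into the Relay syntax tree.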

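Attrs and arguments cannot be called directly, so Op::Get is used here to fetch the operator's information from the operator registry, and that is passed to the Call Node together with them. The function that generates the Relay Call Node for nn.conv2d (src/relay/op/nn/convolution_make.h) is the MakeConv function shown above.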

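With Conv2DAttrs as the template argument, it produces the Relay Call Node for nn.conv2d: it first creates a Conv2DAttrs object holding the passed-in parameters and attributes, then fetches the registration information of the 2D convolution and passes both to Call, and finally returns a reference to the resulting CallNode instance.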

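After defining the Relay Call Node function, we register an interface with Python to call it, using the TVM_REGISTER_GLOBAL macro. Once registered, it can be called from Python in the form relay.op._make.xxx(...) (relay.op.nn._make.xxx(...) for nn operators). The registration of nn.conv2d was shown earlier in this article.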

Appendix:

tvm relay Function: calling C++ from Python

At the end of GraphProto.from_onnx, the network's inputs, outputs and weight parameters are packed into a Function instance, from which an IRModule instance is created:

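# Build a Function from the model inputs (plus the weights the output expression depends on)
# and the output expression
func = _function.Function([v for k, v in self._inputs.items()], outputs)
# Return the module built from the expression, together with all weights
return IRModule.from_expr(func), self._params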

Both steps call into C++ code. First, the flow of _function.Function:

@tvm._ffi.register_object("relay.Function")

class Function(BaseFunc):

    def __init__(self, params, body, ret_type=None, type_params=None, attrs=None):

        if type_params is None:

            type_params = convert([])

        self.__init_handle_by_constructor__(

            _ffi_api.Function, params, body, ret_type, type_params, attrs

        )

    def __call__(self, *args):

        return Call(self, args, None, None)

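The second parameter of __init__, body, is the function body, and what was passed when _function.Function was called is outputs. This is because outputs is not the output tensor of the network or function but the output computation expression, which describes the computation step by step from the inputs to the outputs, i.e. everything the function computes. So outputs is the function body.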

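__init__ calls self.__init_handle_by_constructor__. We analyzed the _ffi_api.Function form earlier in the operator call flow: _ffi_api is the imported module and Function is the concrete function. So let's check which module _ffi_api in the current directory is; see python/tvm/relay/_ffi_api.py: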

import tvm._ffi

tvm._ffi._init_api("relay.ir", __name__)

The namespace is relay.ir, so _ffi_api.Function is relay.ir.Function.

Searching for its registration, TVM_REGISTER_GLOBAL("relay.ir.Function"), shows:

TVM_REGISTER_GLOBAL("relay.ir.Function")

    .set_body_typed([](tvm::Array<Var> params, Expr body, Type ret_type,

                       tvm::Array<TypeVar> ty_params, tvm::DictAttrs attrs) {

      return Function(params, body, ret_type, ty_params, attrs);

    });

That is, calling _ffi_api.Function instantiates a Function on the C++ side.

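In the Python Function class, _ffi_api.Function is passed as an argument to self.__init_handle_by_constructor__. This method is defined in the base class ObjectBase in python/tvm/_ffi/_ctypes/object.py, and ObjectBase.__init_handle_by_constructor__ in turn calls the following function: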

def __init_handle_by_constructor__(fconstructor, args):

    """Initialize handle by constructor"""

    temp_args = []

    values, tcodes, num_args = _make_tvm_args(args, temp_args)

    ret_val = TVMValue()

    ret_tcode = ctypes.c_int()

    if (

        _LIB.TVMFuncCall(

            fconstructor.handle,

            values,

            tcodes,

            ctypes.c_int(num_args),

            ctypes.byref(ret_val),

            ctypes.byref(ret_tcode),

        )

        != 0

    ):

        raise get_last_ffi_error()

    _ = temp_args

    _ = args

    assert ret_tcode.value == ArgTypeCode.OBJECT_HANDLE

    handle = ret_val.v_handle

    return handle

Let's look at the call chain of TVMFuncCall.

src/runtime/c_runtime_api.cc:

int TVMFuncCall(TVMFunctionHandle func, TVMValue* args, int* arg_type_codes, int num_args,

                TVMValue* ret_val, int* ret_type_code) {

  API_BEGIN();

  TVMRetValue rv;

  (*static_cast<const PackedFunc*>(func)).CallPacked(TVMArgs(args, arg_type_codes, num_args), &rv);

...

}

include/tvm/runtime/packed_func.h:

inline void PackedFunc::CallPacked(TVMArgs args, TVMRetValue* rv) const { body_(args, rv); }

The body_ finally invoked here is exactly the lambda set by TVM_REGISTER_GLOBAL("relay.ir.Function").set_body_typed.

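This is somewhat convoluted, so let's sort it out:

1. The registered relay.ir.Function is passed as an argument to __init_handle_by_constructor__;
2. __init_handle_by_constructor__ calls _LIB.TVMFuncCall;
3. _LIB.TVMFuncCall acts as a function executor and runs relay.ir.Function;
4. When the body of relay.ir.Function runs, it returns a handle to a Function object created on the C++ side.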

tvm relay op IRModule: calling C++ from Python

GraphProto.from_onnx in onnx.py ends with return IRModule.from_expr(func), self._params. from_expr is in python/tvm/ir/module.py:

    def from_expr(expr, functions=None, type_defs=None):

        funcs = functions if functions is not None else {}

        defs = type_defs if type_defs is not None else {}

        return _ffi_api.Module_FromExpr(expr, funcs, defs)

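This directly calls _ffi_api.Module_FromExpr. The module name defined under python/tvm/ir/ is ir (see python/tvm/ir/_ffi_api.py). Searching for the corresponding registration TVM_REGISTER_GLOBAL("ir.Module_FromExpr") shows that the registered function executes IRModule::FromExpr, which calls IRModule::FromExprInContext to create an IRModule instance on the C++ side.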
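As a rough mental model of what FromExpr produces, the toy below stores the given function in a module under the global name "main" and, if handed a bare expression, first wraps it in a function over its free variables. This reflects my understanding of IRModule::FromExprInContext and should be treated as an assumption; Var, Expr, Function, IRModule and free_vars here are hypothetical toy classes, not TVM's.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Var:
    name: str


@dataclass
class Expr:
    op: str
    args: List[object]


@dataclass
class Function:
    params: List[Var]
    body: object


@dataclass
class IRModule:
    functions: Dict[str, object] = field(default_factory=dict)

    @staticmethod
    def from_expr(expr):
        if not isinstance(expr, Function):
            # assumption: a bare expression is wrapped over its free variables
            expr = Function(free_vars(expr), expr)
        return IRModule({"main": expr})


def free_vars(expr):
    """Collect Var leaves in first-seen order (toy version: no binders)."""
    found = []

    def visit(e):
        if isinstance(e, Var):
            if e not in found:
                found.append(e)
        elif isinstance(e, Expr):
            for a in e.args:
                visit(a)

    visit(expr)
    return found
```

In the ONNX path the argument is already a Function, so it becomes the module's main function unchanged.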

References:

https://zhuanlan.zhihu.com/p/368940120

https://blog.csdn.net/zx_ros/article/details/123526147

https://blog.csdn.net/zx_ros/article/details/122931616

https://blog.csdn.net/zx_ros/article/details/122917673