TVM Pass IR如何使用
TVM Pass IR如何使用
随着Relay / tir中优化遍数的增加,执行并手动维护其依赖关系变得很棘手。引入了一个基础结构来管理优化过程,并应用于TVM堆栈中IR的不同层。
Relay / tir程序的优化可以以各种粒度应用,即分别使用tvm.relay.transform.FunctionPass/ tvm.tir.transform.PrimFuncPass和的功能级别和模块级别tvm.transform.ModulePass。或者,用户可以依靠在tvm.transform.Sequential中继/ tir程序上应用一系列pass,其中pass之间的依赖性可以通过pass下文解决。有关这些pass的每种类型的更多详细信息,请参阅pass基础结构。
本文主要说明开发人员如何使用pass infra进行特定的优化,创建用于Relay程序的优化管道。同样的方法也可以用于tir。
import numpy as np
import tvm
from tvm import te
import tvm.relay as relay
创建一个示例Relay中继程序
首先,创建一个简单的Relay程序。该程序将用于示例的各种优化。类似地,用户可以编写一个tir基本函数并应用tirpass。
def example():
shape = (1, 64, 54, 54)
c_data = np.empty(shape).astype("float32")
c = relay.const(c_data)
weight = relay.var("weight", shape=(64, 64, 3, 3))
x = relay.var("x", relay.TensorType((1, 64, 56, 56), "float32"))
conv = relay.nn.conv2d(x, weight)
y = relay.add(c, c)
y = relay.multiply(y, relay.const(2, "float32"))
y = relay.add(conv, y)
z = relay.add(y, c)
z1 = relay.add(y, c)
z2 = relay.add(z, z1)
return relay.Function([x, weight], z2)
让为conv2d op注册布局更改,以便可以在示例中应用布局更改通道。alter layout pass如何工作不在本文的讨论范围之内。
@relay.op.register_alter_op_layout("nn.conv2d", level=101)
def alter_conv2d(attrs, inputs, tinfos, out_type):
data, weight = inputs
new_attrs = dict(attrs)
new_attrs["data_layout"] = "NCHW16c"
return relay.nn.conv2d(data, weight, **new_attrs)
优化程序
现在要优化程序。Relay中继具有许多优化功能。将选择其中一些以应用于此示例程序。
有多种方法可以优化中继程序。下面将为每个示例提供示例。
手动应用优化pass
# Let's first create a relay Module which contains one or multiple Relay
# functions for optimization.
f = example()
mod = tvm.IRModule.from_expr(f)
# Now we can apply constant folding on the module.
# fold_const here is a callback that doesn't take any parameters.
fold_const = relay.transform.FoldConstant()
# Then, we can invoke the pass on the given module. Note that the constant
# folding pass works at the function-level. That being said, each function in
# the module will be applied with the optimization. Users don't need to iterate
# through individual functions manually to apply this pass.
mod = fold_const(mod)
# We can see from the updated program that the constants are folded.
print(mod)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
可以以类似方式应用更多优化。例如,可以消除z和z1使用的通用表达式。
mod = relay.transform.EliminateCommonSubexpr()(mod)
print(mod)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
一些优化(例如融合)也是参数化的。例如,选择级别0不允许将算子融合在一起。用户可以传递 fuse_opt_level来启用此功能。
mod = relay.transform.FuseOps(fuse_opt_level=0)(mod)
# We can observe that the optimized module contains functions that only have
# a signle primitive op.
print(mod)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%1 = %0(%x, %weight) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = fn (%p01: Tensor[(1, 64, 54, 54), float32], %p11: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p01, %p11) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3 = %2(%1, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%4 = fn (%p02: Tensor[(1, 64, 54, 54), float32], %p12: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p02, %p12) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%5 = %4(%3, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%6 = fn (%p03: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p03, %p03) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%6(%5) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
使用序列来应用pass序列
如上所述,应用pass实际上是乏味的,并且可能需要用户更好地了解依赖性。例如,融合目前不适用于let绑定。如果relay.transform.ToANormalForm()在融合之前应用算子,将无法将融合在一起,因为此过程会为每个表达式生成let绑定,以规范化Relay程序。
Relaytvm.transform.Sequential通过指定每个遍历,将打包为整体来缓解开发人员显式处理这些问题的麻烦。例如,可以使用以下序列样式应用相同遍历。tvm.transform.Sequential,torch.nn.sequential 和mxnet.gluon.block类似。例如,torch.nn.sequential用于包含一系列PyTorch模块,这些模块将被添加,以构建网络,着重于网络层。取而代之的是tvm.transform.Sequential,下面的过程中的基础工作于优化过程。
# Now let's execute some passes through :py:class:`tvm.transform.Sequential`
f = example()
mod = tvm.IRModule.from_expr(f)
# Glob the interested passes.
seq = tvm.transform.Sequential(
[
relay.transform.FoldConstant(),
relay.transform.EliminateCommonSubexpr(),
relay.transform.FuseOps(fuse_opt_level=2),
]
)
mod1 = seq(mod)
print(mod1)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%4 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%4(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
从转换后的Relay程序中,可以看到仍然有两个相同的加法运算。这是因为EliminateCommonSubexpr 未实际执行。默认情况下,只有优化级别小于或等于2的过程才被执行 tvm.transform.Sequential。下面的pass提供了一个配置界面,供用户自定义要执行的优化级别。
with tvm.transform.PassContext(opt_level=3):
mod2 = seq(mod)
print(mod2)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
可以看到仅保留了两个相同的加法之一。
In addition, users can selectively disable some passes using the disabled_pass config, which is similar to the -fno-xxx option used the general purpose compilers, such as Clang and GCC. For example, we can disable EliminateCommonSubexpr as following. The printed module will again show two identical addition operations.
with tvm.transform.PassContext(opt_level=3, disabled_pass=["EliminateCommonSubexpr"]):
mod3 = seq(mod)
print(mod3)
Out:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%4 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%4(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
The passes applied so far are target independent. The pass infra also provides a means to make pass target-aware. For example, the layout alteration pass falls in such category.
with tvm.transform.PassContext(opt_level=3):
mod4 = seq(mod)
print(mod4)
seq1 = tvm.transform.Sequential([relay.transform.AlterOpLayout()])
with tvm.transform.PassContext(opt_level=3):
with tvm.target.Target("llvm"):
mod5 = seq1(mod)
print(mod5)
Out:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = layout_transform(%x, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 56, 56, 16), float32] */;
%1 = nn.conv2d(%0, %weight, padding=[0, 0, 0, 0], data_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%2 = add(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = multiply(%2, 2f /* ty=float32 */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%4 = layout_transform(%3, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%5 = add(%1, %4) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%6 = layout_transform(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%7 = add(%5, %6) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%8 = add(%5, %6) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
layout_transform(%9, src_layout="NCHW16c", dst_layout="NCHW") /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Implement a Pass Using Python Decorator
下一个示例说明了如何使用Python装饰器,通过传递基础流程来编排定制的优化管道。此功能极大地简化了pass的实施。例如,用户可以简单地定义一个修饰的类,进行功能级别的优化,如以下示例所示。transform_function包装一个类,以用c的倍数替换所有常量。稍后,当调用自定义过程时,将访问给定模块中的每个函数,并且将替换函数中的每个常量。
@relay.transform.function_pass(opt_level=1)
class CustomPipeline:
"""Simple test function to replace one argument to another."""
def __init__(self, multiplier):
self.multiplier = multiplier
# This function can define a pass.
def transform_function(self, func, mod, ctx):
obj = self
class ReplaceConstant(tvm.relay.ExprMutator):
def visit_constant(self, c):
return relay.multiply(obj.multiplier, c)
return ReplaceConstant().visit(func)
f = example()
mod = tvm.IRModule.from_expr(f)
custom_pass = CustomPipeline(multiplier=relay.const(3, "float32"))
assert custom_pass.info.name == "CustomPipeline"
mod3 = custom_pass(mod)
print(mod3)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = multiply(3f /* ty=float32 */, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %1) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = multiply(3f /* ty=float32 */, 2f /* ty=float32 */) /* ty=float32 */;
%4 = multiply(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%5 = add(%0, %4) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%6 = add(%5, %1) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%7 = add(%5, %1) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%6, %7) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
调试pass
TVM为用户提供了一种即插即用式的调试通道,该通道在通过特殊通道(PrintIR)来转储整个模块的IR之后,将IR打印出来。序列传递示例的略微修改版本,可能类似于以下内容,以启用IR转储以进行FoldConstant优化。
f = example()
mod = tvm.IRModule.from_expr(f)
seq = tvm.transform.Sequential(
[
relay.transform.FoldConstant(),
relay.transform.EliminateCommonSubexpr(),
relay.transform.AlterOpLayout(),
]
)
# By inserting the ``PrintIR`` pass after ``FoldConstant``, the pass infra will
# dump out the module IR when ``FoldConstant`` is done. Users can plug in this
# pass after any pass they want to debug for viewing the optimization effect.
#
# There is a more flexible debugging mechanism also exposed by the build configuration
# object. One can pass a tracing function which can be used to execute arbitrary code
# before and/or after each pass. A tracing function will receive a :py::class:`tvm.IRModule`,
# a :py:class:`tvm.transform.PassInfo` object,
# and a boolean indicating whether you are executing before, or after a pass.
# An example is below.
def print_ir(mod, info, is_before):
"""Print the name of the pass, the IR, only before passes execute."""
if is_before:
print("Running pass: {}", info)
print(mod)
with tvm.transform.PassContext(opt_level=3, trace=print_ir):
with tvm.target.Target("llvm"):
# Perform the optimizations.
mod = seq(mod)
print(mod)
print("done")
输出:
Running pass: {} The meta data of the pass: pass name: FoldConstantopt_level: 2required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]);
%1 = add(meta[relay.Constant][0], meta[relay.Constant][0]);
%2 = multiply(%1, 2f);
%3 = add(%0, %2);
%4 = add(%3, meta[relay.Constant][0]);
%5 = add(%3, meta[relay.Constant][0]);
add(%4, %5)
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() {
add(meta[relay.Constant][0], meta[relay.Constant][0])
}
Running pass: {} The meta data of the pass: pass name: FuseOpsopt_level: 1required passes: [
InferType, ]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
add(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p0, %p0)
};
%0(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */)
}
Running pass: {} The meta data of the pass: pass name: ToANormalFormopt_level: 1required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p0, %p0) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%0(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
let %x = meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */;
let %x1 = fn (%p0: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p0, %p0) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
let %x2 = %x1(%x);
%x2
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() {
multiply(meta[relay.Constant][0], 2f)
}
Running pass: {} The meta data of the pass: pass name: FuseOpsopt_level: 1required passes: [
InferType, ]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
multiply(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, 2f /* ty=float32 */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 54, 54), float32], %p1: float32, Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
multiply(%p0, %p1)
};
%0(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, 2f /* ty=float32 */)
}
Running pass: {} The meta data of the pass: pass name: ToANormalFormopt_level: 1required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 54, 54), float32], %p1: float32, Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
multiply(%p0, %p1) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%0(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, 2f /* ty=float32 */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
let %x = meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */;
let %x1 = 2f /* ty=float32 */;
let %x2 = fn (%p0: Tensor[(1, 64, 54, 54), float32], %p1: float32, Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
multiply(%p0, %p1) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
let %x3 = %x2(%x, %x1);
%x3
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]);
%1 = add(%0, meta[relay.Constant][0]);
%2 = add(%1, meta[relay.Constant][1]);
%3 = add(%1, meta[relay.Constant][1]);
add(%2, %3)
}
Running pass: {} The meta data of the pass: pass name: PrintIRopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: EliminateCommonSubexpropt_level: 3required passes: [
InferType, ]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2)
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: FuseOpsopt_level: 1required passes: [
InferType, ]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]);
%1 = add(%0, %p2);
%2 = add(%1, %p3);
add(%2, %2)
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */)
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: AlterOpLayoutopt_level: 3required passes: [
InferType, ]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%7 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = layout_transform(%p0, src_layout="NCHW", dst_layout="NCHW16c");
%1 = nn.conv2d(%0, %p1, padding=[0, 0, 0, 0], data_layout="NCHW16c");
%2 = layout_transform(%p2, src_layout="NCHW", dst_layout="NCHW16c");
%3 = add(%1, %2);
%4 = layout_transform(%p3, src_layout="NCHW", dst_layout="NCHW16c");
%5 = add(%3, %4);
%6 = add(%5, %5);
layout_transform(%6, src_layout="NCHW16c", dst_layout="NCHW")
};
%7(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */)
}
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%7 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = layout_transform(%p0, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 56, 56, 16), float32] */;
%1 = nn.conv2d(%0, %p1, padding=[0, 0, 0, 0], data_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%2 = layout_transform(%p2, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%3 = add(%1, %2) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%4 = layout_transform(%p3, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%5 = add(%3, %4) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%6 = add(%5, %5) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
layout_transform(%6, src_layout="NCHW16c", dst_layout="NCHW") /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%7(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
done
概括
本文介绍了如何使用pass基础,更加方便地在TVM中编写和调用pass。还讨论了调用pass的不同方法。使用tvm.transform.Sequential,可以极大地帮助用户简化处理多个优化过程及其依赖项的工作。另外,提供了一个示例来说明如何使用PrintIR和跟踪调试过程。
TVM Pass IR如何使用的更多相关文章
- 如何使用TVM Pass红外线
如何使用TVM Pass红外线 随着Relay / tir中优化遍数的增加,执行并手动维护其依赖关系变得很棘手.引入了一个基础结构来管理优化过程,将其应用于TVM堆栈中IR的不同层. Relay / ...
- rac one node在线relocation
1.查看数据库运行状态 $ srvctl status database -d rone Instance rone_2 is running on node rone2 Online relocat ...
- 转://ORA-00603,ORA-27501,ORA-27300,ORA-27301,ORA-27302故障案例一则
背景介绍: 这是一套windows的rac系统.数据库后台日志报ORA-00474:SMON process terminated with error.接着报ORA-00603,ORA-27501, ...
- oracle 错误实例分析(ORA-01078)
01,问题描述 心血来潮想看一下启动数据库的alert log.然后把数据库给关闭了,同时也在监听日志文件 下面可谓是详细的描述了整个关机过程,也看到了无数的error [root@node1 ...
- 【体系结构】有关Oracle SCN知识点的整理
[体系结构]有关Oracle SCN知识点的整理 1 BLOG文档结构图 BLOG_Oracle_lhr_Oracle SCN的一点研究.pdf 2 前言部分 2.1 导读和注意事项 各位技 ...
- nbu还原集群数据库异常问题
集群数据库软件均已安装完毕,现在想从NBU上还原数据库,但在还原控制文件报错 [oracle@oracle-db1 ~]$ rman target / Recovery Manager: Releas ...
- oracle数据库启动报错,不能启动ASM实例
数据库rac启动时报错,日志例如以下,后来使用 Sat Jun 7 06:02:11 2014 GATHER_STATS_JOB encountered errors. Check the tra ...
- TVM:一个端到端的用于开发深度学习负载以适应多种硬件平台的IR栈
TVM:一个端到端的用于开发深度学习负载以适应多种硬件平台的IR栈 本文对TVM的论文进行了翻译整理 深度学习如今无处不在且必不可少.这次创新部分得益于可扩展的深度学习系统,比如 TensorFlo ...
- 用TVM在硬件平台上部署深度学习工作负载的端到端 IR 堆栈
用TVM在硬件平台上部署深度学习工作负载的端到端 IR 堆栈 深度学习已变得无处不在,不可或缺.这场革命的一部分是由可扩展的深度学习系统推动的,如滕索弗洛.MXNet.咖啡和皮托奇.大多数现有系统针对 ...
随机推荐
- composer 使用提示
1.使用国内镜像[推荐] composer config -g repo.packagist composer https://mirrors.aliyun.com/composer/ 或者 comp ...
- display:table-cell,inline-block 之间的缝隙
1.dispaly:table-call ,会让你对元素设置的为百分百的宽度失效,设置正常数值宽高时, vertical-align: middle;可让内部元素垂直居中,但如果加上position: ...
- hdu2056 矩形重叠面积(水题)
题意: 给你两个矩形,问你他们的重叠面积是多少. 思路: 这两个矩形是平行x和y轴的,所以水题,不解释. #include<stdio.h> typedef stru ...
- 利用cm压缩包手动安装cm和cdh
安装准备: 1.操作系统为centos6.9 CentOS-6.9-x86_64-bin-DVD1to2 2.安装Oracle JDK (1.8u121) 下载jdk-8u121-linux-x64. ...
- 03.28,周六,12:00-17:00,ICPC训练联盟周赛,选用试题:UCF Local Programming Contest 2016正式赛。
A. Majestic 10 题意:三个数均大于10则输出"triple-double",如果两个数大于10则输出"double-double",如果一个大于1 ...
- DOM 绑定事件
// 1.获取事件源 var oDiv = document.getElementById('box'); console.log(oDiv); //2.事件 (1)直接绑定匿名函数 oDiv.onc ...
- 面试侃集合 | LinkedBlockingQueue篇
面试官:好了,聊完了ArrayBlockingQueue,我们接着说说LinkedBlockingQueue吧 Hydra:还真是不给人喘口气的机会,LinkedBlockingQueue是一个基于链 ...
- linux服务器默认使用中文字符集zh_CN.UTF-8
linux服务器默认使用中文字符集zh_CN.UTF-8 一.问题描述和相关概念 linux服务器的字符集设置可能影响到网站页面出现 "???" 等问号乱码,还有可能导致文件中的汉 ...
- Linux创建RAID0_实战
Linux创建RAID实战 一.Linux创建RAID0 RAID0俗称条带,它将两个或多个硬盘组成一个逻辑硬盘,容量是所有硬盘之和 因为是多个硬盘组合成一个,故可并行写操作,写入速度提高,但此方式硬 ...
- Linux软件安装管理之——dpkg与apt-*详解 apt命令(dpkg和apt代替rpm)
apt list *python* Nosee123关注赞赏支持 Linux软件安装管理之--dpkg与apt-*详解 [Linux软件安装管理系列]- - 传送门: - -<Linux软件 ...