TVM Pass IR如何使用
TVM Pass IR如何使用
随着Relay / tir中优化遍数的增加,执行并手动维护其依赖关系变得很棘手。引入了一个基础结构来管理优化过程,并应用于TVM堆栈中IR的不同层。
Relay / tir程序的优化可以以各种粒度应用,即分别使用tvm.relay.transform.FunctionPass/ tvm.tir.transform.PrimFuncPass和的功能级别和模块级别tvm.transform.ModulePass。或者,用户可以依靠在tvm.transform.Sequential中继/ tir程序上应用一系列pass,其中pass之间的依赖性可以通过pass下文解决。有关这些pass的每种类型的更多详细信息,请参阅pass基础结构。
本文主要说明开发人员如何使用pass infra进行特定的优化,创建用于Relay程序的优化管道。同样的方法也可以用于tir。
import numpy as np
import tvm
from tvm import te
import tvm.relay as relay
创建一个示例Relay中继程序
首先,创建一个简单的Relay程序。该程序将用于示例的各种优化。类似地,用户可以编写一个tir基本函数并应用tirpass。
def example():
shape = (1, 64, 54, 54)
c_data = np.empty(shape).astype("float32")
c = relay.const(c_data)
weight = relay.var("weight", shape=(64, 64, 3, 3))
x = relay.var("x", relay.TensorType((1, 64, 56, 56), "float32"))
conv = relay.nn.conv2d(x, weight)
y = relay.add(c, c)
y = relay.multiply(y, relay.const(2, "float32"))
y = relay.add(conv, y)
z = relay.add(y, c)
z1 = relay.add(y, c)
z2 = relay.add(z, z1)
return relay.Function([x, weight], z2)
让为conv2d op注册布局更改,以便可以在示例中应用布局更改通道。alter layout pass如何工作不在本文的讨论范围之内。
@relay.op.register_alter_op_layout("nn.conv2d", level=101)
def alter_conv2d(attrs, inputs, tinfos, out_type):
data, weight = inputs
new_attrs = dict(attrs)
new_attrs["data_layout"] = "NCHW16c"
return relay.nn.conv2d(data, weight, **new_attrs)
优化程序
现在要优化程序。Relay中继具有许多优化功能。将选择其中一些以应用于此示例程序。
有多种方法可以优化中继程序。下面将为每个示例提供示例。
手动应用优化pass
# Let's first create a relay Module which contains one or multiple Relay
# functions for optimization.
f = example()
mod = tvm.IRModule.from_expr(f)
# Now we can apply constant folding on the module.
# fold_const here is a callback that doesn't take any parameters.
fold_const = relay.transform.FoldConstant()
# Then, we can invoke the pass on the given module. Note that the constant
# folding pass works at the function-level. That being said, each function in
# the module will be applied with the optimization. Users don't need to iterate
# through individual functions manually to apply this pass.
mod = fold_const(mod)
# We can see from the updated program that the constants are folded.
print(mod)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
可以以类似方式应用更多优化。例如,可以消除z和z1使用的通用表达式。
mod = relay.transform.EliminateCommonSubexpr()(mod)
print(mod)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
一些优化(例如融合)也是参数化的。例如,选择级别0不允许将算子融合在一起。用户可以传递 fuse_opt_level来启用此功能。
mod = relay.transform.FuseOps(fuse_opt_level=0)(mod)
# We can observe that the optimized module contains functions that only have
# a signle primitive op.
print(mod)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%1 = %0(%x, %weight) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = fn (%p01: Tensor[(1, 64, 54, 54), float32], %p11: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p01, %p11) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3 = %2(%1, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%4 = fn (%p02: Tensor[(1, 64, 54, 54), float32], %p12: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p02, %p12) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%5 = %4(%3, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%6 = fn (%p03: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p03, %p03) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%6(%5) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
使用序列来应用pass序列
如上所述,应用pass实际上是乏味的,并且可能需要用户更好地了解依赖性。例如,融合目前不适用于let绑定。如果relay.transform.ToANormalForm()在融合之前应用算子,将无法将融合在一起,因为此过程会为每个表达式生成let绑定,以规范化Relay程序。
Relaytvm.transform.Sequential通过指定每个遍历,将打包为整体来缓解开发人员显式处理这些问题的麻烦。例如,可以使用以下序列样式应用相同遍历。tvm.transform.Sequential,torch.nn.sequential 和mxnet.gluon.block类似。例如,torch.nn.sequential用于包含一系列PyTorch模块,这些模块将被添加,以构建网络,着重于网络层。取而代之的是tvm.transform.Sequential,下面的过程中的基础工作于优化过程。
# Now let's execute some passes through :py:class:`tvm.transform.Sequential`
f = example()
mod = tvm.IRModule.from_expr(f)
# Glob the interested passes.
seq = tvm.transform.Sequential(
[
relay.transform.FoldConstant(),
relay.transform.EliminateCommonSubexpr(),
relay.transform.FuseOps(fuse_opt_level=2),
]
)
mod1 = seq(mod)
print(mod1)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%4 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%4(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
从转换后的Relay程序中,可以看到仍然有两个相同的加法运算。这是因为EliminateCommonSubexpr 未实际执行。默认情况下,只有优化级别小于或等于2的过程才被执行 tvm.transform.Sequential。下面的pass提供了一个配置界面,供用户自定义要执行的优化级别。
with tvm.transform.PassContext(opt_level=3):
mod2 = seq(mod)
print(mod2)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
可以看到仅保留了两个相同的加法之一。
In addition, users can selectively disable some passes using the disabled_pass config, which is similar to the -fno-xxx option used the general purpose compilers, such as Clang and GCC. For example, we can disable EliminateCommonSubexpr as following. The printed module will again show two identical addition operations.
with tvm.transform.PassContext(opt_level=3, disabled_pass=["EliminateCommonSubexpr"]):
mod3 = seq(mod)
print(mod3)
Out:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%4 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%4(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
The passes applied so far are target independent. The pass infra also provides a means to make pass target-aware. For example, the layout alteration pass falls in such category.
with tvm.transform.PassContext(opt_level=3):
mod4 = seq(mod)
print(mod4)
seq1 = tvm.transform.Sequential([relay.transform.AlterOpLayout()])
with tvm.transform.PassContext(opt_level=3):
with tvm.target.Target("llvm"):
mod5 = seq1(mod)
print(mod5)
Out:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = layout_transform(%x, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 56, 56, 16), float32] */;
%1 = nn.conv2d(%0, %weight, padding=[0, 0, 0, 0], data_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%2 = add(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = multiply(%2, 2f /* ty=float32 */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%4 = layout_transform(%3, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%5 = add(%1, %4) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%6 = layout_transform(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%7 = add(%5, %6) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%8 = add(%5, %6) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%9 = add(%7, %8) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
layout_transform(%9, src_layout="NCHW16c", dst_layout="NCHW") /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Implement a Pass Using Python Decorator
下一个示例说明了如何使用Python装饰器,通过传递基础流程来编排定制的优化管道。此功能极大地简化了pass的实施。例如,用户可以简单地定义一个修饰的类,进行功能级别的优化,如以下示例所示。transform_function包装一个类,以用c的倍数替换所有常量。稍后,当调用自定义过程时,将访问给定模块中的每个函数,并且将替换函数中的每个常量。
@relay.transform.function_pass(opt_level=1)
class CustomPipeline:
"""Simple test function to replace one argument to another."""
def __init__(self, multiplier):
self.multiplier = multiplier
# This function can define a pass.
def transform_function(self, func, mod, ctx):
obj = self
class ReplaceConstant(tvm.relay.ExprMutator):
def visit_constant(self, c):
return relay.multiply(obj.multiplier, c)
return ReplaceConstant().visit(func)
f = example()
mod = tvm.IRModule.from_expr(f)
custom_pass = CustomPipeline(multiplier=relay.const(3, "float32"))
assert custom_pass.info.name == "CustomPipeline"
mod3 = custom_pass(mod)
print(mod3)
输出:
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = multiply(3f /* ty=float32 */, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %1) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = multiply(3f /* ty=float32 */, 2f /* ty=float32 */) /* ty=float32 */;
%4 = multiply(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%5 = add(%0, %4) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%6 = add(%5, %1) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%7 = add(%5, %1) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%6, %7) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
调试pass
TVM为用户提供了一种即插即用式的调试通道,该通道在通过特殊通道(PrintIR)来转储整个模块的IR之后,将IR打印出来。序列传递示例的略微修改版本,可能类似于以下内容,以启用IR转储以进行FoldConstant优化。
f = example()
mod = tvm.IRModule.from_expr(f)
seq = tvm.transform.Sequential(
[
relay.transform.FoldConstant(),
relay.transform.EliminateCommonSubexpr(),
relay.transform.AlterOpLayout(),
]
)
# By inserting the ``PrintIR`` pass after ``FoldConstant``, the pass infra will
# dump out the module IR when ``FoldConstant`` is done. Users can plug in this
# pass after any pass they want to debug for viewing the optimization effect.
#
# There is a more flexible debugging mechanism also exposed by the build configuration
# object. One can pass a tracing function which can be used to execute arbitrary code
# before and/or after each pass. A tracing function will receive a :py::class:`tvm.IRModule`,
# a :py:class:`tvm.transform.PassInfo` object,
# and a boolean indicating whether you are executing before, or after a pass.
# An example is below.
def print_ir(mod, info, is_before):
"""Print the name of the pass, the IR, only before passes execute."""
if is_before:
print("Running pass: {}", info)
print(mod)
with tvm.transform.PassContext(opt_level=3, trace=print_ir):
with tvm.target.Target("llvm"):
# Perform the optimizations.
mod = seq(mod)
print(mod)
print("done")
输出:
Running pass: {} The meta data of the pass: pass name: FoldConstantopt_level: 2required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]);
%1 = add(meta[relay.Constant][0], meta[relay.Constant][0]);
%2 = multiply(%1, 2f);
%3 = add(%0, %2);
%4 = add(%3, meta[relay.Constant][0]);
%5 = add(%3, meta[relay.Constant][0]);
add(%4, %5)
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() {
add(meta[relay.Constant][0], meta[relay.Constant][0])
}
Running pass: {} The meta data of the pass: pass name: FuseOpsopt_level: 1required passes: [
InferType, ]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
add(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p0, %p0)
};
%0(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */)
}
Running pass: {} The meta data of the pass: pass name: ToANormalFormopt_level: 1required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p0, %p0) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%0(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
let %x = meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */;
let %x1 = fn (%p0: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
add(%p0, %p0) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
let %x2 = %x1(%x);
%x2
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() {
multiply(meta[relay.Constant][0], 2f)
}
Running pass: {} The meta data of the pass: pass name: FuseOpsopt_level: 1required passes: [
InferType, ]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
multiply(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, 2f /* ty=float32 */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 54, 54), float32], %p1: float32, Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
multiply(%p0, %p1)
};
%0(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, 2f /* ty=float32 */)
}
Running pass: {} The meta data of the pass: pass name: ToANormalFormopt_level: 1required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
%0 = fn (%p0: Tensor[(1, 64, 54, 54), float32], %p1: float32, Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
multiply(%p0, %p1) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%0(meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, 2f /* ty=float32 */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main() -> Tensor[(1, 64, 54, 54), float32] {
let %x = meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */;
let %x1 = 2f /* ty=float32 */;
let %x2 = fn (%p0: Tensor[(1, 64, 54, 54), float32], %p1: float32, Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
multiply(%p0, %p1) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
let %x3 = %x2(%x, %x1);
%x3
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]);
%1 = add(%0, meta[relay.Constant][0]);
%2 = add(%1, meta[relay.Constant][1]);
%3 = add(%1, meta[relay.Constant][1]);
add(%2, %3)
}
Running pass: {} The meta data of the pass: pass name: PrintIRopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: EliminateCommonSubexpropt_level: 3required passes: [
InferType, ]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%3 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %3) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2)
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: FuseOpsopt_level: 1required passes: [
InferType, ]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%x, %weight, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]);
%1 = add(%0, %p2);
%2 = add(%1, %p3);
add(%2, %2)
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */)
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: AlterOpLayoutopt_level: 3required passes: [
InferType, ]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%3 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = nn.conv2d(%p0, %p1, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%1 = add(%0, %p2) /* ty=Tensor[(1, 64, 54, 54), float32] */;
%2 = add(%1, %p3) /* ty=Tensor[(1, 64, 54, 54), float32] */;
add(%2, %2) /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%3(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
Running pass: {} The meta data of the pass: pass name: InferTypeopt_level: 0required passes: [
]
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%7 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = layout_transform(%p0, src_layout="NCHW", dst_layout="NCHW16c");
%1 = nn.conv2d(%0, %p1, padding=[0, 0, 0, 0], data_layout="NCHW16c");
%2 = layout_transform(%p2, src_layout="NCHW", dst_layout="NCHW16c");
%3 = add(%1, %2);
%4 = layout_transform(%p3, src_layout="NCHW", dst_layout="NCHW16c");
%5 = add(%3, %4);
%6 = add(%5, %5);
layout_transform(%6, src_layout="NCHW16c", dst_layout="NCHW")
};
%7(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */)
}
def @main(%x: Tensor[(1, 64, 56, 56), float32], %weight: Tensor[(64, 64, 3, 3), float32]) -> Tensor[(1, 64, 54, 54), float32] {
%7 = fn (%p0: Tensor[(1, 64, 56, 56), float32], %p1: Tensor[(64, 64, 3, 3), float32], %p2: Tensor[(1, 64, 54, 54), float32], %p3: Tensor[(1, 64, 54, 54), float32], Primitive=1) -> Tensor[(1, 64, 54, 54), float32] {
%0 = layout_transform(%p0, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 56, 56, 16), float32] */;
%1 = nn.conv2d(%0, %p1, padding=[0, 0, 0, 0], data_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%2 = layout_transform(%p2, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%3 = add(%1, %2) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%4 = layout_transform(%p3, src_layout="NCHW", dst_layout="NCHW16c") /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%5 = add(%3, %4) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
%6 = add(%5, %5) /* ty=Tensor[(1, 4, 54, 54, 16), float32] */;
layout_transform(%6, src_layout="NCHW16c", dst_layout="NCHW") /* ty=Tensor[(1, 64, 54, 54), float32] */
};
%7(%x, %weight, meta[relay.Constant][0] /* ty=Tensor[(1, 64, 54, 54), float32] */, meta[relay.Constant][1] /* ty=Tensor[(1, 64, 54, 54), float32] */) /* ty=Tensor[(1, 64, 54, 54), float32] */
}
done
概括
本文介绍了如何使用pass基础,更加方便地在TVM中编写和调用pass。还讨论了调用pass的不同方法。使用tvm.transform.Sequential,可以极大地帮助用户简化处理多个优化过程及其依赖项的工作。另外,提供了一个示例来说明如何使用PrintIR和跟踪调试过程。
TVM Pass IR如何使用的更多相关文章
- 如何使用TVM Pass红外线
如何使用TVM Pass红外线 随着Relay / tir中优化遍数的增加,执行并手动维护其依赖关系变得很棘手.引入了一个基础结构来管理优化过程,将其应用于TVM堆栈中IR的不同层. Relay / ...
- rac one node在线relocation
1.查看数据库运行状态 $ srvctl status database -d rone Instance rone_2 is running on node rone2 Online relocat ...
- 转://ORA-00603,ORA-27501,ORA-27300,ORA-27301,ORA-27302故障案例一则
背景介绍: 这是一套windows的rac系统.数据库后台日志报ORA-00474:SMON process terminated with error.接着报ORA-00603,ORA-27501, ...
- oracle 错误实例分析(ORA-01078)
01,问题描述 心血来潮想看一下启动数据库的alert log.然后把数据库给关闭了,同时也在监听日志文件 下面可谓是详细的描述了整个关机过程,也看到了无数的error [root@node1 ...
- 【体系结构】有关Oracle SCN知识点的整理
[体系结构]有关Oracle SCN知识点的整理 1 BLOG文档结构图 BLOG_Oracle_lhr_Oracle SCN的一点研究.pdf 2 前言部分 2.1 导读和注意事项 各位技 ...
- nbu还原集群数据库异常问题
集群数据库软件均已安装完毕,现在想从NBU上还原数据库,但在还原控制文件报错 [oracle@oracle-db1 ~]$ rman target / Recovery Manager: Releas ...
- oracle数据库启动报错,不能启动ASM实例
数据库rac启动时报错,日志例如以下,后来使用 Sat Jun 7 06:02:11 2014 GATHER_STATS_JOB encountered errors. Check the tra ...
- TVM:一个端到端的用于开发深度学习负载以适应多种硬件平台的IR栈
TVM:一个端到端的用于开发深度学习负载以适应多种硬件平台的IR栈 本文对TVM的论文进行了翻译整理 深度学习如今无处不在且必不可少.这次创新部分得益于可扩展的深度学习系统,比如 TensorFlo ...
- 用TVM在硬件平台上部署深度学习工作负载的端到端 IR 堆栈
用TVM在硬件平台上部署深度学习工作负载的端到端 IR 堆栈 深度学习已变得无处不在,不可或缺.这场革命的一部分是由可扩展的深度学习系统推动的,如滕索弗洛.MXNet.咖啡和皮托奇.大多数现有系统针对 ...
随机推荐
- 03- 手机App功能测试要点以及登录页面的测试
当你进入一个互联网公司以后,首先对公司结构有所了解,然后遇到问题找对应的工作人员,效率就事半功倍了. 公司的结构: 产品经理 项目经理 设计师 开发人员 测试人员 运维人员 运营人员 配置管理 App ...
- 菜鸟教程jsonp基础知识讲解
jsonp是什么? Jsonp(JSON with Padding) 是 json 的一种"使用模式",可以让网页从别的域名(网站)那获取资料,即跨域读取数据. 为什么我们从不同的 ...
- hdu3336 KMP + DP 前缀数组出现的次数
题意: 给你一个串,问你他的所有前缀子串在本串中的出现次数,注释:abc的前缀子串是 a ab abc; 思路: 还是利用了next数组,先对子串求出next数组,再开一个数组d ...
- poj2112 二分最大流+Floyd
题意: 一个农场主有一些奶牛,和一些机器,每台机器有自己的服务上限,就是一天最多能给多少头奶牛挤奶,给你任意两点的距离,问你让所有的奶牛都被挤奶时,奶牛于机器最远距离的最近是多少. 思路: ...
- POJ1376简单广搜
题意: 给你一个n*m的矩阵,然后给你机器人的起点和终点,还有起点的方向,然后每次机器人有两种操作,左右旋转90度,或者是朝着原来的方向走1,2或者3步,机器人再走的过程中不能碰到格子,也 ...
- Java并发容器篇
作者:汤圆 个人博客:javalover.cc 前言 断断续续一个多月,也写了十几篇原创文章,感觉真的很不一样: 不能说技术有很大的进步,但是想法确实跟以前有所不同: 还没开始的时候,想着要学的东西太 ...
- 将一个eclipse的SSM项目用IDEA打开并运行
项目部署 将一个eclipse项目用idea打开,并且 部署到tomcat中 .或者你tomcat部署成功,但是启动就是404,下面的步骤就要更认真看了 项目配置 打开idea,Import Proj ...
- 匿名函数lambda / map()方法
lambda一般配合其他方法使用,一般使用在只用过一次就不用的函数,那就没必要特意去定义 lambda能支持的最复杂的语句就是三元运算 例如: lambda x,y: x*y if x < y ...
- Nginx导航
简介 最近都在弄微服务的东西,现在来记录下收获.我从一知半解到现在能从0搭建使用最大的感触有两点 1.微服务各大组件的版本很多,网上很多博客内容不一定适合你的版本,很多时候苦苦琢磨都是无用功 2.网上 ...
- [bug] flink on yarn 启动失败
参考 https://www.cnblogs.com/huangguoming/p/11732663.html