(Original) Human Pose Estimation: PyraNet
Please credit the source when reposting:
https://www.cnblogs.com/darkknightzh/p/12424767.html
Paper:
Learning Feature Pyramids for Human Pose Estimation
https://arxiv.org/abs/1708.01101
Third-party PyTorch code:
https://github.com/Naman-ntc/Pytorch-Human-Pose-Estimation
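1. Overall structure
The residual modules of the hourglass are replaced with pyramid residual modules (the white boxes in the figure), which learn features of the input image at different scales.
For hourglass, see https://www.cnblogs.com/darkknightzh/p/11486185.html. The Hourglass in the reference code also uses PRM modules internally, rather than the original Hourglass blocks.
The algorithm is easier to follow once stacked hourglass is understood.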
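2. Pyramid residual module (PRM)
The paper presents four PRM (pyramid residual module) structures and finds that PRM-B performs best, as shown in the figure below. Dashed lines denote identity mappings, and white dashed boxes indicate that no upsampling or downsampling is applied at that position.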
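3. Downsampling
Ordinary pooling reduces the resolution too quickly and too coarsely, since its downsampling factor is at least 2. The paper therefore does not use ordinary pooling and instead downsamples with fractional max-pooling. The downsampling ratio of pyramid level c is (the paper uses M=1, C=4):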
${{s}_{c}}={{2}^{-M\frac{c}{C}}},c=0,\cdots ,C,M\ge 1$
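As a quick check of the formula, the sketch below lists the ratios for the setting M=1, C=4 mentioned above. The helper name `downsample_ratios` is made up for illustration and does not come from the paper or the reference code.

```python
def downsample_ratios(M=1, C=4):
    """Return s_c = 2 ** (-M * c / C) for c = 0, ..., C."""
    return [2.0 ** (-M * c / C) for c in range(C + 1)]


ratios = downsample_ratios()
for c, s in enumerate(ratios):
    # Each ratio is the fraction of the input resolution kept at level c.
    print(f"c={c}: s_c={s:.4f}")
```

The ratios fall gradually from 1 (no downsampling) to 2^{-M}, rather than jumping straight to 1/2 as a 2*2 pooling with stride 2 would.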
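4. Training and testing
Training is similar to other pose estimation methods: the network predicts heatmaps, and the loss is the squared error between the ground-truth heatmaps and the predicted heatmaps, as follows: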
$L=\frac{1}{2}\sum\limits_{n=1}^{N}{\sum\limits_{k=1}^{K}{{{\left\| {{\mathbf{S}}_{k}}-{{{\mathbf{\hat{S}}}}_{k}} \right\|}^{2}}}}$
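where N is the number of samples, K is the number of keypoints (that is, the number of heatmaps), and S_k and Ŝ_k are the ground-truth and predicted heatmaps of keypoint k.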
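To make the loss concrete, here is a minimal pure-Python sketch evaluated on toy numbers. The function name `heatmap_loss` and the nested-list layout `[n][k][row][col]` are assumptions made for illustration; the third-party code may compute and normalise its loss differently.

```python
def heatmap_loss(pred, target):
    """0.5 * sum of squared differences over samples, keypoints and pixels.

    pred and target are nested lists indexed as [n][k][row][col].
    """
    total = 0.0
    for sample_p, sample_t in zip(pred, target):
        for map_p, map_t in zip(sample_p, sample_t):
            for row_p, row_t in zip(map_p, map_t):
                for p, t in zip(row_p, row_t):
                    total += (p - t) ** 2
    return 0.5 * total


# One sample (N=1), two keypoints (K=2), 2x2 heatmaps with toy values.
target = [[[[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]]]
pred = [[[[0.0, 0.5], [0.0, 0.0]], [[0.5, 0.0], [1.0, 0.0]]]]
loss = heatmap_loss(pred, target)
print(loss)
```

Two pixels differ by 0.5 each, so the sum of squares is 0.5 and the loss is 0.25.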
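At test time, the location of the maximum score in each heatmap of the last hourglass is taken as the keypoint. Because this is a top-down pose estimation method, the image fed to the network contains only one person, so the maximum-score location is the corresponding keypoint.
${{\mathbf{\hat{z}}}_{k}}=\underset{\mathbf{p}}{\mathop{\arg \max }}\,{{\mathbf{\hat{S}}}_{k}}(\mathbf{p}),k=1,\cdots ,K$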
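The decoding step can be sketched in a few lines of plain Python. The function `decode_keypoints` and the toy heatmaps are hypothetical and only illustrate the arg-max rule; they are not taken from the reference code.

```python
def decode_keypoints(heatmaps):
    """Return (row, col, score) of the maximum-score location of each heatmap."""
    keypoints = []
    for hm in heatmaps:
        # Enumerate every pixel position and keep the one with the highest score.
        best = max(
            ((r, c) for r in range(len(hm)) for c in range(len(hm[0]))),
            key=lambda rc: hm[rc[0]][rc[1]],
        )
        keypoints.append((best[0], best[1], hm[best[0]][best[1]]))
    return keypoints


# Two toy 3x3 heatmaps, one per keypoint.
heatmaps = [
    [[0.1, 0.2, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.7], [0.0, 0.1, 0.0], [0.2, 0.0, 0.0]],
]
keypoints = decode_keypoints(heatmaps)
print(keypoints)
```

The returned coordinates are in heatmap resolution; mapping them back to the input image would need an extra scaling step that is not shown here.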
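5. Code
PyraNet is defined as follows. Here nn is torch.nn, and M appears to be the repository module that defines the building blocks listed further below.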
class PyraNet(nn.Module):
    """docstring for PyraNet"""
    def __init__(self, nChannels=256, nStack=4, nModules=2, numReductions=4, baseWidth=6, cardinality=30, nJoints=16, inputRes=256):
        super(PyraNet, self).__init__()
        self.nChannels = nChannels
        self.nStack = nStack
        self.nModules = nModules
        self.numReductions = numReductions
        self.baseWidth = baseWidth
        self.cardinality = cardinality
        self.inputRes = inputRes
        self.nJoints = nJoints

        self.start = M.BnReluConv(3, 64, kernelSize = 7, stride = 2, padding = 3)  # BN + ReLU + conv

        # ResidualPyramid: two branches (1*1 conv + 3*3 conv; 1*1 conv + multi-scale feature sum + 1*1 conv)
        # are summed and expanded by a 1*1 conv, then added to the skip path
        # (the input itself when input and output channels match, otherwise a 1*1 conv of the input)
        self.res1 = M.ResidualPyramid(64, 128, self.inputRes//2, self.baseWidth, self.cardinality, 0)
        self.mp = nn.MaxPool2d(2, 2)
        self.res2 = M.ResidualPyramid(128, 128, self.inputRes//4, self.baseWidth, self.cardinality,)
        self.res3 = M.ResidualPyramid(128, self.nChannels, self.inputRes//4, self.baseWidth, self.cardinality)

        _hourglass, _Residual, _lin1, _chantojoints, _lin2, _jointstochan = [],[],[],[],[],[]

        for _ in range(self.nStack):  # number of stacked hourglasses
            _hourglass.append(PyraNetHourGlass(self.nChannels, self.numReductions, self.nModules, self.inputRes//4, self.baseWidth, self.cardinality))
            _ResidualModules = []
            for _ in range(self.nModules):
                _ResidualModules.append(M.Residual(self.nChannels, self.nChannels))  # equal in/out channels: 3 * (BN + ReLU + conv), no conv on the skip path
            _ResidualModules = nn.Sequential(*_ResidualModules)
            _Residual.append(_ResidualModules)
            _lin1.append(M.BnReluConv(self.nChannels, self.nChannels))  # BN + ReLU + conv
            _chantojoints.append(nn.Conv2d(self.nChannels, self.nJoints,1))  # 1*1 conv: features -> joint heatmaps
            _lin2.append(nn.Conv2d(self.nChannels, self.nChannels,1))  # 1*1 conv: features -> features
            _jointstochan.append(nn.Conv2d(self.nJoints,self.nChannels,1))  # 1*1 conv: joint heatmaps -> features

        self.hourglass = nn.ModuleList(_hourglass)
        self.Residual = nn.ModuleList(_Residual)
        self.lin1 = nn.ModuleList(_lin1)
        self.chantojoints = nn.ModuleList(_chantojoints)
        self.lin2 = nn.ModuleList(_lin2)
        self.jointstochan = nn.ModuleList(_jointstochan)

    def forward(self, x):
        x = self.start(x)
        x = self.res1(x)
        x = self.mp(x)
        x = self.res2(x)
        x = self.res3(x)
        out = []

        for i in range(self.nStack):
            x1 = self.hourglass[i](x)
            x1 = self.Residual[i](x1)
            x1 = self.lin1[i](x1)
            out.append(self.chantojoints[i](x1))  # heatmaps predicted by stack i
            x1 = self.lin2[i](x1)
            x = x + x1 + self.jointstochan[i](out[i])  # sum the features as input to the next stack

        return (out)
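To see which resolution reaches the hourglass stacks, the sketch below traces the layers before the loop in forward. It assumes that BnReluConv behaves like an ordinary convolution with the given kernel, stride and padding, and that ResidualPyramid keeps the spatial size (which matches the inputRes//2 and inputRes//4 arguments above). The helper names `conv_out` and `trace_stem` are made up for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_stem(input_res=256, n_channels=256):
    """List (layer, channels, resolution) for the layers before the hourglasses."""
    trace = []
    res = conv_out(input_res, kernel=7, stride=2, padding=3)  # self.start
    trace.append(("start", 64, res))
    trace.append(("res1", 128, res))  # assumed to keep the spatial size
    res = conv_out(res, kernel=2, stride=2, padding=0)  # self.mp
    trace.append(("mp", 128, res))
    trace.append(("res2", 128, res))
    trace.append(("res3", n_channels, res))
    return trace


stem = trace_stem()
for name, channels, res in stem:
    print(f"{name}: {channels} channels, {res}x{res}")
```

With the default inputRes=256, the hourglasses therefore work on 64*64 feature maps, and forward returns a list of nStack outputs, each with nJoints heatmaps.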
ResidualPyramid is defined as follows:
class ResidualPyramid(nn.Module):
    """docstring for ResidualPyramid"""
    # Two branches (1*1 conv + 3*3 conv; 1*1 conv + multi-scale feature sum + 1*1 conv) are summed
    # and expanded by a 1*1 conv. The result is added to the skip path: the input itself when
    # input and output channels match, otherwise a 1*1 conv of the input.
    def __init__(self, inChannels, outChannels, inputRes, baseWidth, cardinality, type = 1):
        super(ResidualPyramid, self).__init__()
        self.inChannels = inChannels
        self.outChannels = outChannels
        self.inputRes = inputRes
        self.baseWidth = baseWidth
        self.cardinality = cardinality
        self.type = type
        # PyraConvBlock: the two branches described above, summed and expanded by a 1*1 conv
        self.cb = PyraConvBlock(self.inChannels, self.outChannels, self.inputRes, self.baseWidth, self.cardinality, self.type)
        self.skip = SkipLayer(self.inChannels, self.outChannels)  # no conv when in/out channels match, otherwise a 1*1 conv

    def forward(self, x):
        out = 0
        out = out + self.cb(x)
        out = out + self.skip(x)
        return out
PyraConvBlock is as follows:
class PyraConvBlock(nn.Module):
    """docstring for PyraConvBlock"""
    # Two branches: 1*1 conv + 3*3 conv, and 1*1 conv + multi-scale feature sum + 1*1 conv.
    # The branches are summed and a 1*1 conv expands the channels.
    def __init__(self, inChannels, outChannels, inputRes, baseWidth, cardinality, type = 1):
        super(PyraConvBlock, self).__init__()
        self.inChannels = inChannels
        self.outChannels = outChannels
        self.inputRes = inputRes
        self.baseWidth = baseWidth
        self.cardinality = cardinality
        self.outChannelsby2 = outChannels//2
        self.D = self.outChannels // self.baseWidth
        self.branch1 = nn.Sequential(  # first branch: 1*1 conv + 3*3 conv
            BnReluConv(self.inChannels, self.outChannelsby2, 1, 1, 0),  # BN + ReLU + 1*1 conv
            BnReluConv(self.outChannelsby2, self.outChannelsby2, 3, 1, 1)  # BN + ReLU + 3*3 conv
        )
        self.branch2 = nn.Sequential(  # second branch: 1*1 conv + pyramid + 1*1 conv
            BnReluConv(self.inChannels, self.D, 1, 1, 0),  # BN + ReLU + 1*1 conv, reduces to D channels
            BnReluPyra(self.D, self.cardinality, self.inputRes),  # BN + ReLU + sum of features at different scales
            BnReluConv(self.D, self.outChannelsby2, 1, 1, 0)  # BN + ReLU + 1*1 conv, back to outChannels//2
        )
        self.afteradd = BnReluConv(self.outChannelsby2, self.outChannels, 1, 1, 0)  # BN + ReLU + 1*1 conv

    def forward(self, x):
        x = self.branch2(x) + self.branch1(x)  # sum the features of the two branches
        x = self.afteradd(x)  # 1*1 conv expands the channels
        return x
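The channel arithmetic in __init__ is easy to miss, so here is a small sketch that reproduces it. The function `pyra_conv_block_widths` is a hypothetical helper written for this note; only the two integer divisions are taken from the code above.

```python
def pyra_conv_block_widths(out_channels, base_width):
    """Channel widths used inside PyraConvBlock, following its __init__."""
    return {
        "branch1": out_channels // 2,           # outChannelsby2
        "pyramid": out_channels // base_width,  # D, width inside the pyramid branch
        "sum": out_channels // 2,               # both branches end at outChannelsby2
        "output": out_channels,                 # after the final 1*1 conv
    }


widths = pyra_conv_block_widths(out_channels=256, base_width=6)
print(widths)
```

With the defaults nChannels=256 and baseWidth=6, the pyramid branch runs on only 42 channels, which keeps the many per-scale 3*3 convolutions comparatively cheap.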
BnReluPyra is as follows:
class BnReluPyra(nn.Module):
    """docstring for BnReluPyra"""
    # BN + ReLU + sum of features at different scales
    def __init__(self, D, cardinality, inputRes):
        super(BnReluPyra, self).__init__()
        self.D = D
        self.cardinality = cardinality
        self.inputRes = inputRes
        self.bn = nn.BatchNorm2d(self.D)
        self.relu = nn.ReLU()
        self.pyra = Pyramid(self.D, self.cardinality, self.inputRes)  # sums the features at different scales

    def forward(self, x):
        x = self.bn(x)
        x = self.relu(x)
        x = self.pyra(x)
        return x
Pyramid is as follows:
class Pyramid(nn.Module):
    """docstring for Pyramid"""
    # Sums the features computed at different scales
    def __init__(self, D, cardinality, inputRes):
        super(Pyramid, self).__init__()
        self.D = D
        self.cardinality = cardinality  # C in Eq. 3 of the paper: number of pyramid levels
        self.inputRes = inputRes
        self.scale = 2**(-1/self.cardinality)  # ratio of the first pyramid level; level card uses scale**(card + 1)
        _scales = []
        for card in range(self.cardinality):
            temp = nn.Sequential(  # downsample + 3*3 conv + upsample
                nn.FractionalMaxPool2d(2, output_ratio = self.scale**(card + 1)),  # ratio shrinks with each level
                nn.Conv2d(self.D, self.D, 3, 1, 1),
                nn.Upsample(size = self.inputRes)#, mode='bilinear')  # upsample back to the input resolution
            )
            _scales.append(temp)
        self.scales = nn.ModuleList(_scales)

    def forward(self, x):
        #print(x.shape, self.inputRes)
        out = torch.zeros_like(x)  # zero tensor with the same shape as the input
        for card in range(self.cardinality):
            out += self.scales[card](x)  # sum the features of all scales
        return out
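The following sketch estimates the spatial size of each pyramid level before it is upsampled again. It assumes the pooled size is the input size multiplied by output_ratio and truncated to an integer; that rounding rule is an assumption of this sketch, and `pyramid_level_sizes` is a made-up helper name.

```python
def pyramid_level_sizes(input_res, cardinality):
    """Approximate spatial size of each pyramid level before upsampling."""
    scale = 2 ** (-1 / cardinality)  # same expression as self.scale in Pyramid
    sizes = []
    for card in range(cardinality):
        ratio = scale ** (card + 1)  # same expression as output_ratio in Pyramid
        sizes.append(int(input_res * ratio))  # assumption: truncated to an integer
    return sizes


sizes = pyramid_level_sizes(input_res=64, cardinality=4)
print(sizes)
```

The sizes shrink step by step from just under the input resolution to about half of it. The last ratio is nominally 1/2, but because it is computed in floating point, the truncated size of the last level can come out one pixel below input_res/2. Note that PyraNet passes cardinality=30 by default, so the reference code builds 30 such levels per pyramid.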
PyraNetHourGlass is as follows:
class PyraNetHourGlass(nn.Module):
    """docstring for PyraNetHourGlass"""
    def __init__(self, nChannels=256, numReductions=4, nModules=2, inputRes=256, baseWidth=6, cardinality=30, poolKernel=(2,2), poolStride=(2,2), upSampleKernel=2):
        super(PyraNetHourGlass, self).__init__()
        self.numReductions = numReductions
        self.nModules = nModules
        self.nChannels = nChannels
        self.poolKernel = poolKernel
        self.poolStride = poolStride
        self.upSampleKernel = upSampleKernel

        self.inputRes = inputRes
        self.baseWidth = baseWidth
        self.cardinality = cardinality

        """ For the skip connection, a residual module (or sequence of residual modules) """
        # ResidualPyramid: two-branch pyramid block plus skip path (1*1 conv on the skip path only when channels differ)
        # Residual: with equal in/out channels, 3 * (BN + ReLU + conv) with no conv on the skip path
        Residualskip = M.ResidualPyramid if numReductions > 1 else M.Residual
        Residualmain = M.ResidualPyramid if numReductions > 2 else M.Residual
        _skip = []
        for _ in range(self.nModules):  # ResidualPyramid or Residual, chosen by numReductions
            _skip.append(Residualskip(self.nChannels, self.nChannels, self.inputRes, self.baseWidth, self.cardinality))
        self.skip = nn.Sequential(*_skip)

        """ First pooling to go to smaller dimension then pass input through
        Residual Module or sequence of Modules then and subsequent cases:
        either pass through Hourglass of numReductions-1 or pass through Residual Module or sequence of Modules """
        self.mp = nn.MaxPool2d(self.poolKernel, self.poolStride)

        _afterpool = []
        for _ in range(self.nModules):  # ResidualPyramid or Residual, chosen by numReductions
            _afterpool.append(Residualmain(self.nChannels, self.nChannels, self.inputRes//2, self.baseWidth, self.cardinality))
        self.afterpool = nn.Sequential(*_afterpool)

        if (numReductions > 1):  # recursive construction of the inner hourglass
            self.hg = PyraNetHourGlass(self.nChannels, self.numReductions-1, self.nModules, self.inputRes//2, self.baseWidth,
                                       self.cardinality, self.poolKernel, self.poolStride, self.upSampleKernel)
        else:
            _num1res = []
            for _ in range(self.nModules):  # ResidualPyramid or Residual, chosen by numReductions
                _num1res.append(Residualmain(self.nChannels,self.nChannels, self.inputRes//2, self.baseWidth, self.cardinality))
            self.num1res = nn.Sequential(*_num1res)  # doesn't seem that important ?

        """ Now another Residual Module or sequence of Residual Modules """
        _lowres = []
        for _ in range(self.nModules):  # ResidualPyramid or Residual, chosen by numReductions
            _lowres.append(Residualmain(self.nChannels,self.nChannels, self.inputRes//2, self.baseWidth, self.cardinality))
        self.lowres = nn.Sequential(*_lowres)

        """ Upsampling Layer (Can we change this??????) As per Newell's paper upsampling recommended """
        self.up = nn.Upsample(scale_factor = self.upSampleKernel)  # enlarges height and width (upsampling)

    def forward(self, x):
        out1 = x
        out1 = self.skip(out1)  # skip path at the current resolution
        out2 = x
        out2 = self.mp(out2)  # max-pool downsampling
        out2 = self.afterpool(out2)  # ResidualPyramid or Residual, chosen by numReductions
        if self.numReductions>1:
            out2 = self.hg(out2)  # recursive call into the inner hourglass
        else:
            out2 = self.num1res(out2)  # ResidualPyramid or Residual, chosen by numReductions
        out2 = self.lowres(out2)  # ResidualPyramid or Residual, chosen by numReductions
        out2 = self.up(out2)  # upsample back to the resolution of the skip path

        return out2 + out1  # sum of the two paths
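The recursion and the two selection rules (numReductions > 1 for the skip path, numReductions > 2 for the main path) can be traced without PyTorch. This is only a bookkeeping sketch: `trace_hourglass` is a hypothetical helper, and the default input resolution of 64 is inputRes//4 for the 256-pixel default of PyraNet.

```python
def trace_hourglass(num_reductions=4, input_res=64):
    """Return (numReductions, skip block, skip res, main block, main res) per recursion level."""
    levels = []
    while num_reductions >= 1:
        # Same conditions as Residualskip and Residualmain in PyraNetHourGlass.
        skip_block = "ResidualPyramid" if num_reductions > 1 else "Residual"
        main_block = "ResidualPyramid" if num_reductions > 2 else "Residual"
        levels.append((num_reductions, skip_block, input_res, main_block, input_res // 2))
        num_reductions -= 1  # the inner hourglass is built with numReductions - 1
        input_res //= 2      # and with half the resolution
    return levels


levels = trace_hourglass()
for level in levels:
    print(level)
```

Under these defaults, pyramid modules are used only at resolutions 64, 32 and 16, while the two lowest resolutions (8 and 4) fall back to plain Residual modules.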
Residual is as follows:
class Residual(nn.Module):
    """docstring for Residual"""
    # Equal in/out channels: 3 * (BN + ReLU + conv) with no conv on the skip path;
    # otherwise the input passed through a 1*1 conv is added to the 3 * (BN + ReLU + conv) output
    def __init__(self, inChannels, outChannels, inputRes=None, baseWidth=None, cardinality=None, type=None):
        super(Residual, self).__init__()
        self.inChannels = inChannels
        self.outChannels = outChannels
        self.cb = ConvBlock(self.inChannels, self.outChannels)  # 3 * (BN + ReLU + conv): the first reduces channels, the second keeps them, the third expands them
        self.skip = SkipLayer(self.inChannels, self.outChannels)  # no conv when in/out channels match, otherwise a 1*1 conv

    def forward(self, x):
        out = 0
        out = out + self.cb(x)
        out = out + self.skip(x)
        return out