IROS2017: Voxblox & RAL2019: Voxblox++

Status: Finished

Type: RAL

Year: 2019

Organization/Sensor: ETH-ASL

References and Preface

This note covers two papers, from Voxblox (IROS 2017) to Voxblox++ (RAL 2019), though the main focus is on Voxblox.

Paper links: https://arxiv.org/abs/1611.03631 and https://arxiv.org/abs/1903.00268

Code links: https://github.com/ethz-asl/voxblox and https://github.com/ethz-asl/voxblox-plusplus

Voxblox documentation: https://voxblox.readthedocs.io/en/latest/index.html

For follow-up work on semantics, this paper, also from ETH-ASL, is worth a look:

ICRA2022: Panoptic Multi-TSDFs: a Flexible Representation for Online Multi-resolution Volumetric Mapping and Long-term Dynamic Scene Consistency

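Voxblox implements three different integration strategies (integrators):

Fast

Merged: for large scenes; points that fall into the same voxel are bundled together and projected once

Simple: a direct traversal of every point, plain and simple

Runtime compared with OctoMap: OctoMap processes every measurement individually, whereas Voxblox, when facing large-scale scenes, can (using the Merged strategy) bundle measurements per voxel, which saves runtime without a noticeable drop in accuracy. (Quoted from an original article by CSDN blogger 憨憨2号, licensed under CC 4.0 BY-SA; original link: https://blog.csdn.net/qq_45401419/article/details/125125993)

[Semantic mapping] Voxblox++: Volumetric Semantic Mapping

Using scans from an online, already-localized RGB-D camera, the system incrementally builds a volumetric object-level map.

A frame-wise segmentation framework combines an unsupervised geometric approach with instance-aware semantic predictions, so that it detects both recognized scene elements and previously unseen objects.

Data association: predicted object instances are tracked across frames.

A map integration strategy fuses their 3D shape, location, and semantic information into a global volume.

Original link: https://zhuanlan.zhihu.com/p/117665107

Background: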

ESDFs (Euclidean Signed Distance Fields) are a voxel grid where every point contains its Euclidean distance to the nearest obstacle
TSDFs (Truncated Signed Distance Fields) use projective distance, which is the distance along the sensor ray to the measured surface, and calculate these distances only within a short truncation radius around the surface boundary.

1. Motivation

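Planning for small UAVs needs the distance to obstacles, and that distance information comes from ESDFs. Voxblox builds its map as a TSDF and then incrementally builds the ESDF from it.

Voxblox++ moves on to the object level, so it is essentially Voxblox with semantic labels added on top.

The rest of this section covers the points raised in the Voxblox paper and its successor, Voxblox++. First, Voxblox explains why it works with TSDFs: TSDFs are fast to build and smooth out sensor noise over many observations, and are designed to produce surface meshes.

Voxblox is mainly concerned with maps that a UAV uses for planning, so the final product is an ESDF for the planner. Compared with earlier work:

[3] can build a distance map incrementally, but the maximum size of the map must be known in advance and cannot be adjusted dynamically.
OctoMap [4] is usable, but its representation is difficult for humans to parse.

To solve these problems, the authors propose Voxblox, a system that builds ESDFs incrementally while keeping the underlying map representation visualizable, and that extracts the distance information for the ESDF directly from the TSDF.

Voxblox++ points out that for robotic manipulation and grasping we usually need more information, including the 3D model, size, and category of each object. Real-world environments, however, exhibit large variability in object appearance, shape, placement and location, posing a direct challenge to robotic perception. Computer vision offers pixel-level segmentation, but it only recognizes categories seen during training. Purely geometric methods work in open-set conditions, but they tend to over-segment the reconstructed objects and additionally fail to provide any semantic information about them, making high-level scene understanding and task planning impractical.

The Voxblox++ system incrementally builds geometrically accurate volumetric maps while labeling every object instance, extending the incremental geometry-based scene segmentation approach of [7] to full instance-aware semantic mapping.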

Contribution

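The main contribution of Voxblox is that it is the first to build ESDFs incrementally from TSDFs. It also analyzes different ways of constructing TSDFs that improve build speed and surface accuracy at large voxel sizes.

Voxblox++ focuses on the semantic part: it combines geometric-semantic segmentation that extends object detection, and it addresses how predicted labels are tracked and matched across multiple frames.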

2. Method

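Voxblox:

For exploration and mapping, Voxblox uses the voxel hashing approach proposed in [12]. The mapping between block positions and their locations in memory is stored in a hash table, which gives O(1) insertion and lookup. This data structure is flexible for growing maps and is faster than OctoMap, whose octree costs O(log n).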
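To make the hash-table idea concrete, here is a minimal C++ sketch of a block-hashed map. It is only an illustration under my own assumptions: the names (`BlockIndex`, `HashedMap`, `allocateBlock`), the hash constants, and the 8 voxels per block side are hypothetical and are not the voxblox API.

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Integer coordinates of a block in the (unbounded) block grid.
struct BlockIndex {
  int x, y, z;
  bool operator==(const BlockIndex& o) const {
    return x == o.x && y == o.y && z == o.z;
  }
};

// Mixes the three coordinates with large primes (constants are an assumption).
struct BlockIndexHash {
  std::size_t operator()(const BlockIndex& i) const {
    return static_cast<std::size_t>(i.x) * 73856093u ^
           static_cast<std::size_t>(i.y) * 19349669u ^
           static_cast<std::size_t>(i.z) * 83492791u;
  }
};

struct TsdfVoxel {
  float distance = 0.0f;
  float weight = 0.0f;
};

constexpr int kVoxelsPerSide = 8;  // assumed block resolution

// A fixed-size cube of voxels; blocks are only allocated where data exists.
struct Block {
  std::vector<TsdfVoxel> voxels = std::vector<TsdfVoxel>(
      kVoxelsPerSide * kVoxelsPerSide * kVoxelsPerSide);
};

class HashedMap {
 public:
  explicit HashedMap(float voxel_size)
      : block_size_(voxel_size * kVoxelsPerSide) {}

  // Maps a metric point to the index of the block that contains it.
  BlockIndex blockIndexFromPoint(float x, float y, float z) const {
    return {static_cast<int>(std::floor(x / block_size_)),
            static_cast<int>(std::floor(y / block_size_)),
            static_cast<int>(std::floor(z / block_size_))};
  }

  // Returns the block, allocating it on first access (average O(1)).
  Block& allocateBlock(const BlockIndex& idx) { return blocks_[idx]; }

  bool hasBlock(const BlockIndex& idx) const { return blocks_.count(idx) > 0; }

  std::size_t numBlocks() const { return blocks_.size(); }

 private:
  float block_size_;
  std::unordered_map<BlockIndex, Block, BlockIndexHash> blocks_;
};
```

Because blocks are created lazily, the map can grow in any direction without a predefined bounding box, which is the property the paragraph above contrasts with fixed-size distance maps.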

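Voxblox++:

Seen this way, the two papers tackle rather different tasks and maybe should not have been read together, haha. Voxblox++ feels like an advanced version that adds some semantics. The pipeline in summary:

A frame-wise segmentation scheme combines an unsupervised geometric segmentation of depth images [9] with semantic object predictions from RGB [1]. The use of semantics allows the system to infer the category of some of the 3D segments predicted in a frame, as well as to group segments by the object instance to which they belong. In short, the depth image is segmented geometrically without supervision, masks are predicted on the RGB image, and the two are combined into refined segments.
The tracking of the individual predicted instances across multiple frames is addressed by matching per-frame predictions to existing segments in the global map via a data association strategy.
Observed surface geometry and segmentation information are integrated into a global Truncated Signed Distance Field (TSDF) map volume.

Essentially, the Mask R-CNN predictions are carried over to the points of the point cloud to give them labels. Note that points belonging to different objects can sometimes overlap; the Voxblox++ paper sets a threshold for this.

The overall integration uses Voxblox for the mapping and then assigns label information to each voxel, keeping for each voxel the object label and the semantic class with the highest count.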
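The "keep the label with the highest count" rule can be sketched in a few lines of C++. This is a simplified illustration, not the voxblox-plusplus data layout: `LabelVoxel` and `updateVoxelLabel` are hypothetical names, and storing a full count map per voxel is an assumption made for clarity.

```cpp
#include <cstdint>
#include <map>

// Hypothetical per-voxel label bookkeeping.
struct LabelVoxel {
  std::map<std::uint16_t, int> label_counts;  // observations per label
  std::uint16_t label = 0;                    // current winner, 0 = unlabeled
  int label_count = 0;                        // count of the current winner
};

// Records one more observation of `observed` and keeps the label with the
// highest count. On a tie the previously selected label is kept.
inline void updateVoxelLabel(LabelVoxel& voxel, std::uint16_t observed) {
  const int count = ++voxel.label_counts[observed];
  if (count > voxel.label_count) {
    voxel.label = observed;
    voxel.label_count = count;
  }
}
```

The same counting scheme could be run once for object instance labels and once for semantic classes, so that each voxel ends up with the most frequently observed value of both.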

2.2 TSDF construction

The update rule in the paper is given by the following equations, with \(\mathbf{x}, \mathbf{p}, \mathbf{s} \in \mathbb{R}^3\):

\[\begin{aligned}
d(\mathbf{x}, \mathbf{p}, \mathbf{s}) &= \|\mathbf{p}-\mathbf{x}\| \operatorname{sign}\big((\mathbf{p}-\mathbf{x}) \cdot (\mathbf{p}-\mathbf{s})\big) \\
w_{\text{const}}(\mathbf{x}, \mathbf{p}) &= 1 \\
D_{i+1}(\mathbf{x}, \mathbf{p}) &= \frac{W_i(\mathbf{x}) D_i(\mathbf{x}) + w(\mathbf{x}, \mathbf{p})\, d(\mathbf{x}, \mathbf{p}, \mathbf{s})}{W_i(\mathbf{x}) + w(\mathbf{x}, \mathbf{p})} \\
W_{i+1}(\mathbf{x}, \mathbf{p}) &= \min\left(W_i(\mathbf{x}) + w(\mathbf{x}, \mathbf{p}),\ W_{\max}\right)
\end{aligned}\]

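x is the center position of the current voxel
p is the 3D position of the point measured by the sensor
s is the sensor origin
d is the new distance measurement contributed by the sensor point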
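The equations above translate almost line by line into code. The following is a minimal sketch with hypothetical helper names (`signedDistance`, `updateVoxel`); it deliberately leaves out truncation and raycasting and only shows the distance and the weighted running average.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec3 = std::array<float, 3>;

inline Vec3 sub(const Vec3& a, const Vec3& b) {
  return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

inline float dot(const Vec3& a, const Vec3& b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// d(x, p, s) = ||p - x|| * sign((p - x) . (p - s)):
// positive between sensor and surface, negative behind the surface.
inline float signedDistance(const Vec3& x, const Vec3& p, const Vec3& s) {
  const Vec3 px = sub(p, x);
  const float side = dot(px, sub(p, s));
  const float sign = (side > 0.0f) ? 1.0f : ((side < 0.0f) ? -1.0f : 0.0f);
  return std::sqrt(dot(px, px)) * sign;
}

struct TsdfVoxel {
  float distance = 0.0f;  // D_i(x)
  float weight = 0.0f;    // W_i(x)
};

// Weighted running average for D and clamped accumulation for W.
inline void updateVoxel(TsdfVoxel& voxel, float d, float w, float max_weight) {
  const float new_weight = voxel.weight + w;
  if (new_weight <= 0.0f) return;  // nothing to fuse
  voxel.distance = (voxel.weight * voxel.distance + w * d) / new_weight;
  voxel.weight = std::min(new_weight, max_weight);
}
```

Clamping the weight at `max_weight` keeps old observations from dominating forever, so the voxel can still react to later measurements.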

Finally, on how newly received data is merged into the existing voxel grid:

For each point in the sensor scan, we project its position to the voxel grid, and group it with all other points mapping to the same voxel.
Then we take the weighted mean of all points and colors within each voxel, and do raycasting only once on this mean position.

In terms of speed, the paper reports this to be 20 times faster than the ordinary raycasting approach.
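The grouping step of the merged strategy can be sketched as follows. This is an assumption-laden illustration, not the voxblox implementation: `WeightedPoint`, `voxelIndexFromPoint`, and `bundlePoints` are hypothetical names, colors are omitted, and the raycast that would follow on each mean position is left out.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

using Vec3 = std::array<float, 3>;
using VoxelIndex = std::array<int, 3>;

struct WeightedPoint {
  Vec3 position;
  float weight;
};

// Index of the voxel that contains point p.
inline VoxelIndex voxelIndexFromPoint(const Vec3& p, float voxel_size) {
  return {static_cast<int>(std::floor(p[0] / voxel_size)),
          static_cast<int>(std::floor(p[1] / voxel_size)),
          static_cast<int>(std::floor(p[2] / voxel_size))};
}

// Groups scan points by voxel and returns one weighted mean position per
// occupied voxel. A single raycast per returned position would follow.
inline std::map<VoxelIndex, Vec3> bundlePoints(
    const std::vector<WeightedPoint>& scan, float voxel_size) {
  struct Accum {
    Vec3 sum{{0.0f, 0.0f, 0.0f}};
    float weight = 0.0f;
  };
  std::map<VoxelIndex, Accum> bins;
  for (const WeightedPoint& wp : scan) {
    Accum& acc = bins[voxelIndexFromPoint(wp.position, voxel_size)];
    for (std::size_t i = 0; i < 3; ++i) {
      acc.sum[i] += wp.weight * wp.position[i];
    }
    acc.weight += wp.weight;
  }
  std::map<VoxelIndex, Vec3> means;
  for (const auto& kv : bins) {
    if (kv.second.weight <= 0.0f) continue;  // skip zero-weight groups
    Vec3 mean;
    for (std::size_t i = 0; i < 3; ++i) {
      mean[i] = kv.second.sum[i] / kv.second.weight;
    }
    means[kv.first] = mean;
  }
  return means;
}
```

The saving comes from the count of raycasts: a dense scan with many points per voxel collapses to one ray per occupied end voxel, which matters most at large voxel sizes.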

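Above, the weight is the constant 1, but the paper proposes a more sophisticated weight. For RGB-D sensors, [19] found that the \(\sigma\) of a single ray measurement varied predominantly with \(z^2\), where \(z\) is the measured depth in the camera frame. Combined with a simple assumption of a behind-surface drop-off for the RGB-D model, the weight is set as follows: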

\[w_{\text {quad }}(\mathbf{x}, \mathbf{p})=\left\{\begin{array}{lr}\frac{1}{z^2} & -\epsilon<d \\\frac{1}{z^2} \frac{1}{\delta-\epsilon}(d+\delta) & -\delta<d<-\epsilon \\0 & d<-\delta\end{array}\right.
\]

Here the truncation distance is \(\delta = 4v\) and \(\epsilon = v\), where \(v\) is the voxel size.
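As a direct reading of the piecewise formula, the weight can be sketched as a small standalone function. `quadraticWeight` is a hypothetical name and this is not the voxblox code; returning 0 for non-positive depth is my own assumption to avoid a division by zero.

```cpp
#include <cmath>

// w_quad from the formula above.
//   z: measured depth in the camera frame
//   d: signed distance of the voxel to the measured surface
//   voxel_size: v, with delta = 4v and epsilon = v
inline float quadraticWeight(float z, float d, float voxel_size) {
  const float delta = 4.0f * voxel_size;
  const float epsilon = voxel_size;
  if (z <= 0.0f) return 0.0f;  // assumption: invalid depth gets no weight
  const float base = 1.0f / (z * z);
  if (d > -epsilon) {
    return base;  // in front of, or just behind, the surface
  }
  if (d > -delta) {
    // linear drop-off from `base` at d = -epsilon to 0 at d = -delta
    return base * (d + delta) / (delta - epsilon);
  }
  return 0.0f;  // too far behind the surface
}
```

For example, with z = 2 and v = 0.1, a voxel at d = 0 gets weight 0.25, one at d = -0.25 gets half of that, and one at d = -0.5 gets zero.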

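The corresponding function in the voxblox code, getVoxelWeight, does not line up with the middle case of the paper's formula: it only returns the constant weight or the 1/z² term. As far as I can tell, the linear drop-off is applied separately when the voxel is updated (in updateTsdfVoxel, when use_weight_dropoff is enabled).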

// Thread safe.
float TsdfIntegratorBase::getVoxelWeight(const Point& point_C) const {
  if (config_.use_const_weight) {
    return 1.0f;
  }
  const FloatingPoint dist_z = std::abs(point_C.z());
  if (dist_z > kEpsilon) {
    return 1.0f / (dist_z * dist_z);
  }
  return 0.0f;
}

2.3 TSDF → ESDF

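Screenshot taken from the Voxblox documentation:

The code is mainly in the file esdf_integrator.cc.

3. Experiments and Results

Figure 5 shows that the smaller the voxel size, the smaller the error, and that at larger voxel sizes the quadratic weight also helps reduce the error. In terms of speed, the method proposed in the paper is the fastest and takes the least time.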

voxblox++

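The quantitative results mainly compare IoU against a 3D semantic instance-segmentation method.

Qualitative results

The time taken by each component is also reported, and the paper states the computing platform used.

4. Conclusion

So Voxblox mainly takes RGB-D input and builds a colored map, using a TSDF to store and build the distance information, and incrementally generates an ESDF directly from the TSDF for planning. It is a map type designed around a very clear downstream task, real-time and efficient. The paper does not discuss future work.

Voxblox++ adds object-level and segmentation label information to every point, which of course costs time, so future work includes reducing the runtime. It also involves investigating the optimal way to fuse RGB and depth information within a unified per-frame object detection, discovery and segmentation framework.

A like given leaves fragrance on the hand; positive feedback is what keeps these open notes going, haha.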
