Fast 6D Object Pose Estimation from an RGB-D Image Using a Balanced Pose Tree
Yoshinori KONISHI, Kosuke HATTORI and Manabu HASHIMOTO
Abstract
In this paper, we propose a fast and robust method for 6D pose estimation of objects from an RGB-D image. Our proposed method consists of two components: PCOF-MOD (multimodal perspectively cumulated orientation feature) and a balanced pose tree. PCOF-MOD is based on orientation histograms of depth gradients and surface normal vectors, which are extracted from depth images synthesized using randomized 3D pose parameters and the 3D CAD data of a target object. The model templates of PCOF-MOD therefore explicitly handle a certain range of 3D object poses. Additionally, a large number of templates are organized into a coarse-to-fine 3D pose tree in order to accelerate 6D pose estimation. Predefined polyhedra for viewpoint sampling are prepared for each level of an image pyramid, and the 3D pose trees are built so that every parent node has an almost equal number of child nodes at each pyramid level. In an experimental evaluation of 6D object pose estimation on a publicly available RGB-D image dataset, our proposed method showed higher accuracy and comparable speed in comparison with state-of-the-art techniques.
Key words: 6D pose estimation, RGB-D image, 3D CAD, template matching, PCOF-MOD, balanced pose tree
Fig. 1 Our new template-based algorithm can estimate the 6D object pose from an RGB-D image containing cluttered backgrounds and partial occlusions. It takes approximately 100 ms on average on a single CPU core
Fig. 2 (a) 3D CAD model of the iron, its coordinate axes, and the sphere for viewpoint sampling (b) Examples of depth images rendered from randomized viewpoints around a certain vertex
Figure 4 shows, for four arbitrarily selected pixels, the orientation histograms, cumulated orientation features ($ori$), and feature weights ($w$) computed from depth images generated over the same viewpoint range as in Fig. 2(b). Points A and B were selected from the gradient orientation images, and points C and D from the normal orientation images. The number of generated images ($N$) and the threshold on the histogram frequency ($Th$) were determined experimentally$^{13)}$: we set $N = 1000$ and $Th = 100$ for the gradient orientation images, and $N = 1000$ and $Th = 200$ for the normal orientation images. As a result of the PCOF extraction, the gradient
Fig. 4 Examples of the orientation histograms, binary features (ori) and their weights (w) at arbitrarily selected pixels. Pixels A and B are extracted from gradient orientations, and pixels C and D from normal orientations. Red dotted lines show the threshold for feature extraction
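The thresholding step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the 8-bin quantization, and the weighting rule (total frequency of the kept orientations, normalized by $N$) are assumptions for demonstration.

```python
import numpy as np

def extract_pcof(orient_maps, n_bins=8, th=100):
    """Sketch of PCOF extraction: accumulate quantized orientations over
    N synthesized depth images, then keep only the dominant orientations.

    orient_maps: (N, H, W) int array of quantized orientation indices
                 in [0, n_bins).
    Returns a per-pixel binary feature `ori` (bitmask of stable
    orientations) and a per-pixel weight `w`.
    """
    n, h, w_ = orient_maps.shape
    # Per-pixel orientation histogram over all N images.
    hist = np.zeros((h, w_, n_bins), dtype=np.int32)
    for k in range(n_bins):
        hist[:, :, k] = (orient_maps == k).sum(axis=0)
    # Orientations whose frequency exceeds Th become feature bits.
    keep = hist > th
    ori = np.zeros((h, w_), dtype=np.uint8)
    for k in range(n_bins):
        ori |= keep[:, :, k].astype(np.uint8) << k
    # Assumed weighting: total frequency of the kept orientations,
    # normalized by N, so pose-stable pixels receive larger weights.
    w = (hist * keep).sum(axis=2) / float(n)
    return ori, w
```

A pixel whose orientation stays in one bin across almost all synthesized views ends up with a single feature bit and a weight near 1, matching the stable pixels A and C in Fig. 4.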
In the above equation, the weight ($w_i$) is added to the matching score when the quantized orientation of the input image ($ori^{I}$) is contained in the PCOF of the template ($ori^{T}$). The delta function in Eq. (2) can also be computed quickly as a bitwise AND (the $\wedge$ symbol), as in Eq. (3), and the matching can be accelerated further by using CPU-specific SIMD instructions.
$$
\delta_i\left(ori^{I} \in ori^{T}\right)=
\begin{cases}
w_i & \text{if } ori^{I} \wedge ori^{T} > 0 \\
0 & \text{otherwise}
\end{cases}
\tag{3}
$$
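The bitwise form of the delta function can be sketched as below. The function name and argument layout are illustrative assumptions; in the actual implementation the same AND test would be vectorized with SIMD instructions rather than looped in Python.

```python
def match_score(ori_input, templ_ori, templ_w):
    """Sketch of Eqs. (2)/(3): sum the weight w_i wherever the input's
    quantized orientation bit overlaps the template's PCOF bits.

    ori_input: per-pixel orientation bitmasks of the input image
    templ_ori: per-pixel PCOF bitmasks of the template
    templ_w:   per-pixel weights of the template
    """
    score = 0.0
    for o_in, o_tm, w in zip(ori_input, templ_ori, templ_w):
        # Bitwise AND replaces the set-membership test ori^I in ori^T.
        if o_in & o_tm:
            score += w
    return score
```

Because both features are small bitmasks, the membership test costs a single integer AND per pixel, which is what makes the SIMD acceleration mentioned above possible.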
Fig. 6 Part of the balanced pose tree for the iron is shown. The bottom-level templates are the originally created PCOF-MOD templates, and the tree structure is built in a bottom-up way by merging and downscaling orientation histograms. During pose estimation, the tree is traversed from top to bottom along the red arrow
2592, and 41088, respectively.
Fig. 8 Examples of results whose mean squared errors of transformed vertices were almost 10%. Left: driller (the error was 9.6%). Right: duck (the error was 9.7%)
Although the error for the duck is 9.7%, the rendered result confirms that the position and orientation of the target object are recognized almost correctly, so 10% can be said to be a reasonable criterion for recognition success.
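The vertex-error criterion used above can be sketched as follows. This is a minimal sketch assuming the common formulation of Hinterstoisser et al.: the model vertices are transformed by the estimated and ground-truth poses and the mean inter-vertex distance is normalized by the object diameter; the exact averaging used in the paper (mean vs. mean-squared) follows the Fig. 8 caption only loosely.

```python
import numpy as np

def pose_error(vertices, R_est, t_est, R_gt, t_gt, diameter):
    """Sketch of the vertex-error criterion: transform the CAD model
    vertices by the estimated and ground-truth poses and report the
    mean vertex distance relative to the object diameter.

    vertices: (M, 3) model vertices;  R_*: (3, 3) rotations;
    t_*: (3,) translations;  diameter: object diameter (same units).
    """
    v_est = vertices @ R_est.T + t_est
    v_gt = vertices @ R_gt.T + t_gt
    err = np.linalg.norm(v_est - v_gt, axis=1).mean()
    # The pose is counted as correct if this ratio is below 0.1 (10%).
    return err / diameter
```

For example, a pure translation offset of 5% of the diameter yields an error of 0.05 and is counted as a success under the 10% criterion.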
A. Johnson and M. Hebert: Using spin images for efficient object recognition in cluttered 3D scenes, IEEE Trans. Pattern Anal. Mach. Intell. 21, 5, (1999) 433.
R.B. Rusu, N. Blodow and M. Beetz: Fast point feature histograms (FPFH) for 3D registration, Proc. IEEE Int. Conf. Robotics and Automation, (2009) 1848.
F. Tombari, S. Salti and L. Di Stefano: Unique signatures of histograms for local surface description, Proc. European Conf. Comput. Vision, (2010) 356.
B. Drost, M. Ulrich, N. Navab and S. Ilic: Model globally, match locally: Efficient and robust 3D object recognition, Proc. IEEE Conf. Comput. Vision Pattern Recognit., (2010) 998.
S. Hinterstoisser, V. Lepetit, N. Rajkumar and K. Konolige: Going further with point pair features, Proc. European Conf. Comput. Vision, (2016) 834.
S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G.R. Bradski, K. Konolige and N. Navab: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes, Proc. Asian Conf. Comput. Vision, (2012) 548.
E. Brachmann, A. Krull, F. Michel, S. Gumhold, J. Shotton and C. Rother: Learning 6D object pose estimation using 3D object coordinates, Proc. European Conf. Comput. Vision, (2014) 536.
A. Tejani, D. Tang, R. Kouskouridas and T.K. Kim: Latent-class hough forests for 3D object detection and pose estimation, Proc. European Conf. Comput. Vision, (2014) 462.
W. Kehl, F. Milletari, F. Tombari, S. Ilic, and N. Navab: Deep learning of local rgb-d patches for 3D object detection and 6D pose estimation, Proc. European Conf. Comput. Vision, (2016) 205.
W. Kehl, F. Tombari, N. Navab, S. Ilic and V. Lepetit: Hashmod: A hashing method for scalable 3D object detection, Proc. British Mach. Vision Conf., (2015).
T. Hodan, X. Zabulis, M. Lourakis, S. Obdrzalek and J. Matas: Detection and fine 3D pose estimation of texture-less objects in RGB-D images, Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., (2015) 4421.
S. Hinterstoisser, C. Cagniart, S. Ilic, P. Sturm, N. Navab, P. Fua and V. Lepetit: Gradient response maps for real-time detection of textureless objects, IEEE Trans. Pattern Anal. Mach. Intell. 34, 5, (2012) 876.
Y. Konishi, Y. Hanzawa, M. Kawade and M. Hashimoto: Fast 6D pose estimation from a monocular image using hierarchical pose tree, Proc. European Conf. Comput. Vision, (2016) 398.
M. Ulrich, C. Wiedemann and C. Steger: Combining scale-space and similarity-based aspect graphs for fast 3D object recognition, IEEE Trans. Pattern Anal. Mach. Intell. 34, 10, (2012) 1902.
A. Crivellaro, M. Rad, Y. Verdie, K.M. Yi, P. Fua and V. Lepetit: A novel representation of parts for accurate 3D object detection and tracking in monocular images, Proc. IEEE Int. Conf. Comput. Vision, (2015) 4391.
V. Lepetit, J. Pilet and P. Fua: Point matching as a classification problem for fast and robust object pose estimation, Proc. IEEE Conf. Comput. Vision Pattern Recognit., (2004) 244.
E. Brachmann, F. Michel, A. Krull, M.Y. Yang, S. Gumhold and C. Rother: Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image, Proc. IEEE Conf. Comput. Vision Pattern Recognit., (2016) 3364.
R. Rios-Cabrera and T. Tuytelaars: Discriminatively trained templates for 3D object detection: A real time scalable approach, Proc. IEEE Int. Conf. Comput. Vision, (2013) 2048.
S. Hinterstoisser, S. Benhimane, V. Lepetit, P. Fua and N. Navab: Simultaneous recognition and homography extraction of local patches with a simple linear classifier, Proc. British Mach. Vision Conf., (2008).
R.I. Hartley and A. Zisserman: Multiple View Geometry in Computer Vision, Second edn., Cambridge University Press, 2004.
S. Rusinkiewicz and M. Levoy: Efficient variants of the ICP algorithm, Proc. 3D Digital Imaging and Modeling, (2001) 145.