Abstract

Estimating the position and orientation of objects is a crucial step in robotic bin-picking. The task is difficult because, in real-world scenarios, diverse objects are often randomly stacked, which causes significant occlusion. This study introduces an approach that predicts 6D poses from point clouds using a two-stage neural network. In the first stage, a network designed for low-texture scenes performs instance segmentation and provides an initial pose estimate. In the second stage, a pose refinement network improves the precision of the pose prediction, building on the output of the first stage. To reduce the cost of annotation, a simulation technique is used to generate a synthetic dataset, and a dedicated software tool has been developed to annotate real point cloud datasets. In experiments, the method outperformed baseline methods such as PointGroup and Iterative Closest Point in both segmentation accuracy and pose refinement. Practical grasping experiments further demonstrated the method's efficacy in real-world industrial robot bin-picking applications, confirming that it can handle occluded and randomly stacked objects.
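For readers unfamiliar with the Iterative Closest Point baseline named above, the following is a minimal sketch of point-to-point ICP in NumPy. It is not the pose refinement network proposed in the paper. The function names `best_rigid_transform` and `icp`, the brute-force nearest-neighbour search, and the fixed iteration count are illustrative assumptions, not details taken from the source.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t ~ dst[i] (SVD/Kabsch solution).

    src, dst: (N, 3) arrays of corresponding points.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def nearest_neighbours(src, dst):
    """Brute-force closest point in dst for each point in src (fine for small clouds)."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return dst[idx], d2[np.arange(len(src)), idx]


def icp(src, dst, iterations=30):
    """Hypothetical minimal ICP: alternate matching and rigid alignment.

    Returns the accumulated rotation and translation mapping src towards dst.
    """
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iterations):
        matched, _ = nearest_neighbours(cur, dst)
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # Compose the incremental transform with the running estimate.
        R_total = R @ R_total
        t_total = R @ t_total + t
    return R_total, t_total
```

A typical use would be to pass the segmented instance points as `src` and points sampled from the object model as `dst` (or the reverse, depending on convention). Because ICP only converges locally, it depends on a reasonable initial pose, which is one reason an initial estimate from a first stage is useful.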
