Abstract

Compared with two-dimensional (2D) multi-object tracking (MOT) algorithms, three-dimensional (3D) MOT algorithms have greater research significance and broader application prospects in the field of unmanned vehicles. To address 3D multi-object detection and tracking, this paper improves the multi-object tracker CenterTrack, which focuses on the 2D MOT task and ignores 3D object information, in two respects: detection and tracking. The improved network is called CenterTrack3D. For detection, CenterTrack3D uses an attention mechanism to optimize how the previous-frame image and the heatmap of previous-frame tracklets are added to the current-frame image as input, and it replaces the second convolutional layer of the heatmap (hm) output head with a dynamic convolution layer, which further improves the detection of occluded objects. For tracking, a cascaded data association algorithm based on a 3D Kalman filter is proposed to make full use of the 3D information of objects in the image and to increase the robustness of the 3D multi-object tracker. Experimental results show that CenterTrack3D achieves 88.75% MOTA for cars and 59.40% MOTA for pedestrians on the KITTI tracking benchmark test set, which is very competitive with the original CenterTrack and with existing 3D multi-object tracking methods.
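
For readers unfamiliar with the metric, MOTA (multi-object tracking accuracy) in the CLEAR MOT sense combines missed objects, false positives, and identity switches into one score relative to the number of ground-truth objects. The sketch below is only an illustration of that definition. The function name `mota` and the counts in the usage line are hypothetical and are not taken from the paper or from the KITTI evaluation code.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """Return MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames.

    All four arguments are totals accumulated over the whole sequence set.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts, chosen only to illustrate the arithmetic.
score = mota(false_negatives=60, false_positives=40,
             id_switches=5, num_ground_truth=1000)
print(f"MOTA = {score:.2%}")
```

Because the error terms are summed before normalization, MOTA can be negative when a tracker makes more errors than there are ground-truth objects.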
