Overview
In addition, we provide a benchmark suite that covers different aspects of 4D understanding of human-object interaction. To ensure a fair evaluation of these tasks, we follow common best practice and score all submissions on the test set through server-side evaluation.

We run three challenges built on the HOI4D dataset. Please see the corresponding homepage for a detailed description of each challenge; here we give only short task descriptions and the leaderboards.

Important: We do not approve accounts with email addresses from free email providers, such as gmail.com, qq.com, web.de, etc. Only university or company email addresses are accepted. If you need to use a free email account, please contact us.
4D Semantic Segmentation

Task

In semantic segmentation of 4D point clouds, we want to infer the semantic label of each 3D point. The input to every evaluated method is therefore a list of 3D point coordinates, and each method should output one label for each point of the scan.

Metric

We use the mean Jaccard index, also called mean intersection-over-union (mIoU), over all classes, i.e.,

$$\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c}$$

where $TP_c$, $FP_c$, and $FN_c$ correspond to the number of true positive, false positive, and false negative predictions for class $c$, and $C$ is the number of classes.
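As a rough illustration of the metric described above, the following sketch computes mIoU from per-point integer labels with NumPy. The function name `mean_iou` and the choice to skip classes that appear in neither the predictions nor the ground truth are assumptions made for this example; the evaluation server may handle such classes differently.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes, given per-point integer labels in [0, num_classes)."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    # Confusion matrix: rows index ground-truth classes, columns index predictions.
    conf = np.bincount(gt * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as c, but labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled c, but predicted otherwise
    denom = tp + fp + fn
    # Assumption: classes with no points in prediction or ground truth are skipped.
    valid = denom > 0
    if not valid.any():
        return 0.0
    return float((tp[valid] / denom[valid]).mean())

# Toy usage with two classes and four points.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy call, class 0 has one true positive and one false positive (IoU 1/2), and class 1 has two true positives and one false negative (IoU 2/3), so the mean is 7/12.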

Leaderboard

Approach Paper Code Institution mIoU Details
Enhanced Point TransformerV2 Chinese University of Hong Kong(Shenzhen) 48.0
PPTr IIIS, Tsinghua University 41.0
P4Transformer National University of Singapore 40.1
To evaluate your results, please send the pred.npy file to liuyzchina@gmail.com or yunzeliu77@163.com.
4D Action Segmentation

Task

In this task, each frame of a point cloud video must be assigned an action category label. The input is a point cloud video, and the output is the action label of each frame of that video.

Metric

The following three metrics are reported: framewise accuracy (Acc), segmental edit distance (Edit), and segmental F1 scores at overlap thresholds of 10%, 25%, and 50%. The overlap between a predicted segment and a ground-truth segment is measured by their IoU ratio.
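The sketch below shows one common way to compute these three metrics from framewise label sequences. It is an illustration under assumptions, not the server's evaluation code: the function names are hypothetical, the edit score is taken as a normalized Levenshtein distance over segment labels scaled to 0-100, and each ground-truth segment may be matched by at most one predicted segment.

```python
import numpy as np

def get_segments(labels):
    """Collapse framewise labels into (label, start, end) runs; end is exclusive."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments

def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted label equals the ground truth."""
    return float((np.asarray(pred) == np.asarray(gt)).mean())

def edit_score(pred, gt):
    """Assumed form: (1 - Levenshtein / max length) * 100 over segment labels."""
    p = [s[0] for s in get_segments(pred)]
    g = [s[0] for s in get_segments(gt)]
    m, n = len(p), len(g)
    d = np.zeros((m + 1, n + 1))
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return float((1.0 - d[m, n] / max(m, n, 1)) * 100.0)

def f1_at_iou(pred, gt, threshold):
    """Segmental F1: a predicted segment is a true positive if its best
    same-label ground-truth segment has IoU >= threshold and is unmatched."""
    p_segs, g_segs = get_segments(pred), get_segments(gt)
    matched = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            if inter / union > best_iou:
                best_iou, best_j = inter / union, j
        if best_j >= 0 and best_iou >= threshold and not matched[best_j]:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy usage: one frame is mislabelled at a segment boundary.
gt_labels = [0, 0, 0, 1, 1, 1, 2, 2]
pred_labels = [0, 0, 1, 1, 1, 1, 2, 2]
acc = frame_accuracy(pred_labels, gt_labels)
edit = edit_score(pred_labels, gt_labels)
f1_scores = [f1_at_iou(pred_labels, gt_labels, t) for t in (0.10, 0.25, 0.50)]
```

Accuracy is returned as a fraction and the edit score on a 0-100 scale, which mirrors the scales of the Acc and Edit columns in the leaderboard. The segmental metrics penalize over-segmentation, which framewise accuracy alone does not capture.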

Leaderboard

The following leaderboard contains only published approaches for which we can provide at least an arXiv link.

Approach Paper Code Institution Acc Edit Details
P4Tr personal 0.7152914798206278 71.30564963523261
PPTr personal 0.7850149476831091 80.40452258471214
test test 0.31198056801195817 34.140717851121174
test test 0.31198056801195817 34.140717851121174
XD-Transformer ailab 0.852219730941704 91.17519481582283
XD_Transformer 111 0.8322496263079223 90.62816999024744
X4D-SceneFormer No Disclosure 0.8322496263079223 90.62816999024744
XD-Transformer SH ailab 0.8522496263079222 91.38732622245608
alisa_31 ZJU 0.8491405082212257 88.19816637961415
alisa_30 ZJU 0.8492077727952168 87.75859590370912
test_c_tk7 test 0.8092974588938715 76.67340242967721
test_c_tk5 test 0.8061733931240658 73.95311308346066
Sat_Merge_v3 SH ailab 0.8404783258594918 91.05041810266165
SAT_Merge_v2 SH ailab 0.8404783258594918 91.05041810266165
SAT_Merge_v2 SH ailab 0.8404783258594918 91.05041810266165
SAT_Merge_v1 SH ailab 0.8405979073243647 91.0779665729014
SAT_MERGE SH ailab 0.8404783258594918 91.05041810266165
cos_version29 ailab 0.8405979073243647 91.0819170320088
cos_version28 ailab 0.8405904334828102 91.07390934462893
alisa_29 ZJU 0.8508221225710015 88.26772105034811
cos_version26 ailab 0.8406053811659193 91.05057053069882
Multi-Conv-Res_final Dalian University of Technology 0.8290059790732437 87.41627070424364
0607 SJTU 0.8196038863976084 90.09866119757815
cos_version24 ailab 0.8405680119581465 90.96080366526259
Multi-Conv-Res11 Dalian University of Technology 0.8303811659192825 87.46054470065347
Multi-Conv-Res0 Dalian University of Technology 0.8289536621823618 87.3393173540375
2306061442 SJTU 0.4501420029895366 74.95797297667686
test SJTU 0.19133781763826607 6.979134269346354
Multi-Conv-Res10 Dalian University of Technology 0.8289536621823618 87.3393173540375
cos_version23 ailab 0.8405306427503737 90.91913658434308
cos_version23 ailab 0.8405306427503737 90.91913658434308
2306061429 SJTU 0.7542750373692078 80.93867545414864
alisa_29 ZJU 0.851831091180867 87.97478083010287
test SJTU 0.7542750373692078 80.93867545414864
Multi-Conv-Res10 Dalian University of Technology 0.8289536621823618 87.3393173540375
cos_version21 ailab 0.8376905829596413 91.11819971137608
test SJTU 0.4511733931240658 74.95797297667686
cos_version21 ailab 0.8350896860986547 91.07177872421552
alisa_28 ZJU 0.8504783258594918 88.22079506901783
Multi-Conv-Res9 Dalian University of Technology 0.828034379671151 87.26804806477168
alisa_27 ZJU 0.8511210762331839 88.07147602447434
alisa_26 ZJU 0.8509865470852018 88.23664347531256
Cos_version20 SH ailab 0.8291928251121077 91.03114864122064
cos_version19 ailab 0.8405829596412556 90.93773806749792
cos_version18 ailab 0.8350896860986547 91.07177872421552
Cos_version17 ailab 0.8302989536621823 89.76397027640938
cos_version16 ailab 0.8267189835575486 89.04176941701768
panda ZJU 0.851136023916293 88.07147602447434
alisa_25 ZJU 0.852406576980568 87.81265092940069
alisa_24 ZJU 0.8524140508221226 87.82051441704616
Category-Level Object and Part Pose Tracking

Task

In this task, the input is a point cloud video together with the pose of the object in the first frame. The goal is to track the object and estimate its pose in every subsequent frame. Note that the poses are category-level object poses.

Metric

The following metrics are used: 5°5cm, the percentage of estimates with orientation error < 5° and translation error < 5 cm; Rerr, the mean orientation error in degrees; and Terr, the mean translation error in centimeters.
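To make the three pose metrics concrete, here is a minimal sketch that computes them from per-frame rotation matrices and translation vectors. Several points are assumptions for illustration: the function names are hypothetical, translations are assumed to be given in metres (hence the `unit_to_cm=100.0` default), the orientation error is taken as the geodesic angle between rotations, and symmetric objects and per-part poses are not handled.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def translation_error_cm(t_pred, t_gt, unit_to_cm=100.0):
    """Euclidean distance between translations, converted to centimetres.
    Assumption: inputs are in metres, so the default factor is 100."""
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)
    return float(np.linalg.norm(diff) * unit_to_cm)

def pose_metrics(R_preds, t_preds, R_gts, t_gts):
    """Return the 5deg5cm percentage, mean Rerr (degrees), and mean Terr (cm)."""
    r_err = np.array([rotation_error_deg(rp, rg) for rp, rg in zip(R_preds, R_gts)])
    t_err = np.array([translation_error_cm(tp, tg) for tp, tg in zip(t_preds, t_gts)])
    within = (r_err < 5.0) & (t_err < 5.0)
    return {
        "5deg5cm": float(within.mean() * 100.0),
        "Rerr": float(r_err.mean()),
        "Terr": float(t_err.mean()),
    }

def rot_z(deg):
    """Helper for the toy example: rotation about the z axis by `deg` degrees."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

# Toy usage: frame 1 is off by 3 cm, frame 2 is off by 10 degrees.
metrics = pose_metrics(
    R_preds=[np.eye(3), rot_z(10.0)],
    t_preds=[[0.03, 0.0, 0.0], [0.0, 0.0, 0.0]],
    R_gts=[np.eye(3), np.eye(3)],
    t_gts=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
)
```

In the toy call only the first frame satisfies both thresholds, so the 5°5cm score is 50%, with a mean orientation error of 5° and a mean translation error of 1.5 cm.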

Leaderboard

The following leaderboard contains only published approaches for which we can provide at least an arXiv link.

Approach Paper Code Institution Details