Overview
We provide a benchmark suite that covers different aspects of 4D human-object interaction understanding. To ensure a fair evaluation on these tasks, we follow common best practice and evaluate all submissions server-side on a held-out test set.

We run three challenges built on the HOI4D dataset. Please see each challenge's homepage for a detailed description; here we provide only short task descriptions and the leaderboards.

Important: We do not approve accounts registered with free email providers such as gmail.com, qq.com, or web.de. Only university or company email addresses are accepted. If you must use a free email account, please contact us.
4D Semantic Segmentation

Task

In semantic segmentation of 4D point clouds, the goal is to infer the semantic label of each 3D point. The input to every evaluated method is therefore a list of 3D point coordinates, and each method must output a label for every point of the scan.
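To make the expected interface concrete, here is a minimal sketch in Python; the point count, class count, and function name are illustrative placeholders, not HOI4D constants.

```python
import numpy as np

# Hypothetical sizes for illustration only; they are not HOI4D constants.
NUM_POINTS, NUM_CLASSES = 2048, 40

def segment_frame(points: np.ndarray) -> np.ndarray:
    """Stand-in for an evaluated method: map (N, 3) xyz coordinates
    to one semantic label per point."""
    logits = np.random.rand(points.shape[0], NUM_CLASSES)  # replace with a real model
    return logits.argmax(axis=1)

points = np.random.rand(NUM_POINTS, 3).astype(np.float32)  # input: 3D point coordinates
labels = segment_frame(points)                             # output: one label per point
assert labels.shape == (NUM_POINTS,)
```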

Metric

We use the mean Jaccard index, also called mean intersection-over-union (mIoU), over all classes:

$$\mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c + FP_c + FN_c},$$

where $TP_c$, $FP_c$, and $FN_c$ denote the numbers of true positive, false positive, and false negative predictions for class $c$, and $C$ is the number of classes.
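In practice, mIoU can be computed from flat label arrays via a confusion matrix. The sketch below is a generic reference implementation, not the official evaluation script; in particular, ignoring classes that appear in neither ground truth nor predictions (via nanmean) is an assumption.

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """mIoU from flat integer label arrays, via a confusion matrix."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (gt, pred), 1)   # conf[i, j]: points of gt class i predicted as j
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp       # predicted as c, but labeled otherwise
    fn = conf.sum(axis=1) - tp       # labeled c, but predicted otherwise
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou))    # assumption: classes absent everywhere are ignored
```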

Leaderboard

| Approach | Institution | mIoU |
|---|---|---|
| Enhanced Point TransformerV2 | Chinese University of Hong Kong (Shenzhen) | 48.0 |
| PPTr | IIIS, Tsinghua University | 41.0 |
| P4Transformer | National University of Singapore | 40.1 |
To have your results evaluated, please send your pred.npy file to liuyzchina@gmail.com or yunzeliu77@163.com.
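The snippet below shows one plausible way to package predictions; the exact array layout expected in pred.npy is an assumption here, so please confirm it with the organizers before submitting.

```python
import numpy as np

def segment_frame(pts: np.ndarray) -> np.ndarray:
    """Placeholder predictor (see the sketch above); one random label per point."""
    return np.random.randint(0, 40, size=len(pts))

# Stand-ins for the official test scans, loaded in the official order.
test_frames = [np.random.rand(2048, 3).astype(np.float32) for _ in range(4)]

pred = np.stack([segment_frame(pts) for pts in test_frames]).astype(np.int64)
np.save("pred.npy", pred)  # assumed layout: (num_frames, num_points); confirm with organizers
```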
4D Action Segmentation

Task

In this task, each frame of a point cloud video must be assigned an action category label. The input is a point cloud video, and the output is the action performed in each frame of that video.

Metric

The following three metrics are reported: framewise accuracy (Acc), segmental edit distance (Edit), and segmental F1 scores at overlap thresholds of 10%, 25%, and 50%, where the overlap between predicted and ground-truth segments is measured by IoU.
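For reference, segmental F1@k is usually implemented following Lea et al. (2017): a predicted segment counts as a true positive if its IoU with a not-yet-matched ground-truth segment of the same label exceeds the threshold k. The sketch below follows that convention; the challenge's official script may differ in details such as tie-breaking.

```python
def to_segments(labels):
    """Collapse a framewise label sequence into (label, start, end) runs."""
    segs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segs.append((labels[start], start, i))
            start = i
    return segs

def f1_at_k(pred, gt, overlap=0.5):
    """Segmental F1 at IoU threshold `overlap`; framewise Acc is simply
    the fraction of frames where pred and gt labels agree."""
    p_segs, g_segs = to_segments(pred), to_segments(gt)
    used = [False] * len(g_segs)
    tp = 0
    for pl, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (gl, gs, ge) in enumerate(g_segs):
            if gl != pl or used[j]:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            iou = inter / (max(pe, ge) - min(ps, gs))   # 1D interval IoU
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou > overlap:
            tp += 1
            used[best_j] = True
    fp, fn = len(p_segs) - tp, len(g_segs) - tp
    precision, recall = tp / max(tp + fp, 1), tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)
```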

Leaderboard

The following leaderboard contains only published approaches for which we can at least provide an arXiv link.

| Approach | Institution | Acc | Edit |
|---|---|---|---|
| alisa_24 | ZJU | 0.852414 | 87.82 |
| alisa_25 | ZJU | 0.852407 | 87.81 |
| XD-Transformer | SH ailab | 0.852250 | 91.39 |
| XD-Transformer | ailab | 0.852220 | 91.18 |
| alisa_29 | ZJU | 0.851831 | 87.97 |
| HexFormer | PKU | 0.851809 | 88.95 |
| alisa_14 | ZJU | 0.851136 | 88.07 |
| panda | ZJU | 0.851136 | 88.07 |
| alisa_27 | ZJU | 0.851121 | 88.07 |
| alisa_26 | ZJU | 0.850987 | 88.24 |
| alisa_7 | ZJU | 0.850904 | 87.93 |
| alisa_9 | ZJU | 0.850904 | 87.93 |
| alisa_29 | ZJU | 0.850822 | 88.27 |
| alisa_28 | ZJU | 0.850478 | 88.22 |
| alisa_30 | ZJU | 0.849208 | 87.76 |
| alisa_31 | ZJU | 0.849141 | 88.20 |
| alisa_21 | ZJU | 0.848027 | 88.20 |
| alisa_19 | ZJU | 0.846951 | 87.36 |
| alisa_22 | ZJU | 0.846839 | 87.00 |
| alisa_23 | ZJU | 0.846771 | 87.15 |
| alisa_16 | ZJU | 0.846510 | 88.23 |
| alisa_17 | ZJU | 0.846233 | 86.94 |
| alisa_20 | ZJU | 0.846106 | 87.63 |
| alisa_18 | ZJU | 0.846076 | 86.88 |
| alisa_15 | ZJU | 0.846046 | 85.61 |
| alisa_11 | ZJU | 0.844410 | 86.71 |
| alisa_12 | ZJU | 0.844290 | 86.92 |
| alisa_13 | ZJU | 0.843782 | 86.84 |
| Multi-Conv-Res7 | Dalian University of Technology | 0.843692 | 86.57 |
| alisa_10 | ZJU | 0.841076 | 84.30 |
| cos_version26 | ailab | 0.840605 | 91.05 |
| cos_version29 | ailab | 0.840598 | 91.08 |
| SAT_Merge_v1 | SH ailab | 0.840598 | 91.08 |
| cos_version28 | ailab | 0.840590 | 91.07 |
| cos_version19 | ailab | 0.840583 | 90.94 |
| cos_version24 | ailab | 0.840568 | 90.96 |
| Multi-Conv-Res5 | Dalian University of Technology | 0.840553 | 85.58 |
| cos_version23 | ailab | 0.840531 | 90.92 |
| SAT_MERGE | SH ailab | 0.840478 | 91.05 |
| SAT_Merge_v2 | SH ailab | 0.840478 | 91.05 |
| Sat_Merge_v3 | SH ailab | 0.840478 | 91.05 |
| Multi-Conv-Res6 | Dalian University of Technology | 0.838520 | 85.79 |
| cos_version21 | ailab | 0.837691 | 91.12 |
| alisa_8 | ZJU | 0.837272 | 80.65 |
| Multi-Conv-Res8 | Dalian University of Technology | 0.836839 | 88.66 |
| cos_version18 | ailab | 0.835090 | 91.07 |
| cos_version21 | ailab | 0.835090 | 91.07 |
| X4D-SceneFormer | No Disclosure | 0.832250 | 90.63 |
Category-Level Object and Part Pose Tracking

Task

In this task, the input is a point cloud video together with the pose of the object in the first frame; the goal is to track the object and output its pose in every subsequent frame. Note that these are category-level object poses.

Metric

The following metrics are used:
5°5cm: percentage of pose estimates with orientation error < 5° and translation error < 5 cm.
Rerr: mean orientation error in degrees.
Terr: mean translation error in centimeters.
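These quantities can be computed per frame from the predicted and ground-truth rigid poses. The sketch below uses the geodesic rotation distance and the Euclidean translation distance; it assumes translations are expressed in centimeters and ignores category symmetries, which an official evaluation may treat specially.

```python
import numpy as np

def pose_errors(R_pred, t_pred, R_gt, t_gt):
    """Per-frame orientation and translation errors.
    R_* are 3x3 rotation matrices; t_* are translations (assumed to be in cm)."""
    cos = (np.trace(R_pred @ R_gt.T) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # geodesic distance on SO(3)
    t_err = float(np.linalg.norm(t_pred - t_gt))
    return r_err, t_err

def summarize(errors):
    """Aggregate per-frame (Rerr, Terr) pairs into the three reported metrics."""
    r = np.array([e[0] for e in errors])
    t = np.array([e[1] for e in errors])
    five_five = float(np.mean((r < 5.0) & (t < 5.0)))  # 5°5cm
    return {"5deg5cm": five_five, "Rerr": float(r.mean()), "Terr": float(t.mean())}
```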

Leaderboard

The following leaderboard contains only published approaches for which we can at least provide an arXiv link.

| Approach | Paper | Code | Institution | Details |
|---|---|---|---|---|

(No entries yet.)