Data Availability
- Data will be made available on request.
Fig. 1. Visualization of the multi-view images. The occluded portion of a person cannot be captured in 2D RGB images (column 1), but it can be observed in 3D multi-view images (columns 3–6).
Fig. 2. The network structure of MV-ReID. An occluded person image is converted into multi-view images by 3D reconstruction and rendering. The multi-view grouping mechanism and a 2D descriptor then extract the 3D multi-view feature and the 2D texture feature, which are aggregated into a unified space to predict the person ID. The multi-view images and the original image are fed into the model simultaneously.
Fig. 3. The generation of multi-view images. The original RGB image is reconstructed into a 3D human model, which is then rendered into multi-view images.
Fig. 4. Illustration of the random rendering strategy. Twelve rendering viewpoints (green points) are randomly selected from the vertices of a dodecahedron (20 vertices in total).
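To make the random rendering strategy concrete, the following minimal sketch samples 12 of a regular dodecahedron's 20 vertices as rendering viewpoints. The vertex coordinates are the standard ones for a regular dodecahedron; the function names and the use of NumPy are illustrative assumptions rather than the paper's implementation, and the camera orientation toward the reconstructed body is omitted.

```python
import itertools
import numpy as np

def dodecahedron_vertices():
    """Return the 20 vertices of a regular dodecahedron centered at the origin."""
    phi = (1 + np.sqrt(5)) / 2  # golden ratio
    verts = list(itertools.product((-1.0, 1.0), repeat=3))  # the 8 cube vertices
    for s1, s2 in itertools.product((-1.0, 1.0), repeat=2):
        verts.append((0.0, s1 / phi, s2 * phi))  # 4 vertices in the yz-plane
        verts.append((s1 / phi, s2 * phi, 0.0))  # 4 vertices in the xy-plane
        verts.append((s1 * phi, 0.0, s2 / phi))  # 4 vertices in the zx-plane
    return np.asarray(verts)  # shape (20, 3)

def sample_viewpoints(num_views=12, seed=None):
    """Randomly pick `num_views` distinct vertices as camera positions (green points in Fig. 4)."""
    rng = np.random.default_rng(seed)
    verts = dodecahedron_vertices()
    idx = rng.choice(len(verts), size=num_views, replace=False)
    return verts[idx]  # shape (num_views, 3)

viewpoints = sample_viewpoints(12)  # one random set of 12 viewpoints per rendered model
```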
Table 1. The scale of the multi-view datasets.

| Dataset | Split | #ID | #3D model | #MV image |
|---|---|---|---|---|
| MV-Market-1501 | Training | 751 | 12,936 | 155,232 |
| MV-Market-1501 | Gallery | 752 | 19,732 | 236,784 |
| MV-Market-1501 | Query | 750 | 3,368 | 40,416 |
| MV-DukeMTMC | Training | 702 | 16,522 | 198,264 |
| MV-DukeMTMC | Gallery | 1,110 | 17,661 | 211,932 |
| MV-DukeMTMC | Query | 702 | 2,228 | 26,736 |
| MV-Occluded-Duke | Training | 702 | 15,618 | 187,146 |
| MV-Occluded-Duke | Gallery | 1,110 | 17,661 | 211,932 |
| MV-Occluded-Duke | Query | 519 | 2,210 | 26,520 |
| MV-Occluded-ReID | Training | – | – | – |
| MV-Occluded-ReID | Gallery | 200 | 1,000 | 12,000 |
| MV-Occluded-ReID | Query | 200 | 1,000 | 12,000 |
Algorithm 1. Multi-view learning algorithm.
Input: The original image and its multi-view images, the pre-trained multi-view descriptor, the pre-trained texture extractor, the classifier layer, the number of training epochs, and the label of the input person.
Output: The ID of the input person.
1: for each training epoch do
2: Extract view-level features from the multi-view images with the multi-view descriptor and calculate the discrimination score of each feature; then divide the view-level features into groups according to their discrimination scores, as detailed in Eqs. (1) and (2);
3: Conduct view-level feature fusion to fuse the features within each group into a group-level feature, as detailed in Eqs. (3) and (4);
4: Conduct the attention-based strategy to obtain the multi-view feature by fusing the group-level features, as detailed in Eqs. (5)–(7);
5: Conduct cross-modal feature fusion to aggregate the multi-view feature and the 2D texture feature into the global person feature, as detailed in Eqs. (8)–(10);
6: Predict the person ID and calculate the loss against the ground-truth label to update the weights of the multi-view descriptor, the texture extractor, and the classifier;
7: end for
8: Predict the person ID with the trained model.
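A minimal PyTorch sketch of this loop is given below. It assumes a `model` that exposes the multi-view branch (steps 2–4), the texture branch, the cross-modal fusion (step 5), and the classifier as submodules; the attribute names, the data-loader format, and the plain cross-entropy ID loss are illustrative assumptions, not the paper's exact training recipe.

```python
import torch

def train_mv_reid(model, loader, optimizer, epochs):
    """Sketch of Algorithm 1: joint training on original and multi-view images."""
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for epoch in range(epochs):
        for rgb, multi_views, labels in loader:
            # Steps 2-4: grouping and fusion of the rendered views into the 3D multi-view feature
            f_mv = model.multi_view_branch(multi_views)
            # The 2D texture feature extracted from the original (occluded) image
            f_tex = model.texture_branch(rgb)
            # Step 5: cross-modal feature fusion into the global person feature
            f_global = model.cross_modal_fusion(f_mv, f_tex)
            # Step 6: predict the ID and update all branch weights
            logits = model.classifier(f_global)
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model  # step 8: the trained model predicts IDs at inference time
```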
Fig. 5. The proposed grouping mechanism framework. View-level features are extracted from the multi-view images. The grouping mechanism then divides the view-level features into different groups, and the features within the same group are combined into a group-level feature by view-level fusion. Finally, the group-level features are fused into a unified space by group-level fusion to obtain the multi-view feature.
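The sketch below illustrates one plausible reading of this mechanism: views are ranked by a learned discrimination score, split into equal-size groups, averaged within each group (view-level fusion), and combined across groups with attention weights (group-level fusion). The scoring head, the equal-size split, and the fusion operators are assumptions standing in for the paper's exact Eqs. (1)–(7).

```python
import torch
import torch.nn as nn

class ViewGrouping(nn.Module):
    """Illustrative sketch of the multi-view grouping mechanism (Fig. 5)."""

    def __init__(self, dim, num_groups=3):  # 3 groups is the best setting in Table 5
        super().__init__()
        self.num_groups = num_groups
        self.score_fc = nn.Linear(dim, 1)    # discrimination score per view
        self.group_attn = nn.Linear(dim, 1)  # attention weight per group

    def forward(self, view_feats):
        # view_feats: (num_views, dim), one feature per rendered view
        scores = self.score_fc(view_feats).squeeze(-1)    # (num_views,)
        order = torch.argsort(scores, descending=True)    # rank views by score
        chunks = torch.chunk(view_feats[order], self.num_groups, dim=0)
        # View-level fusion: average the view features inside each group
        group_feats = torch.stack([c.mean(dim=0) for c in chunks])  # (num_groups, dim)
        # Group-level fusion: attention-weighted sum over the groups
        attn = torch.softmax(self.group_attn(group_feats).squeeze(-1), dim=0)
        return (attn.unsqueeze(-1) * group_feats).sum(dim=0)  # the multi-view feature, (dim,)
```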
Fig. 6. Framework of the proposed cross-modal feature fusion. The cross-modal feature fusion strategy aggregates the multi-view feature and the 2D texture feature of an occluded person. A residual layer is proposed to improve the model's capacity for describing inter-dependencies between 3D geometries and 2D textures. Within the residual connection, learnable weights modify the cross-modal features and are dynamically adjusted during training. The hyperparameter β serves as the balancing factor within the residual connection.
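A plausible sketch of this fusion is shown below: the two modality features are reweighted, merged by a small MLP, and combined with a residual path scaled by β (Table 7 suggests β = 1.5 works best). The learnable scalar weights, the concatenation-plus-MLP merge, and the exact residual form are assumptions standing in for Eqs. (8)–(10), and both features are assumed to share the same dimensionality.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative sketch of the residual cross-modal feature fusion (Fig. 6)."""

    def __init__(self, dim, beta=1.5):
        super().__init__()
        self.beta = beta                          # balancing factor of the residual path
        self.w_mv = nn.Parameter(torch.ones(1))   # learnable weight on the 3D multi-view feature
        self.w_tex = nn.Parameter(torch.ones(1))  # learnable weight on the 2D texture feature
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, f_mv, f_tex):
        # Reweight each modality, then merge the concatenation with a small MLP
        fused = self.fuse(torch.cat([self.w_mv * f_mv, self.w_tex * f_tex], dim=-1))
        # Residual connection balanced by the hyperparameter beta
        return fused + self.beta * (f_mv + f_tex)  # the global person feature
```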
Table 2. Comparison with state-of-the-art methods in occluded person ReID. The symbols #1,2,3,4 indicate performance rankings obtained by comparing methods that use the 2D RGB image as the network input against the 3D multi-view image input. The symbols ×1,2,3 indicate performance rankings obtained by comparing methods that use the 3D point cloud as the network input against the 3D multi-view image input.
| Input type | Year | Method | Occluded-Duke Rank-1 | Occluded-Duke mAP | Occluded-ReID Rank-1 | Occluded-ReID mAP |
|---|---|---|---|---|---|---|
| 2D RGB image | 2021 | PAT [20] | 64.5 | 53.6 | 81.6#3 | 72.1 |
| 2D RGB image | 2021 | OAMN [3] | 62.6 | 46.1 | – | – |
| 2D RGB image | 2021 | OP-ReID [42] | 69.0 | 57.2 | 78.5 | 72.9#4 |
| 2D RGB image | 2022 | PFD [35] | 69.5#4 | 61.8#3 | 81.5#4 | 83.0#1 |
| 2D RGB image | 2022 | FRT [40] | 70.7#3 | 61.3#4 | 80.4 | 71.0 |
| 2D RGB image | 2023 | BPBReID [32] | 75.1#2 | 62.5#2 | 82.9#2 | 75.2#3 |
| 3D point cloud | 2021 | ASSP [2] | 38.4×3 | 32.2×3 | 57.5×3 | 47.2×3 |
| 3D point cloud | 2022 | OGNet [50] | 42.6×2 | 33.7×2 | 58.7×2 | 54.3×2 |
| 3D multi-view image | 2023 | MV-ReID (ours) | 76.2#1,×1 | 63.4#1,×1 | 83.7#1,×1 | 76.4#2,×1 |
Table 3. Performance comparison on holistic ReID datasets. The symbols #1,2,3,4 indicate performance rankings, derived from comparisons between occluded ReID methods and MV-ReID. The symbols ×1,2,3 indicate performance rankings, derived from comparisons between 3D point-based ReID methods and MV-ReID. The symbols $1,2,3,4 indicate performance rankings, derived from comparisons between holistic ReID methods and MV-ReID.
| Task type | Year | Method | Market-1501 Rank-1 | Market-1501 mAP | DukeMTMC Rank-1 | DukeMTMC mAP |
|---|---|---|---|---|---|---|
| Occluded Person ReID | 2021 | OP-ReID [42] | 93.4 | 89.9#2 | 88.3 | 79.9 |
| Occluded Person ReID | 2021 | PAT [20] | 96.1#1 | 89.3#4 | 91.1#3 | 81.3#3 |
| Occluded Person ReID | 2022 | PFD [35] | 95.5#2 | 89.7#3 | 91.2#2 | 83.2#2 |
| Occluded Person ReID | 2022 | FRT [40] | 95.5#2 | 88.1 | 90.5#4 | 81.7#4 |
| 3D Point-based ReID | 2021 | ASSP [2] | 95.0×2 | 87.3×2 | 88.2×2 | 76.1×2 |
| 3D Point-based ReID | 2022 | OGNet [50] | 88.1×3 | 72.9×3 | 78.5×3 | 60.7×3 |
| Holistic Person ReID | 2021 | TransReID [15] | 95.0 | 88.2 | 89.6 | 80.6 |
| Holistic Person ReID | 2021 | BV-Person [43] | 96.0$3 | 89.2 | 90.5$2 | 89.6$1 |
| Holistic Person ReID | 2021 | InSTD [29] | 97.6$1 | 90.8$1 | 95.7$1 | 89.1$2 |
| Holistic Person ReID | 2022 | PPLR [5] | 94.3 | 84.4 | – | – |
| Holistic Person ReID | 2023 | MSINet [9] | 95.3 | 89.6$4 | – | – |
| Holistic Person ReID | 2023 | DIP [18] | 95.8$4 | 90.8$1 | 91.7$4 | 85.2$3 |
| 3D View-based ReID | 2023 | MV-ReID (ours) | 96.1#1,×1,$2 | 89.9#1,×1,$3 | 92.7#1,×1,$2 | 83.7#1,×1,$4 |
Table 4. Analysis of the effect of the 3D multi-view and 2D RGB features on MV-ReID performance (Occluded-Duke).

| Index | Multi-view | RGB | Rank-1 | Rank-5 | mAP |
|---|---|---|---|---|---|
| 1 | ✓ | × | 62.1 | 69.7 | 54.4 |
| 2 | × | ✓ | 60.4 | 62.9 | 53.2 |
| 3 | ✓ | ✓ | 76.2 | 81.7 | 63.4 |
Table 5. Performance comparison with different group numbers (Occluded-Duke).

| Index | Group number | Rank-1 | Rank-5 | mAP |
|---|---|---|---|---|
| 1 | 2 | 62.3 | 64.3 | 54.2 |
| 2 | 3 | 76.2 | 81.7 | 63.4 |
| 3 | 4 | 73.4 | 77.1 | 61.8 |
| 4 | 6 | 62.2 | 66.7 | 55.7 |
Table 6. Performance comparison with different components (Occluded-Duke). G denotes the grouping mechanism and C the cross-modal feature fusion.

| Index | G | C | Rank-1 | Rank-5 | mAP |
|---|---|---|---|---|---|
| 1 | × | × | 56.7 | 63.1 | 52.8 |
| 2 | ✓ | × | 69.2 | 75.4 | 61.6 |
| 3 | × | ✓ | 68.3 | 73.7 | 57.8 |
| 4 | ✓ | ✓ | 76.2 | 81.7 | 63.4 |
Table 7. Performance comparison with different values of the hyperparameter β (Occluded-Duke).

| β | Rank-1 | Rank-5 | mAP |
|---|---|---|---|
| 0.5 | 73.9 | 74.6 | 60.9 |
| 1.0 | 73.2 | 75.3 | 61.5 |
| 1.5 | 76.2 | 81.7 | 63.4 |
| 2.0 | 74.3 | 77.7 | 62.1 |
| 2.5 | 63.3 | 69.6 | 57.9 |
Fig. 7. Visualization of the grouping mechanism. Different colors of the box borders represent different groups. The group weight is calculated using Eq. (6).
Fig. 8. A comparative analysis of various methodologies for human reconstruction. In each group, the first column shows the RGB image, followed by the SMPL, ICON, and PIFu reconstruction results in the second, third, and fourth columns, respectively.
The slight drop observed may be attributed to the fact that images rendered from certain views yield low-quality features, thereby degrading ReID performance. As many works [34,37] have demonstrated, adaptive views describe the structure of point clouds better than fixed views, which can lead to subpar performance. Therefore, selecting suitable views for 2D-modality feature extraction could be further investigated to improve ReID performance.