3D Affordance Grounding: A Review of the Research Direction

BinaryOracle, 2025/9/15

Point Cloud + Text

Affogato (arXiv 2025.06)

Highlights:

The name stands for "AFFOrdance Grounding All aT Once"
A large-scale dataset for 3D and 2D affordance grounding
A minimalistic architecture

Loss functions:

Focal Loss to handle class imbalance
Dice Loss to improve region-level alignment.
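To make the two terms concrete, here is a minimal pure-Python sketch of a binary focal loss and a soft Dice loss over per-point affordance probabilities. It is not taken from the Affogato code (which is not released yet); the `alpha`, `gamma`, and `smooth` defaults and the unweighted sum of the two terms are assumptions chosen for illustration.

```python
import math

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over per-point probabilities (alpha/gamma are assumed defaults)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # clamp to keep log() finite
        p_t = p if t == 1 else 1.0 - p           # probability assigned to the true class
        a_t = alpha if t == 1 else 1.0 - alpha   # class-balancing weight
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 minus the smoothed overlap ratio between prediction and mask."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(targets) + smooth)

# Toy point cloud with 6 points, 2 of which belong to the affordance region.
probs = [0.9, 0.8, 0.1, 0.2, 0.1, 0.05]
targets = [1, 1, 0, 0, 0, 0]
loss = focal_loss(probs, targets) + dice_loss(probs, targets)  # unweighted sum (assumption)
print(loss)
```

The focal term down-weights easy points through the `(1 - p_t) ** gamma` factor, which matters because most points lie outside the affordance region; the Dice term scores the overlap of the whole predicted region with the mask rather than individual points.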

Status:

awaiting code release
dataset available

SeqAfford (CVPR 2025)

Highlights:

Proposes a 3D multimodal large language model (3D MMLLM) whose architecture follows LLaVA
Feeds the <SEG> segmentation tokens output by the 3D MMLLM into a multi-granularity language-point cloud combination module to produce dense 3D predictions
Supports sequential instruction execution
Large-scale dataset of 180,000 instruction-point cloud pairs, covering single and sequential affordance reasoning tasks

Loss functions:

Autoregressive Cross-Entropy Loss
Dice Loss
Binary Cross-Entropy Loss
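The three terms supervise different outputs: the autoregressive cross-entropy applies to the generated text tokens, while BCE and Dice apply to the predicted point mask. The sketch below shows one way they could be combined. The weights `w_text`, `w_bce`, and `w_dice` and the function names are hypothetical and are not taken from the SeqAfford code.

```python
import math

def autoregressive_ce(step_probs, target_ids):
    """Mean negative log-likelihood of the target token at each decoding step.
    step_probs[i] is the predicted distribution over the vocabulary at step i."""
    nll = [-math.log(max(dist[tok], 1e-12)) for dist, tok in zip(step_probs, target_ids)]
    return sum(nll) / len(nll)

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over per-point mask probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss on the predicted mask."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(targets) + smooth)

def total_loss(step_probs, target_ids, mask_probs, mask_targets,
               w_text=1.0, w_bce=1.0, w_dice=1.0):
    """Weighted sum of the text and mask terms (weights are illustrative assumptions)."""
    return (w_text * autoregressive_ce(step_probs, target_ids)
            + w_bce * bce_loss(mask_probs, mask_targets)
            + w_dice * dice_loss(mask_probs, mask_targets))

# Toy example: a 2-token answer over a 2-word vocabulary and a 4-point mask.
step_probs = [[0.5, 0.5], [0.25, 0.75]]
target_ids = [0, 1]
print(total_loss(step_probs, target_ids, [0.9, 0.7, 0.2, 0.1], [1, 1, 0, 0]))
```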

Status:

code available
dataset available

LASO (CVPR 2024)

Needs a second, more careful read.

Highlights:

PointRefer: The Adaptive Fusion Module injects semantic information at multiple scales. The Referred Point Decoder introduces a set of affordance queries that interact with the point cloud features and generate dynamic convolution kernels.
LASO dataset: 19,751 question-point affordance pairs
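The "queries to dynamic kernels" idea can be illustrated with a heavily simplified sketch: one query attends over the point features, the result is used as a kernel, and the kernel is applied to every point as a dot product (a 1x1 convolution). This is not PointRefer's implementation; the single query, the single attention head, the missing learned projections, and the residual update are all simplifying assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def query_to_kernel(query, point_feats):
    """One cross-attention step: the query attends over all points, and the
    attended feature is added back to the query to form the dynamic kernel."""
    scale = math.sqrt(len(query))
    attn = softmax([dot(query, f) / scale for f in point_feats])
    attended = [sum(w * f[d] for w, f in zip(attn, point_feats))
                for d in range(len(query))]
    return [q + a for q, a in zip(query, attended)]

def predict_mask(kernel, point_feats):
    """Apply the kernel to each point feature (dot product), then a sigmoid."""
    return [1.0 / (1.0 + math.exp(-dot(kernel, f))) for f in point_feats]

# Toy data: 3 points with 2-D features and one hypothetical affordance query.
point_feats = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0]]
query = [1.0, -1.0]
kernel = query_to_kernel(query, point_feats)
mask = predict_mask(kernel, point_feats)
print(mask)
```

Because the kernel is computed from the query and the point cloud at inference time, the same decoder weights can produce a different mask for each question.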

Loss functions:

Focal Loss + Dice Loss

Status:

code and dataset available

Point Cloud + Image

IAGNet (ICCV 2023)

Highlights:

Learns from 2D interaction images and generalizes to 3D point clouds to infer affordance regions

Key components: Joint_Region_Alignment (JRA), Affordance_Revealed_Module (ARM), alignment of feature distributions between image and point cloud regions (KL Loss), and local + global prediction

Proposes the PIAD dataset: 7,012 point clouds and 5,162 images, spanning 23 object classes and 17 affordance categories.

Loss functions:

Heatmap Loss (HM_Loss): supervises the point-wise 3D affordance mask; computed as Focal Loss + Dice Loss
Cross-Entropy Loss (CE Loss): supervises the global affordance classification
KL-Divergence Loss (KL Loss): pulls the feature distribution of the interaction regions on the image side toward that on the point cloud side
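As a rough illustration of how the three terms fit together, the sketch below turns two feature vectors into distributions with a softmax, measures their KL divergence, and adds the classification and heatmap terms. The direction of the KL term, the softmax normalization, the loss weights, and the name `combined_loss` are assumptions made for this example, not details confirmed from the IAGNet code.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(logits, label):
    """Cross-entropy of a single classification prediction."""
    return -math.log(max(softmax(logits)[label], 1e-12))

def combined_loss(hm_loss, class_logits, label, img_feat, pc_feat,
                  w_hm=1.0, w_ce=1.0, w_kl=1.0):
    """Weighted sum of heatmap, classification, and alignment terms.
    hm_loss is assumed to be a precomputed Focal + Dice value."""
    kl = kl_divergence(softmax(img_feat), softmax(pc_feat))  # direction is an assumption
    return w_hm * hm_loss + w_ce * cross_entropy(class_logits, label) + w_kl * kl

# Toy values: 3 affordance classes, 4-D region features for each modality.
print(combined_loss(0.8, [2.0, 0.5, -1.0], 0,
                    img_feat=[1.0, 0.2, 0.1, 0.0],
                    pc_feat=[0.8, 0.3, 0.1, 0.1]))
```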

Status:

code and dataset available

Point Cloud + Text + Image

GREAT (CVPR 2025)

Highlights:

Grounds 3D object affordance in an open-vocabulary fashion
Multi-Head Affordance Chain-of-Thought

Data preparation stage:
Four prompts elicit descriptions of (1) the object's interaction area, (2) the morphology of that area, (3) the interaction behavior, and (4) other common interactions with the object.
Geometric structure knowledge = answers to Prompt 1 + Prompt 2 = interaction parts + inferred geometric properties of those parts
Interaction knowledge = answers to Prompt 3 + Prompt 4 = current interaction + analogous/supplementary interaction methods
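The grouping of the four answers into two knowledge strings can be written down as a small helper. Everything in this sketch is hypothetical: the key names, the `build_knowledge` function, and the sample answers are invented for illustration and are not the prompts or outputs used by GREAT.

```python
def build_knowledge(answers):
    """Group four prompt answers into the two knowledge types described above.
    `answers` maps 'prompt1'..'prompt4' to the text returned for each prompt."""
    geometric = " ".join([answers["prompt1"], answers["prompt2"]])
    interaction = " ".join([answers["prompt3"], answers["prompt4"]])
    return {"geometric_structure": geometric, "interaction": interaction}

# Invented sample answers for a mug, for illustration only.
answers = {
    "prompt1": "The hand contacts the handle.",
    "prompt2": "The handle is a curved loop attached to the side of the body.",
    "prompt3": "The person is grasping the mug.",
    "prompt4": "The mug can also be used for pouring or containing.",
}
knowledge = build_knowledge(answers)
print(knowledge["geometric_structure"])
print(knowledge["interaction"])
```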

PIADv2 dataset:

24 affordance categories, 43 object categories, 15K interaction images, and 38K annotated 3D objects.

Loss functions:

Focal Loss to handle class imbalance
Dice Loss to improve region-level alignment.

Status:

code available
dataset available

LMAffordance3D (CVPR 2025)

Highlights:

Combines language instructions, visual observations, and interaction information to locate the affordance of manipulable objects in 3D space.
AGPIL (Affordance Grounding dataset with Points, Images and Language instructions)

The dataset covers affordance estimation for objects observed from full-view, partial-view, and rotated perspectives, accounting for real-world observation angles, object rotation, and spatial occlusion.

Loss functions:

Focal Loss
Dice Loss

Status:

awaiting code release
dataset available

3D Gaussian Splatting (3DGS)

GEAL (CVPR 2025)

Highlights:

"Knowledge distillation" from 2D to 3D: transfers the semantic capabilities of pre-trained 2D models to the 3D affordance prediction model through Gaussian splat mapping, cross-modal consistency alignment, and multi-scale fusion.
Noisy dataset: constructs a new benchmark with multiple types of noise/corruption to evaluate the model's generalization and robustness under real or harsh conditions.

Loss functions:

Binary Cross-Entropy (BCE) Loss
Dice Loss
Consistency Loss (MSE loss)
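The consistency term is the least standard of the three, so here is a minimal sketch of an MSE consistency loss between paired feature vectors from a 2D branch and a 3D branch. Which tensors GEAL actually compares is not stated in these notes; pairing per-point features from the two branches is an assumption, and the function names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def consistency_loss(feats_2d, feats_3d):
    """Average MSE over paired feature vectors from the two branches
    (the pairing scheme is an assumption for illustration)."""
    pairs = list(zip(feats_2d, feats_3d))
    return sum(mse(f2, f3) for f2, f3 in pairs) / len(pairs)

# Toy features for 2 points, 3 dimensions each.
feats_2d = [[0.2, 0.4, 0.1], [0.9, 0.1, 0.3]]
feats_3d = [[0.1, 0.5, 0.1], [0.7, 0.2, 0.3]]
print(consistency_loss(feats_2d, feats_3d))
```

The loss is zero only when the two branches produce identical features, which is what lets the pre-trained 2D branch act as a teacher for the 3D branch.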

Status:

awaiting code release
awaiting dataset release

3DAffordSplat (arXiv 2025.04)

IAAO (CVPR 2025)