Robust Visual Embodiment: How Robots Discover Their Bodies in Real Environments

Robin Chhabra, Ammar J Mahmood, Salim Rezvani

Toronto Metropolitan University (Formerly Ryerson)

arXiv | Appendix

Figure: Self-modeling pipeline overview showing semantic segmentation, denoising techniques, and morphology reconstruction.

Robots with internal visual self-models promise unprecedented adaptability, yet existing autonomous modeling pipelines remain fragile under realistic sensing conditions such as noisy imagery and cluttered backgrounds. This paper presents the first systematic study quantifying how visual degradations—including blur, salt-and-pepper noise, and Gaussian noise—affect robotic self-modeling. Through both simulation and physical experiments, we demonstrate their impact on morphology prediction, trajectory planning, and damage recovery in state-of-the-art pipelines. To overcome these challenges, we introduce a task-aware denoising framework that couples classical restoration with morphology-preserving constraints, ensuring retention of structural cues critical for self-modeling. In addition, we integrate semantic segmentation to robustly isolate robots from cluttered and colorful scenes. Extensive experiments show that our approach restores near-baseline performance across simulated and physical platforms, while existing pipelines degrade significantly. These contributions advance the robustness of visual self-modeling and establish practical foundations for deploying self-aware robots in unpredictable real-world environments.


Visual Noise Impact on Self-Modeling

Mechanical design of the printed robot highlighting its structural layout

We evaluate the impact of three representative noise types on robotic self-modeling:
  • Blur, which smears edges and fine structural detail
  • Salt-and-pepper noise, which corrupts random pixels with impulsive extremes
  • Gaussian noise, which adds zero-mean random intensity variation across the frame

These degradations propagate through self-modeling pipelines, distorting morphology inference and reducing downstream task performance.
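As a rough illustration, the three degradations can be simulated on a clean frame. This is a minimal sketch using NumPy and SciPy; the frame and all noise parameters are placeholders, not the settings used in our experiments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0  # stand-in frame: a bright square on a dark background

# Blur: low-pass smearing of edges and fine structure.
blurred = gaussian_filter(clean, sigma=2.0)

# Salt-and-pepper: a random ~5% of pixels flipped to pure black or white.
sp = clean.copy()
impulses = rng.random(clean.shape) < 0.05
sp[impulses] = rng.choice([0.0, 1.0], size=impulses.sum())

# Gaussian noise: zero-mean additive intensity perturbation, clipped to [0, 1].
gauss = np.clip(clean + rng.normal(0.0, 0.1, clean.shape), 0.0, 1.0)
```

Each corruption attacks a different image property (spatial frequency content, isolated pixels, global intensity statistics), which is why a single generic filter performs poorly across all three.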


Method Overview

Experimental imaging setup with the robot positioned in front of the camera

Our task-aware denoising pipeline operates in three steps:
  1. Semantic segmentation to isolate the robot from cluttered backgrounds
  2. Integration with self-modeling using the Free-Form Kinematic Self-Model (FFKSM) framework
  3. Noise-specific filtering: Wiener filtering for blur, median filtering for salt-and-pepper noise, and Non-Local Means with IFT-SVM for Gaussian noise
This ordering ensures that the robot is first separated from the background and correctly modeled before denoising restores corrupted inputs.
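The noise-specific filtering step can be sketched as a simple dispatcher. This is an illustrative sketch, not our released implementation: the IFT-SVM component that accompanies the Gaussian branch is omitted, the noise type is assumed to be already identified, the filter parameters are placeholders, and the Wiener branch assumes an estimated blur kernel (PSF) is available.

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage.restoration import denoise_nl_means, wiener

def restore(frame, noise_type, psf=None):
    """Dispatch a grayscale frame in [0, 1] to a noise-specific restorer."""
    if noise_type == "blur":
        # Wiener deconvolution requires an estimate of the blur kernel.
        return wiener(frame, psf, balance=0.1)
    if noise_type == "salt_pepper":
        # Median filtering suppresses impulsive outliers while keeping edges.
        return median_filter(frame, size=3)
    if noise_type == "gaussian":
        # Non-Local Means averages similar patches across the image.
        return denoise_nl_means(frame, h=0.1, fast_mode=True)
    raise ValueError(f"unknown noise type: {noise_type}")
```

Keeping the restorers behind one interface makes it straightforward to insert a classifier (such as the IFT-SVM used in our pipeline) that selects the branch automatically.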

Semantic Segmentation Results

Semantic Segmentation Comparison: FFKSM vs. our method across different cluttered backgrounds
Baseline Performance: morphology reconstruction error on the noise-free dataset

Physical Robot Experiments

We validate our approach on a 4-DOF robotic manipulator fabricated from 3D-printed PLA components and Dynamixel XL330-M288 servos. The kinematic chain consists of a rotating base, two intermediate links, and a terminal end-effector, with each joint limited to ±90° of rotation. The platform introduces realistic sources of variation, including mechanical tolerances, material imperfections, and actuator variability, providing a challenging testbed for self-modeling beyond strictly synthetic settings.
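For intuition about the kinematic chain being modeled, a nominal forward-kinematics sketch of a base-yaw plus three-pitch arm follows. The link lengths are hypothetical placeholders (the arm's dimensions are not stated here), and this closed-form model is exactly what the self-modeling pipeline must discover from vision rather than being given.

```python
import numpy as np

# Hypothetical link lengths in metres, for illustration only.
LINKS = [0.06, 0.06, 0.05]

def forward_kinematics(base_yaw, joint_angles, links=LINKS):
    """End-effector position of a yaw base followed by three pitch joints.

    The three pitch joints act in a common vertical plane that the base
    rotates about the z-axis. Angles are in radians and are clipped to
    +/- pi/2, mirroring the +/-90 degree servo limits.
    """
    angles = np.clip(joint_angles, -np.pi / 2, np.pi / 2)
    r, z, pitch = 0.0, 0.0, 0.0
    for a, length in zip(angles, links):
        pitch += a                     # pitch angles accumulate down the chain
        r += length * np.cos(pitch)    # reach within the arm's vertical plane
        z += length * np.sin(pitch)    # height above the base
    # Rotate the in-plane reach about the vertical base axis.
    return np.array([r * np.cos(base_yaw), r * np.sin(base_yaw), z])
```

A learned self-model is evaluated by how closely its predicted end-effector positions track a ground-truth model of this form across sampled joint configurations.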
Morphology reconstruction error under Gaussian blur with and without Wiener filtering

Morphology reconstruction error under salt-and-pepper noise with and without median filtering

Morphology reconstruction error under Gaussian noise with and without Non-Local Means denoising


Technical Details

Hardware and Software
  • 4-DOF robotic manipulator with 3D-printed PLA components
  • Dynamixel XL330-M288 servos
  • Intel RealSense D435 RGB-D camera
  • Ubuntu Linux, Python implementation
  • Free-Form Kinematic Self-Model (FFKSM) framework
Denoising Techniques
  • Wiener filtering for blur removal
  • Median filtering for salt-and-pepper noise
  • Non-Local Means with IFT-SVM for Gaussian noise
  • Semantic segmentation for cluttered backgrounds
  • Task-aware morphology preservation constraints
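The segmentation step listed above can be illustrated with a toy masking routine. In the actual pipeline the binary mask comes from a trained semantic segmentation network; here the mask is supplied directly, so this sketch only shows how a mask isolates the robot before denoising and self-modeling.

```python
import numpy as np

def isolate_robot(frame, mask):
    """Mask out the background and crop to the robot's bounding box.

    `mask` stands in for the output of a segmentation network
    (1 = robot pixel, 0 = background).
    """
    masked = frame * mask
    ys, xs = np.nonzero(mask)
    return masked[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```

Cropping to the segmented region keeps cluttered or colorful backgrounds from contaminating the morphology cues that the self-model depends on.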

Results Summary

Our experimental results demonstrate three key findings:
  • Blur, salt-and-pepper noise, and Gaussian noise each significantly degrade morphology prediction, trajectory planning, and damage recovery in existing self-modeling pipelines.
  • Task-aware denoising with morphology-preserving constraints restores near-baseline performance on both simulated and physical platforms.
  • Semantic segmentation robustly isolates the robot from cluttered and colorful backgrounds, extending self-modeling beyond controlled settings.


BibTeX

@article{robust_visual_embodiment2025,
  title={Robust Visual Embodiment: How Robots Discover Their Bodies in Real Environments},
  author={Robin Chhabra and Ammar J Mahmood and Salim Rezvani},
  journal={Conference/Journal Name},
  year={2025}
}

Contact

Please reach out to ammar.j.mahmood@torontomu.ca or robin.chhabra@torontomu.ca for questions about this research.