Deep Visual Servoing of an Aerial Robot Using Keypoint Feature Extraction

Article Overview

This work explores image-based visual servoing (IBVS) for aerial robots using deep learning-driven keypoint detection. A monocular RGB camera onboard the UAV captures real-time visual data, which is processed by a convolutional neural network (CNN) to extract key visual features for control.

Unlike traditional methods that rely on artificial markers, this approach eliminates that dependency and enhances robustness against occlusions, lighting changes, background clutter, and scene variability. As a result, it significantly extends the practicality of perception-based control for aerial platforms in dynamic environments.

The system’s performance is rigorously evaluated through physics-based simulations in ROS-Gazebo, offering a more realistic assessment compared to marker-dependent or simplified simulation environments. A demonstration video is available: here

Key Features

Marker-Free Keypoint Detection: Utilizes a CNN to extract visual features from RGB images, removing the reliance on artificial fiducial markers.
Robustness to Real-World Variability: Enhances performance under challenging conditions such as occlusion, illumination changes, background clutter, and dynamic scenes.
Realistic Evaluation Environment: Validated through physics-based ROS-Gazebo simulations, offering a more accurate representation of real-world aerial robot behavior compared to simplified simulation setups.

Figures

Various worlds created in Gazebo were utilized to carry out robustness tests.

You can read the full paper here.

Citation

@inproceedings{sepahvand2025deep,
  title={Deep Visual Servoing of an Aerial Robot Using Keypoint Feature Extraction},
  author={Sepahvand, Shayan and Amiri, Niloufar and Janabi-Sharifi, Farrokh},
  booktitle={2025 International Conference on Unmanned Aircraft Systems (ICUAS)},
  pages={37--43},
  year={2025},
  organization={IEEE}
}

Article Overview

Key Features

Figures

Read More

Citation