Robotics · Software · Hardware

Autonomous Self-Following Cart

Year: 2024
Duration: 5 months
Role: System Integration — ROS Architecture, Sensor Fusion, Hardware
Autonomous Self-Following Cart — Hero Image
Problem

Carrying heavy gear across campsites, warehouses, or shopping areas is physically demanding — and existing electric carts still require manual steering with no obstacle avoidance or person-following capability.

Solution

Designed and built a vision + UWB sensor-fusion mobile robot that autonomously follows a user while avoiding obstacles in real time, integrating OAK-D perception, ROS2-based control, and differential-drive motor control into a complete working prototype.

Outcome

Working prototype achieved real-time person tracking, autonomous following at walking speed, and depth-based obstacle avoidance — full hardware + software integration from perception through planning to motor control.

Overview

Most autonomous cart projects stop at simulation. We built the real thing — a differential-drive cart that sees you with a depth camera, knows where you are with UWB, and follows you while dodging obstacles. The system covers the full robotics stack: perception (YOLO + depth), localization (vision + UWB fusion), planning (following algorithm + obstacle avoidance), and control (ESC-driven differential drive). The hardest part wasn't any single component; it was making vision, localization, and motor control work together reliably in the real world.

System Architecture

System Architecture Diagram

The system follows a layered robotics pipeline: Sensors (OAK-D Camera + UWB Module) feed into a Perception System (YOLO person detection, target tracking, depth processing, obstacle detection), which passes through a Sensor Fusion layer combining vision position with UWB distance/angle data. The fused target estimation feeds into a Motion Planner (follow algorithm, obstacle avoidance, speed control), which outputs ROS2 cmd_vel commands. These are converted to differential wheel speeds, sent to ESC controllers, and drive the physical cart movement.

Key Highlights

01

Hardware + Software Integration

Most student projects stop at simulation; this one delivered a complete working prototype spanning AI, robotics, and hardware.

02

Multi-Sensor Fusion

Vision + UWB asymmetric fusion architecture provides robust tracking even during occlusion.

03

Full Robotics Stack

Covers perception, planning, control, and hardware integration end to end.

Process

Problem Scoping & Competitor Analysis — Diagram / Photo
01

Problem Scoping & Competitor Analysis

Researched existing following robots and autonomous carts on the market. Identified critical gaps: no existing product combines person following with obstacle avoidance and sensor fusion. Most carts require manual steering or remote control. Set design goals: autonomous person following, real-time obstacle avoidance, and robust tracking even during temporary occlusion.

Sensor Selection & Hardware Design — Diagram / Photo
02

Sensor Selection & Hardware Design

Selected OAK-D camera for its triple capability: RGB imaging, stereo depth, and onboard AI acceleration via Myriad X VPU — enabling neural inference directly on the camera module. Chose DWM3001CDK UWB module for relative distance and angle estimation, receiving iPhone position data via BLE as a backup tracking source. Computing runs on Raspberry Pi 4/5 with SSD storage. Power system uses 24V LiPo battery providing approximately 25 minutes of prototype runtime.

Perception Pipeline — Person Detection — Diagram / Photo
03

Perception Pipeline — Person Detection

OAK-D captures RGB frames → YOLOv8 runs real-time person detection → bounding box coordinates are mapped to the stereo depth map → person position extracted as (x, z) coordinates in camera frame. This gives both the lateral offset (for steering) and distance (for speed control) to the target person. All perception processing runs on the Raspberry Pi using the DepthAI SDK with OpenCV.
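The bbox-to-position step can be sketched as follows. The intrinsics fx and cx are illustrative camera parameters, and the median over the box is one reasonable way to reject stereo outliers, not necessarily the exact method used on the cart:

```python
import numpy as np

def person_position_from_bbox(depth_m, bbox, fx, cx):
    """Map a YOLO bounding box onto the stereo depth map and return
    the person's (x, z) position in the camera frame, in meters.

    depth_m : HxW depth image in meters (0 = no stereo match)
    bbox    : (x1, y1, x2, y2) pixel coordinates of the detection
    fx, cx  : horizontal focal length and principal point (pixels)
    """
    x1, y1, x2, y2 = bbox
    patch = depth_m[y1:y2, x1:x2]
    valid = patch[patch > 0]            # drop pixels with no depth
    if valid.size == 0:
        return None                     # no usable depth in the box
    z = float(np.median(valid))         # robust distance estimate
    u = (x1 + x2) / 2.0                 # bbox center column
    x = (u - cx) * z / fx               # pinhole back-projection
    return x, z
```

The returned z drives speed control and x drives steering, matching the split described above.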

Obstacle Avoidance System — Diagram / Photo
04

Obstacle Avoidance System

Depth camera data is processed independently for obstacle detection. The pipeline extracts the center region of the depth image, calculates the minimum depth value in that region, and compares against a safety threshold. If any obstacle is detected within 1 meter, the cart immediately stops forward motion and rotates left or right to find a clear path. Obstacle avoidance always takes priority over the following algorithm — safety overrides tracking.
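A minimal sketch of this stop-and-rotate logic, assuming a NumPy depth image in meters; the center-strip width and the fixed turn rate are illustrative values, not the tuned parameters:

```python
import numpy as np

STOP_DIST_M = 1.0  # safety threshold from the design

def avoidance_command(depth_m, center_frac=0.3):
    """Check the central strip of the depth frame. If anything is
    closer than STOP_DIST_M, return (forward, turn) that stops the
    cart and rotates toward the more open side; otherwise return
    None so the following controller keeps authority."""
    h, w = depth_m.shape
    x0 = int(w * (0.5 - center_frac / 2))
    x1 = int(w * (0.5 + center_frac / 2))
    center = depth_m[:, x0:x1]
    valid = center[center > 0]          # ignore invalid (zero) pixels
    if valid.size and valid.min() < STOP_DIST_M:
        # turn toward whichever half of the frame has more free space
        left_clear = np.mean(depth_m[:, : w // 2])
        right_clear = np.mean(depth_m[:, w // 2 :])
        turn = 0.5 if left_clear > right_clear else -0.5
        return 0.0, turn
    return None
```

Returning None when the path is clear mirrors the priority rule in the text: avoidance overrides following only when an obstacle is actually present.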

Sensor Fusion Strategy — Diagram / Photo
05

Sensor Fusion Strategy

Vision serves as the primary tracking source for accurate person position estimation with full spatial awareness. UWB provides backup correction specifically for three scenarios: person occlusion (someone walks between cart and user), tracking loss (YOLO loses detection), and heading drift correction. The fusion architecture is intentionally asymmetric — vision leads, UWB corrects — rather than weighted averaging, because vision provides richer spatial data when available.
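The asymmetric hand-off can be sketched as a small state holder. The 0.5 s vision timeout and the UWB range/bearing back-projection are assumptions for illustration:

```python
import math

class TargetFusion:
    """Vision-primary / UWB-backup target estimator (asymmetric fusion).

    Vision fixes (x, z) are trusted whenever they are fresh; if vision
    goes stale (occlusion or lost detection), the last UWB range/bearing
    sample takes over until vision reacquires the person."""

    def __init__(self, vision_timeout_s=0.5):
        self.vision_timeout_s = vision_timeout_s
        self._vision = None   # (x, z, stamp)
        self._uwb = None      # (distance_m, angle_rad, stamp)

    def update_vision(self, x, z, stamp):
        self._vision = (x, z, stamp)

    def update_uwb(self, distance, angle_rad, stamp):
        self._uwb = (distance, angle_rad, stamp)

    def estimate(self, now):
        """Return (x, z, source), or None if no sensor has reported."""
        if self._vision and now - self._vision[2] < self.vision_timeout_s:
            return self._vision[0], self._vision[1], "vision"
        if self._uwb:
            d, a, _ = self._uwb
            # back-project UWB range/bearing into the same (x, z) frame
            return d * math.sin(a), d * math.cos(a), "uwb"
        return None
```

Note there is no weighted averaging: vision wins outright when fresh, which is exactly the "vision leads, UWB corrects" structure described above.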

Following Algorithm & Control — Diagram / Photo
06

Following Algorithm & Control

The following controller is based on visual servoing. From the target person's position, distance (z-axis) controls forward speed: forward_speed = min(0.5, distance × k), capping maximum velocity. Lateral offset (x-axis) controls turn rate: turn_speed = −angle. This creates smooth, proportional following behavior — the cart accelerates toward distant targets and decelerates as it closes the gap, while continuously correcting its heading to keep the person centered.
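The two control laws above, wrapped in one function; the gain k = 0.25 is an illustrative value, not the tuned one:

```python
import math

def follow_command(x, z, k=0.25, max_speed=0.5):
    """Proportional follower: forward speed scales with distance to the
    person (capped at max_speed), turn rate counters the bearing.

    x, z : person position in the camera frame (meters)
    k    : illustrative speed gain, not the field-tuned value
    """
    angle = math.atan2(x, z)         # bearing to the person (rad)
    forward = min(max_speed, z * k)  # forward_speed = min(0.5, distance * k)
    turn = -angle                    # turn_speed = -angle
    return forward, turn
```

At 2 m the cart runs at the 0.5 m/s cap; at 1 m it has already slowed to half that, which produces the smooth deceleration described above.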

ROS2 Architecture & Topic Design — Diagram / Photo
07

ROS2 Architecture & Topic Design

Full ROS2 system with clearly separated topic structure. Perception topics: /rgb/image_rect, /stereo/depth, /stereo/points, /oak/imu. Detection topics: /yolo/detections_json, /yolo/image. Localization topics: /uwb/distance, /uwb/angle, /uwb/pose. Control topic: /cmd_vel. Each node is independently testable and the topic structure enables clean separation between perception, detection, localization, and control subsystems.

Motor Control & Hardware Integration — Diagram / Photo
08

Motor Control & Hardware Integration

cmd_vel messages are converted into individual wheel speeds for the differential drive configuration. An ESC controller translates speed commands into PWM signals for the brushed DC motors. Left and right wheel speed differences create turning behavior. The motor control node communicates via pyserial over USB. Final integration included mounting all components (camera, Pi, UWB module, ESC, battery) onto the cart chassis with proper cable management and vibration isolation.
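The cmd_vel-to-wheel-speed conversion is standard differential-drive kinematics. The wheel_base and wheel_radius below are placeholder dimensions, not the cart's measured ones:

```python
def cmd_vel_to_wheels(linear, angular, wheel_base=0.40, wheel_radius=0.08):
    """Convert a cmd_vel pair (linear m/s, angular rad/s) into left and
    right wheel angular speeds (rad/s) for a differential drive.

    A positive angular command speeds up the right wheel and slows the
    left one, turning the cart left; equal wheel speeds drive straight.
    """
    v_left = linear - angular * wheel_base / 2.0   # left rim speed (m/s)
    v_right = linear + angular * wheel_base / 2.0  # right rim speed (m/s)
    return v_left / wheel_radius, v_right / wheel_radius
```

In the real node these wheel speeds would then be framed into the ESC's command format and written over USB with pyserial.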

System Integration & Field Testing — Diagram / Photo
09

System Integration & Field Testing

The most challenging phase — bringing all subsystems together. Sensor synchronization between vision (30 fps) and UWB (variable rate via BLE) required careful timestamp management. Field testing revealed depth noise in outdoor environments with direct sunlight, BLE connection instability with the UWB module, and battery runtime limitations. We iteratively tuned PID gains, detection confidence thresholds, and obstacle avoidance parameters across multiple outdoor test sessions.
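One simple way to align the two streams, sketched here as a nearest-timestamp buffer with a staleness bound; the 0.2 s skew limit and buffer size are illustrative, not the values used on the cart:

```python
import bisect
from collections import deque

class NearestStampBuffer:
    """Keep recent UWB samples and, for each 30 fps camera frame, pick
    the sample closest in time. max_skew_s bounds how stale a match may
    be, so a dropped BLE connection yields no match instead of old data."""

    def __init__(self, maxlen=64, max_skew_s=0.2):
        self.stamps = deque(maxlen=maxlen)  # monotonically increasing
        self.values = deque(maxlen=maxlen)
        self.max_skew_s = max_skew_s

    def push(self, stamp, value):
        self.stamps.append(stamp)
        self.values.append(value)

    def nearest(self, frame_stamp):
        """Return the sample nearest frame_stamp, or None if too stale."""
        if not self.stamps:
            return None
        stamps = list(self.stamps)
        i = bisect.bisect_left(stamps, frame_stamp)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(stamps)]
        best = min(candidates, key=lambda j: abs(stamps[j] - frame_stamp))
        if abs(stamps[best] - frame_stamp) > self.max_skew_s:
            return None
        return self.values[best]
```

The freshness check doubles as the graceful-fallback path mentioned in the Challenges section: when BLE drops out, matches simply return None and fusion falls back to vision alone.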

Gallery


Technical Details

OAK-D Camera — RGB + Stereo Depth + Myriad X VPU (onboard neural inference)
YOLOv8 person detection, real-time inference on Raspberry Pi
DWM3001CDK UWB module — distance + angle estimation via BLE
ROS2 on Raspberry Pi 4/5 with SSD, Python
Visual servoing-based following control algorithm
Depth-based obstacle avoidance (1 m safety threshold)
Differential drive — brushed DC motor + ESC + PWM control
24V LiPo battery, ~25 min prototype runtime
DepthAI SDK, OpenCV, cv_bridge, NumPy, pyserial
Vision-primary / UWB-backup asymmetric sensor fusion

Challenges

!

Depth Noise in Outdoor Environments

Stereo depth accuracy degraded significantly under direct sunlight and with reflective surfaces, requiring adaptive threshold tuning and confidence filtering.

!

UWB Connection Stability

BLE connection between the UWB module and iPhone was intermittent, causing gaps in backup localization data. Required reconnection logic and graceful fallback handling.

!

Battery Runtime Limitations

24V LiPo provided only ~25 minutes of operation with all systems active. Power budgeting and duty cycling of non-critical sensors were explored but not fully implemented.

!

Sensor Synchronization

Vision (30fps) and UWB (variable BLE rate) operate on different timing cycles. Timestamp alignment and data freshness validation were critical for reliable fusion.

Key Learnings

01

Single-sensor dependency is dangerous — multi-sensor fusion architecture is essential for robustness in real-world conditions

02

Simulation ≠ reality — real-world noise, lighting variation, and mechanical play introduce problems that never appear in simulation

03

Hardware integration is harder than software — mounting, cabling, vibration, and thermal management are underestimated engineering challenges

04

Full-stack robotics (perception → planning → control → hardware) requires deep integration testing that can't be unit-tested in isolation

Future Work

SLAM integration (slam_toolbox, rtabmap) for map-based navigation
ROS2 Nav2 with DWA planner for path planning in complex environments
Terrain adaptation — slope detection and rough terrain control
Multi-user tracking with specific user identification and re-identification

My Scope

System Architecture · Sensor Fusion · ROS2 Integration · Perception Pipeline · Motor Control · Hardware Build