LRHPerception - Monocular Real-time Perception for Autonomous Driving

Unified perception pipeline achieving 29 FPS with object tracking, trajectory prediction, road segmentation, and depth estimation

Project Overview

This project developed LRHPerception (Low-cost, Real-time, High information richness), a comprehensive monocular perception system for autonomous driving that processes single-camera video feeds into a real-time interpretation of the driving environment. The research addresses the computational limitations of existing multi-camera fusion systems while maintaining interpretability advantages over end-to-end approaches.

Duration: April 2023 – August 2023
Role: Co-Lead Developer (Equal Contribution)
Team Size: 5 members
Institution: University of Rochester
Collaborators: Aiyinsi Zuo, Zirui Li, Haixi Zhang*, Chunshu Wu, Prof. Tong Geng, Prof. Zhiyao Duan

Technical Implementation

Hardware Platform

  • GPU: Single RTX 3090 for training and inference
  • Performance Target: Real-time processing (29+ FPS)
  • Memory Usage: 3.2GB VRAM during inference
  • Power Consumption: 15W on embedded platforms

Software Architecture

  • Framework: PyTorch with CUDA acceleration
  • Backbone: Swin Transformer for efficient hierarchical feature extraction
  • Integration: Shared feature extraction across all four perception tasks (see the sketch after this list)
  • Training: Cross-dataset approach (KITTI, Cityscapes, JAAD, PIE)
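
The sketch below illustrates the shared-backbone, multi-head layout in a minimal form. The convolutional encoder is only a placeholder for the Swin Transformer backbone, and the head names and output shapes are illustrative assumptions rather than the actual LRHPerception modules.

```python
import torch
import torch.nn as nn

class SharedBackboneMultiTask(nn.Module):
    """Minimal sketch: one shared feature extractor feeding four task heads.
    The conv encoder below is a placeholder for the Swin Transformer backbone;
    head names and shapes are illustrative, not the actual LRHPerception modules."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Placeholder backbone (stand-in for the Swin Transformer features).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One lightweight head per perception task (illustrative).
        self.det_head = nn.Conv2d(feat_dim, 6, 1)     # detection / tracking cues
        self.seg_head = nn.Conv2d(feat_dim, 1, 1)     # road vs. not-road
        self.depth_head = nn.Conv2d(feat_dim, 1, 1)   # per-pixel depth
        self.traj_head = nn.Linear(feat_dim, 128)     # trajectory context vector

    def forward(self, frames: torch.Tensor):
        feats = self.backbone(frames)                 # shared features, computed once
        pooled = feats.mean(dim=(2, 3))               # global context for trajectories
        return {
            "detection": self.det_head(feats),
            "road": torch.sigmoid(self.seg_head(feats)),
            "depth": self.depth_head(feats),
            "traj_context": self.traj_head(pooled),
        }

if __name__ == "__main__":
    model = SharedBackboneMultiTask()
    out = model(torch.randn(1, 3, 224, 224))
    print({k: tuple(v.shape) for k, v in out.items()})
```

Because the backbone runs once per frame and the task heads are comparatively small, the per-frame cost grows only modestly with each added task.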

Core Modules

  • C-BYTE Object Tracking: Camera-Calibrated BYTE with motion compensation
  • Trajectory Prediction: Conditional Variational Autoencoder (CVAE) with step-wise goal estimation (a sketch follows this list)
  • Road Segmentation: Focused single-class segmentation with optimized decoder
  • Depth Estimation: Coarse-refine architecture with secondary refinement flow
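
As a rough illustration of the trajectory module's idea, the hypothetical CVAE below encodes an observed track into a latent Gaussian, estimates a goal point, and decodes future positions conditioned on both. The layer sizes and the simplified single-goal conditioning are assumptions and differ from the paper's step-wise goal estimation.

```python
import torch
import torch.nn as nn

class TrajectoryCVAE(nn.Module):
    """Hypothetical CVAE sketch: encode the past track -> latent z -> decode the future.
    A simple goal head predicts an end point that conditions the decoder (loosely
    mirroring goal-conditioned prediction; details differ from the paper)."""

    def __init__(self, obs_len=8, pred_len=12, hidden=64, z_dim=16):
        super().__init__()
        self.pred_len = pred_len
        self.obs_enc = nn.GRU(2, hidden, batch_first=True)      # encode observed (x, y)
        self.goal_head = nn.Linear(hidden, 2)                    # estimate a goal point
        self.to_mu = nn.Linear(hidden + 2, z_dim)
        self.to_logvar = nn.Linear(hidden + 2, z_dim)
        self.dec = nn.GRU(z_dim + 2, hidden, batch_first=True)   # condition on z and goal
        self.out = nn.Linear(hidden, 2)                          # per-step offset

    def forward(self, obs):                                      # obs: (B, obs_len, 2)
        _, h = self.obs_enc(obs)
        h = h.squeeze(0)                                         # (B, hidden)
        goal = self.goal_head(h)                                 # (B, 2)
        cond = torch.cat([h, goal], dim=-1)
        mu, logvar = self.to_mu(cond), self.to_logvar(cond)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        dec_in = torch.cat([z, goal], dim=-1)
        dec_in = dec_in.unsqueeze(1).repeat(1, self.pred_len, 1) # same conditioning each step
        dec_out, _ = self.dec(dec_in)
        steps = self.out(dec_out)                                # (B, pred_len, 2) offsets
        future = obs[:, -1:, :] + steps.cumsum(dim=1)            # integrate from last position
        return future, mu, logvar, goal

if __name__ == "__main__":
    future, mu, logvar, goal = TrajectoryCVAE()(torch.randn(4, 8, 2))
    print(future.shape)  # torch.Size([4, 12, 2])
```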

Key Achievements

Performance Breakthrough

  • Achieved 29 FPS real-time performance on single GPU
  • 555% acceleration over the fastest local mapping method
  • Order-of-magnitude speed improvement while maintaining accuracy
  • First unified package combining all four perception tasks

Object Tracking Excellence

  • MOTA Score: 76.9% (vs 76.6% ByteTrack baseline)
  • IDF1 Score: 81.2% (vs 79.3% ByteTrack baseline)
  • Processing Time: 31.0ms per frame
  • Implemented camera motion correction with Lucas-Kanade optical flow (see the sketch below)
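
The sketch below illustrates the general idea of optical-flow-based camera motion compensation, not the project's exact implementation: sparse features are tracked between consecutive frames with Lucas-Kanade flow, a robust affine transform is fitted, and the previous frame's track boxes are warped into the current frame before association. The box format and thresholds are assumptions.

```python
import cv2
import numpy as np

def estimate_camera_motion(prev_gray: np.ndarray, curr_gray: np.ndarray) -> np.ndarray:
    """Estimate a 2x3 affine transform for global camera motion between frames."""
    # Track sparse corner features from the previous frame with Lucas-Kanade flow.
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                       qualityLevel=0.01, minDistance=10)
    if prev_pts is None:
        return np.eye(2, 3, dtype=np.float32)
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
    good = status.ravel() == 1
    if good.sum() < 4:
        return np.eye(2, 3, dtype=np.float32)
    # Robustly fit a partial affine (rotation, scale, translation) with RANSAC.
    matrix, _ = cv2.estimateAffinePartial2D(prev_pts[good], curr_pts[good],
                                            method=cv2.RANSAC)
    return matrix if matrix is not None else np.eye(2, 3, dtype=np.float32)

def compensate_boxes(boxes_xyxy: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Warp previous-frame boxes (x1, y1, x2, y2) into the current frame."""
    corners = boxes_xyxy.reshape(-1, 2)                       # (2N, 2) corner points
    ones = np.ones((corners.shape[0], 1), dtype=corners.dtype)
    warped = np.hstack([corners, ones]) @ matrix.T            # apply the affine transform
    return warped.reshape(-1, 4)
```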

Trajectory Prediction Innovation

  • JAAD Dataset MSE: 43/113/283 (0.5s/1.0s/1.5s prediction horizons)
  • PIE Dataset MSE: 19/44/104 (0.5s/1.0s/1.5s prediction horizons; metric computation sketched after this list)
  • Processing Speed: 111/104/92.6 FPS for batches of 8/12/24 objects
  • 40× faster than the most accurate alternative method
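
For reference, the helper below sketches how MSE at fixed time horizons can be computed, assuming the common JAAD/PIE protocol of averaging squared pixel error of predicted bounding boxes over all frames up to each horizon, and 30 FPS source video. It is an illustration, not the benchmarks' official evaluation script.

```python
import numpy as np

def mse_at_horizons(pred: np.ndarray, gt: np.ndarray, fps: int = 30,
                    horizons_s=(0.5, 1.0, 1.5)) -> dict:
    """Mean squared error (pixels^2) of predicted boxes up to each time horizon.

    pred, gt: (N, T, 4) arrays of bounding boxes over T future frames.
    Assumes squared coordinate error averaged over all frames up to the horizon;
    exact benchmark details may differ.
    """
    results = {}
    for h in horizons_s:
        k = int(round(h * fps))                  # e.g. 0.5 s -> 15 frames at 30 FPS
        err = (pred[:, :k] - gt[:, :k]) ** 2
        results[f"{h:.1f}s"] = float(err.mean())
    return results
```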

Segmentation & Depth Accuracy

  • Road Segmentation mIoU: 88.9% on the Cityscapes dataset
  • Depth RMS Error: 0.229 on the KITTI dataset
  • Depth δ1 Accuracy: 96.6% (fraction of pixels whose predicted depth is within a factor of 1.25 of ground truth; see the metric sketch after this list)
  • 577% faster than the best alternative depth estimation method
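
These are standard benchmark metrics; the snippet below sketches how single-class road IoU, depth RMS error, and the δ1 threshold accuracy are typically computed. It illustrates the metric definitions only and is not the project's evaluation code.

```python
import numpy as np

def road_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Single-class IoU for binary road masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter / union) if union > 0 else 1.0

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Standard depth metrics on valid (gt > 0) pixels: RMS error and delta_1."""
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]
    rms = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)      # symmetric per-pixel ratio
    delta1 = np.mean(ratio < 1.25)                # fraction within 1.25x of ground truth
    return {"rms": float(rms), "delta1": float(delta1)}
```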

Research Impact

  • Published: Peer-reviewed conference paper
  • Novel Integration: First work to unify these four perception tasks
  • Real-world Validation: Extensive testing on the KITTI dataset
  • Practical Deployment: Suitable for actual autonomous driving systems

Technologies Used

Deep Learning Frameworks:

  • PyTorch for model development and training
  • CUDA for GPU acceleration and optimization
  • Mixed precision training for efficiency (see the sketch after this list)
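
As an illustration of the mixed-precision setup (the model, loss, and data loader are placeholders; this is not the project's training script), PyTorch's automatic mixed precision wraps the forward pass in autocast and scales the loss so FP16 gradients do not underflow:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

def train_one_epoch(model, loader, optimizer, loss_fn, device="cuda"):
    """Mixed-precision training loop sketch with autocast + gradient scaling."""
    scaler = GradScaler()
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad(set_to_none=True)
        with autocast():                      # run the forward pass in mixed precision
            outputs = model(images)
            loss = loss_fn(outputs, targets)
        scaler.scale(loss).backward()         # scale the loss to keep FP16 grads stable
        scaler.step(optimizer)
        scaler.update()
```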

Computer Vision Libraries:

  • OpenCV for image processing and preprocessing
  • Multi-scale feature extraction and fusion

Autonomous Driving Datasets:

  • KITTI for object detection and depth estimation
  • Cityscapes for road segmentation validation
  • JAAD/PIE for trajectory prediction benchmarking

Optimization Techniques:

  • Shared Transformer backbone architecture
  • Cross-task feature sharing and skip connections
  • Real-time inference optimization with TensorRT (export sketch after this list)
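
One common TensorRT deployment path, shown below as a sketch with placeholder file names and input resolution, is to export the PyTorch model to ONNX and then build an FP16 engine offline with trtexec; the project's exact deployment flow is not detailed here.

```python
import torch

def export_onnx(model: torch.nn.Module, onnx_path: str = "lrh_perception.onnx"):
    """Export a model to ONNX as the first step of a TensorRT build (sketch)."""
    model.eval()
    dummy = torch.randn(1, 3, 384, 1280)          # placeholder input resolution
    torch.onnx.export(
        model, dummy, onnx_path,
        input_names=["image"], output_names=["outputs"],
        opset_version=16,
        dynamic_axes={"image": {0: "batch"}},     # allow variable batch size
    )

# Then build an FP16 engine offline, e.g.:
#   trtexec --onnx=lrh_perception.onnx --fp16 --saveEngine=lrh_perception.engine
```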

Results & Impact

  • Academic Contribution: Novel unified perception framework for autonomous driving
  • Performance: State-of-the-art accuracy with dramatic speed improvements
  • Practical Value: Bridge between research and real-world deployment
  • Industry Relevance: Cost-effective single-camera autonomous driving solution
