SiMO: Single-Modal-Operable Multimodal Collaborative Perception
Paper: arXiv 2603.08240
This repository contains pretrained checkpoints for SiMO (Single-Modal-Operable Multimodal Collaborative Perception), a novel framework for robust multimodal collaborative 3D object detection in autonomous driving.
Title: Single-Modal-Operable Multimodal Collaborative Perception
Conference: ICLR 2026
OpenReview: Link
arXiv: Link
Pretrained checkpoints:

| Model | Dataset | Architecture | Checkpoint |
|---|---|---|---|
| SiMO-PF | OPV2V-H | Pyramid Fusion + LAMMA | Download |
| SiMO-AttFuse | OPV2V-H | AttFusion + LAMMA | Download |

Detection results on OPV2V-H under different input modalities (AP at IoU thresholds 0.3, 0.5, and 0.7):

| Modality | AP@30 | AP@50 | AP@70 |
|---|---|---|---|
| LiDAR + Camera | 98.30 | 97.94 | 94.64 |
| LiDAR-only | 97.32 | 97.07 | 94.06 |
| Camera-only | 80.81 | 69.63 | 44.82 |
Installation:

```bash
git clone https://github.com/dempsey-wen/SiMO.git
cd SiMO
pip install -r requirements.txt
```

Downloading checkpoints:

```bash
# Install huggingface-hub
pip install huggingface-hub

# Download a specific checkpoint (replace ***.pth with the desired filename)
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='DempseyWen/SiMO', filename='***.pth')"
```
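If you prefer scripting the download, the one-liner above can be expanded into a small helper. A minimal sketch, assuming the repo id `DempseyWen/SiMO` from the snippet above and that checkpoints are stored as `.pth` files in the repo; the exact filenames are not listed here, so the repo is enumerated instead:

```python
REPO_ID = "DempseyWen/SiMO"

def list_checkpoint_files(files):
    """Filter a list of repo filenames down to PyTorch checkpoints (.pth)."""
    return [f for f in files if f.endswith(".pth")]

def download_checkpoint(filename, repo_id=REPO_ID):
    """Download one checkpoint and return its local cache path."""
    # Imported lazily so the helpers above stay usable offline.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=filename)

if __name__ == "__main__":
    # Enumerate the repo and fetch every .pth checkpoint (requires network).
    from huggingface_hub import list_repo_files
    for name in list_checkpoint_files(list_repo_files(REPO_ID)):
        print(download_checkpoint(name))
```

Checkpoints land in the local Hugging Face cache, so repeated runs do not re-download.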
For complete documentation, training scripts, and data preparation instructions, please visit our GitHub repository.
This work builds upon:
If you find this work useful, please cite:
```bibtex
@inproceedings{wen2026simo,
  title={Single-Modal-Operable Multimodal Collaborative Perception},
  author={Wen, Dempsey and Lu, Yifan and others},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```
MIT License - see LICENSE for details.