This project demonstrates how to pair visible-light and infrared datasets for multimodal object detection tasks based on YOLOv11. With three configuration methods (two directory layouts and a TXT file configuration), you can easily associate visible-light datasets with infrared datasets.
YAML files for all YOLO series from YOLOv3 to YOLOv12, along with the corresponding RGBT YAML files, have been added.
The original training mode of YOLOv11 is retained. It is recommended to first learn how to set up and use the YOLOv11 environment before using this project (an existing YOLOv11 environment can be used seamlessly).
Supports multi-spectral object detection, multi-spectral keypoint detection, and multi-spectral instance segmentation tasks.
Compared to YOLOv11, two additional parameters have been added: channels and use_simotm. The ch value in the model YAML file must correspond to the configured channel count.
channels: 1 # (int) Number of model channels, detailed introduction is provided below.
use_simotm: SimOTMBBS # (str) The training mode used, such as BGR, RGBT, Gray, etc.
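To illustrate how use_simotm and channels must stay consistent with the ch value in the model YAML, here is a small sketch. The mode-to-channel mapping below is an illustrative assumption (e.g., RGBT as 3 visible + 1 thermal channel), not the project's actual API:

```python
# Illustrative only: plausible mapping between training modes and
# input-channel counts. The exact names/counts are assumptions.
MODE_CHANNELS = {
    "Gray": 1,   # single-channel grayscale input
    "BGR": 3,    # standard 3-channel color input
    "RGBT": 4,   # visible (3 channels) + thermal (1 channel)
}

def check_channels(use_simotm: str, channels: int) -> bool:
    """Return True if `channels` matches the expected count for the mode.
    Unknown modes are accepted, since this sketch does not cover them all."""
    expected = MODE_CHANNELS.get(use_simotm)
    return expected is None or expected == channels

print(check_channels("RGBT", 4))  # True
print(check_channels("Gray", 3))  # False
```

A quick consistency check like this before training can catch a mismatch between the data configuration and the model YAML's ch value early.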

The directory formats are consistent with YOLOv8. When using train.txt and val.txt, you only need to list the visible-light image paths; the RGBT dataset directory format is as follows.
The visible-light (visible) directory must follow the standard YOLOv8 dataset configuration, and an infrared (infrared) directory must exist at the same level as the visible directory. In addition, the dataset should be divided into train and val (optional) subsets for training and validation, respectively.
Below are three recommended configuration methods:
Method 1: Divide the dataset into train and val subdirectories, each containing visible and infrared data in directories at the same level. The directory structure is as follows:
```
dataset/                  # Root directory of the dataset
├── train/                # Training data
│   ├── visible/          # Visible-light data
│   │   ├── images/       # Visible-light image files
│   │   └── labels/       # Label files for visible-light images (annotation information)
│   └── infrared/         # Infrared data
│       ├── images/       # Infrared image files
│       └── labels/       # Label files for infrared images (annotation information)
└── val/                  # Validation data
    ├── visible/          # Visible-light data
    │   ├── images/       # Visible-light image files
    │   └── labels/       # Label files for visible-light images (annotation information)
    └── infrared/         # Infrared data
        ├── images/       # Infrared image files
        └── labels/       # Label files for infrared images (annotation information)
```

```yaml
# KAIST.yaml
# train and val data as 1) directory: path/images/
train: dataset/train/visible/images  # 7601 images
val: dataset/val/visible/images      # 2257 images

# number of classes
nc: 1

# class names
names: ['person']
```
The program will automatically recognize visible and infrared data through the directory structure.
Method 2: Under the second-level directory, store visible and infrared data in directories at the same level, with each modality divided into train and val subdirectories. The directory structure is as follows:
```
dataset/
├── images/
│   ├── visible/
│   │   ├── train/        # Training visible-light images
│   │   └── val/          # Validation visible-light images
│   └── infrared/
│       ├── train/        # Training infrared images
│       └── val/          # Validation infrared images
└── labels/
    ├── visible/
    │   ├── train/        # Training visible-light image labels
    │   └── val/          # Validation visible-light image labels
    └── infrared/
        ├── train/        # Training infrared image labels
        └── val/          # Validation infrared image labels
```

```yaml
# KAIST.yaml
# train and val data as 1) directory: path/images/
train: dataset/images/visible/train  # 7601 images
val: dataset/images/visible/val      # 2257 images

# number of classes
nc: 1

# class names
names: ['person']
```
- images/: Stores all image data.
  - visible/: Contains visible-light images.
    - train/: Visible-light images for model training.
    - val/: Visible-light images for model validation.
  - infrared/: Contains infrared images.
    - train/: Infrared images for model training.
    - val/: Infrared images for model validation.
- labels/: Stores all image label information (annotation files).
  - visible/: Contains labels for visible-light images.
    - train/: Labels for the visible-light training set.
    - val/: Labels for the visible-light validation set.
  - infrared/: Contains labels for infrared images.
    - train/: Labels for the infrared training set.
    - val/: Labels for the infrared validation set.

The program will automatically recognize visible and infrared data through the directory structure.
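Before training, it can help to verify that every visible-light image actually has an infrared counterpart. The helper below is an illustrative sketch, not part of the project; it derives the infrared path by replacing "visible" with "infrared" in the path, mirroring the substitution the loader performs:

```python
from pathlib import Path

def find_unpaired(visible_dir: str) -> list[str]:
    """List visible-light images that have no same-named file in the
    parallel 'infrared' directory. The infrared path is derived by
    replacing 'visible' with 'infrared' in the visible path."""
    unpaired = []
    for img in sorted(Path(visible_dir).glob("*")):
        if img.suffix.lower() not in {".jpg", ".jpeg", ".png", ".bmp"}:
            continue  # skip labels and other non-image files
        infrared = Path(str(img).replace("visible", "infrared"))
        if not infrared.exists():
            unpaired.append(str(img))
    return unpaired

if __name__ == "__main__":
    missing = find_unpaired("dataset/images/visible/train")
    print(f"{len(missing)} visible images lack an infrared counterpart")
```

Running this once per split (train and val) catches missing or misnamed infrared files before they cause errors during training.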
Method 3: Use TXT files to specify data paths. The TXT files should contain visible-light image paths, and the program will automatically replace them with the corresponding infrared paths. TXT files are needed for both the training and validation sets (this is the default configuration method for YOLOv5, YOLOv8, and YOLOv11).
```
dataset/
├── images/
│   ├── visible/          # Visible-light images
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── infrared/         # Infrared images
│       ├── image1.jpg
│       ├── image2.jpg
│       └── ...
└── labels/
    ├── visible/          # Visible-light labels
    │   ├── image1.txt
    │   └── image2.txt
    └── infrared/         # Infrared labels
        ├── image1.txt
        └── image2.txt
```

```yaml
# VEDAI.yaml
train: G:/wan/data/RGBT/VEDAI/VEDAI_train.txt     # 16551 images
val: G:/wan/data/RGBT/VEDAI/VEDAI_trainval.txt    # 4952 images

# number of classes
nc: 9

# class names
names: ['plane', 'boat', 'camping_car', 'car', 'pick-up', 'tractor', 'truck', 'van', 'others']
```
Example TXT File Content:
train.txt
```
dataset/images/visible/image1.jpg
dataset/images/visible/image2.jpg
dataset/images/visible/image3.jpg
```
val.txt
```
dataset/images/visible/image4.jpg
dataset/images/visible/image5.jpg
dataset/images/visible/image6.jpg
```
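Split files like these can be generated automatically by listing the visible-light image directory. This is a minimal illustrative sketch (the directory layout and image extensions are assumptions based on the examples above):

```python
from pathlib import Path

def write_split_txt(visible_dir: str, out_txt: str) -> int:
    """Write one visible-light image path per line into a split TXT file
    (e.g., train.txt or val.txt). Returns the number of paths written."""
    exts = {".jpg", ".jpeg", ".png", ".bmp"}
    paths = sorted(p for p in Path(visible_dir).glob("*")
                   if p.suffix.lower() in exts)
    with open(out_txt, "w") as f:
        for p in paths:
            f.write(p.as_posix() + "\n")  # forward slashes work cross-platform
    return len(paths)

if __name__ == "__main__":
    n = write_split_txt("dataset/images/visible", "train.txt")
    print(f"wrote {n} image paths to train.txt")
```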
The program will replace visible with infrared in the paths to find the corresponding infrared images.
In the load_image function in ultralytics/data/base.py, there is a line of code that replaces visible with infrared in the visible light path. Therefore, as long as there is an infrared directory at the same level as the visible light directory, the program can correctly load the corresponding infrared data.
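The substitution described above amounts to a simple string replacement on the image path. A simplified sketch of the idea (not the project's actual code in ultralytics/data/base.py):

```python
def to_infrared(visible_path: str) -> str:
    """Derive the infrared image path from the visible-light path by
    swapping the 'visible' directory name for 'infrared'.
    Note: every occurrence of 'visible' is replaced, so avoid putting
    the word 'visible' in filenames themselves."""
    return visible_path.replace("visible", "infrared")

print(to_infrared("dataset/images/visible/image1.jpg"))
# dataset/images/infrared/image1.jpg
```

This is why the only hard requirement on the layout is that an infrared directory exists at the same level as the visible one, with identically named files inside.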
git clone https://github.com/wandahangFY/YOLOv11-RGBT.git
cd YOLOv11-RGBT
Configure your dataset directory or TXT file according to one of the three methods mentioned above.
(It is recommended to reuse an existing YOLOv11 or YOLOv8 environment if one is already set up, without installing dependencies again.)
pip install -r requirements.txt
python train.py --data your_dataset_config.yaml
Below are the Python script files for different training modes included in the project, each targeting specific training needs and data types.
- train.py
- train-rtdetr.py
- train_Gray.py
- train_RGBRGB.py
- train_RGBT.py
Run the test script to verify if the data loading is correct:
python val.py
Here are the Baidu Netdisk links for the converted VEDAI, LLVIP, KAIST, and M3FD datasets. You need to change the addresses in the YAML files; if you use TXT files to configure the YAML files, replace the addresses in the TXT files with your own (open with Notepad, Ctrl+H). (Additionally, if you use the above datasets, please cite the original papers correctly. If there is any infringement, please contact the original authors and the data will be removed immediately.)
Baidu Netdisk Link: https://pan.baidu.com/s/1xOUP6UTQMXwgErMASPLj2A Extraction Code: 9rrf
PRs and Issues are welcome to help improve the project together. This is a long-term open-source project and will continue to be updated for free, so there is no need to worry about cost.

YOLO-MIF: Improved YOLOv8 with Multi-Information fusion for object detection in Gray-Scale images
https://www.sciencedirect.com/science/article/pii/S1474034624003574
D. Wan, R. Lu, Y. Fang, X. Lang, S. Shu, J. Chen, S. Shen, T. Xu, Z. Ye, YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework, (2025). https://doi.org/10.48550/arXiv.2506.14696.
```bibtex
@misc{wan2025yolov11rgbtcomprehensivesinglestagemultispectral,
  title={YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework},
  author={Dahang Wan and Rongsheng Lu and Yang Fang and Xianli Lang and Shuangbao Shu and Jingjing Chen and Siyuan Shen and Ting Xu and Zecong Ye},
  year={2025},
  eprint={2506.14696},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.14696},
}
```
Thank you for your interest in and support of this project. The authors strive to provide the best quality and service, but there is still much room for improvement. This project is currently maintained by the author personally, so there may be oversights and errors; if you encounter any issues or have suggestions, please let us know.
Other open-source projects are being organized and will be released gradually; please check the author's homepage for future downloads. Homepage