This repository provides two scripts for training a YOLOv11 model on the KITTI dataset and then using the trained model to detect objects in a video file.
- Overview
- Prerequisites
- Dataset Setup
- Training the Model
- Detecting Objects in a Video
- Command-Line Arguments Summary
- Results and Outputs
- Troubleshooting
## Overview

The repository contains two scripts:

- `train_model.py`: Automates preparing the KITTI dataset in YOLO format, training a YOLOv11 model, and evaluating its performance.
- `detect_object.py`: Uses the trained YOLOv11 model to detect objects in a given video file, annotating the video frames with bounding boxes and labels.
## Prerequisites

Make sure you have installed:

- Python 3.7 or above
- PyTorch (with CUDA support if you have an NVIDIA GPU)
- Ultralytics YOLO library: `pip install ultralytics`
- OpenCV for Python: `pip install opencv-python`
- scikit-learn: `pip install scikit-learn`
- kagglehub (used by `train_model.py` to download the dataset): `pip install kagglehub`
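To confirm the environment is ready, a quick sanity check such as the following can be run (a minimal sketch that only imports the packages listed above and reports versions):

```python
import cv2
import sklearn
import torch
import ultralytics
import kagglehub  # imported only to confirm it is installed

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("OpenCV:", cv2.__version__)
print("Ultralytics:", ultralytics.__version__)
print("scikit-learn:", sklearn.__version__)
```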
## Dataset Setup

This example uses the KITTI dataset. The `train_model.py` script uses `kagglehub` to download and set up the dataset. If you prefer to download and place the dataset manually, update the dataset path (`KITTI_BASE_DIR`).

1. **Automatic download (using `kagglehub`):**
   - Ensure you have a Kaggle API key and a properly configured environment.
   - The `train_model.py` script automatically uses `kagglehub` to download the dataset.

2. **Manual download (optional):**
   - Download the KITTI dataset from Kaggle.
   - Extract the dataset and place the files in a directory.
   - Update `KITTI_BASE_DIR` in `train_model.py` to point to this directory.
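For reference, the `kagglehub` download step looks roughly like this (a minimal sketch; the dataset slug below is a placeholder, so check `train_model.py` for the exact one it uses):

```python
import kagglehub

# Download (and cache) the dataset; the local path is returned.
# NOTE: "user/kitti-dataset" is a placeholder slug, not necessarily
# the one used in train_model.py.
kitti_path = kagglehub.dataset_download("user/kitti-dataset")
print("Dataset downloaded to:", kitti_path)

# For a manual download, point KITTI_BASE_DIR in train_model.py
# at the extracted dataset directory instead.
```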
## Training the Model

Follow these steps to train the YOLOv11 model on the KITTI dataset:

1. **Configure training parameters in `train_model.py`** (a configuration sketch appears in the Command-Line Arguments Summary section below):
   - `MODEL_ARCH`: Path to the YOLO model configuration file (e.g., `yolov11n.yaml`).
   - `EPOCHS`: Number of training epochs (default: 10).
   - `BATCH_SIZE`: Batch size for training (default: 16).
   - `IMG_SIZE`: Input image size (width and height).
   - `PROJECT_NAME`: Directory name under which training results will be saved.
   - `EXPERIMENT_NAME`: Subdirectory name under the project directory to store this experiment's results.
2. **Run the training script:**

   ```
   python train_model.py
   ```

   The script will:
   - Download and prepare the KITTI dataset.
   - Convert KITTI annotations to YOLO format (see the conversion sketch after these steps).
   - Split the dataset into training and validation sets.
   - Generate a `data.yaml` file for YOLO training.
   - Train the YOLOv11 model.
   - Validate the model and print metrics such as precision, recall, and F1 score.
3. **Check training outputs:**
   - The trained model weights (`best.pt`) and logs are saved in `PROJECT_NAME/EXPERIMENT_NAME/weights`. The `best.pt` weights represent the best model observed during training.
   - Validation metrics and logs are displayed in the console.

**Example output:** The script prints progress and metrics. After training, you should see lines like:

```
Training completed! ... Validation completed! Validation Metrics: ...
```
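For reference, the KITTI-to-YOLO annotation conversion mentioned in step 2 generally has the following shape (a minimal sketch, not the exact code from `train_model.py`; the `CLASSES` list is an assumption). KITTI stores 2D boxes as `left, top, right, bottom` in pixels, while YOLO expects normalized `x_center, y_center, width, height`:

```python
# Sketch of converting one KITTI label file to YOLO format.
# NOTE: CLASSES is an assumed mapping; use the one defined in train_model.py.
CLASSES = ["Car", "Pedestrian", "Cyclist"]

def kitti_to_yolo(label_path, img_w, img_h):
    """Return YOLO-format label lines for one KITTI label file."""
    yolo_lines = []
    with open(label_path) as f:
        for line in f:
            parts = line.split()
            name = parts[0]
            if name not in CLASSES:
                continue  # skip DontCare and unmapped classes
            # KITTI fields 4-7: 2D bbox left, top, right, bottom (pixels).
            left, top, right, bottom = map(float, parts[4:8])
            x_c = (left + right) / 2 / img_w
            y_c = (top + bottom) / 2 / img_h
            w = (right - left) / img_w
            h = (bottom - top) / img_h
            yolo_lines.append(
                f"{CLASSES.index(name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
            )
    return yolo_lines
```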
## Detecting Objects in a Video

Once the model is trained, use the `detect_object.py` script to detect objects in a video.

1. **Ensure model weights are available:** Make sure `best.pt` (or any trained `.pt` file) is present in the directory where the script is run, or specify its path with the `--weights` argument.

2. **Run inference on a video:**

   ```
   python detect_object.py --input path/to/input_video.mp4
   ```

   The script reads the video from `path/to/input_video.mp4`, performs object detection frame by frame, and by default writes an annotated video named `input_video_annotated.mp4` to the same directory.
**Key arguments:**

- `--input`: Path to the input video file (required).
- `--output`: Path to the output annotated video file. If not specified, the script appends `_annotated` to the input filename.
- `--weights`: Path to the trained YOLO model weights file. Default is `best.pt`.
- `--conf-threshold`: Confidence threshold for detections (default: `0.25`).
- `--iou-threshold`: IoU threshold for NMS (default: `0.45`).
- `--show-live`: If set, displays the video with annotations as it is processed.
**Example command:**

```
python detect_object.py --input input_video.mp4 --weights YOLOv11-KITTI/exp1/weights/best.pt --show-live
```

This uses the trained weights at `YOLOv11-KITTI/exp1/weights/best.pt`, performs detection on `input_video.mp4`, displays annotated frames in real time, and saves the annotated video.
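Under the hood, the per-frame loop in `detect_object.py` is along these lines (a minimal sketch assuming the Ultralytics API; file names mirror the example above, the thresholds are the documented defaults, and the actual script may differ in details):

```python
import cv2
from ultralytics import YOLO

model = YOLO("best.pt")  # trained weights
cap = cv2.VideoCapture("input_video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter("input_video_annotated.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

while True:
    ok, frame = cap.read()
    if not ok:
        break  # end of video
    # Detect objects in this frame with the script's default thresholds.
    results = model.predict(frame, conf=0.25, iou=0.45, verbose=False)
    annotated = results[0].plot()  # frame with boxes and labels drawn
    out.write(annotated)

cap.release()
out.release()
```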
## Command-Line Arguments Summary

**`train_model.py`:** This script takes no command-line arguments; it is configured within the code. Adjust the hyperparameters directly in the script, along the lines of the sketch below.
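For orientation, the in-script configuration and training flow roughly follow this pattern (a minimal sketch assuming the Ultralytics API; the constant values shown are the documented defaults, `IMG_SIZE = 640` is an assumption, and the project/experiment names are taken from the example above):

```python
from ultralytics import YOLO

# In-script configuration (edit these constants directly in train_model.py).
MODEL_ARCH = "yolov11n.yaml"   # YOLO model configuration file
EPOCHS = 10
BATCH_SIZE = 16
IMG_SIZE = 640                 # assumed value; check train_model.py
PROJECT_NAME = "YOLOv11-KITTI"
EXPERIMENT_NAME = "exp1"

model = YOLO(MODEL_ARCH)
model.train(
    data="data.yaml",          # generated during dataset preparation
    epochs=EPOCHS,
    batch=BATCH_SIZE,
    imgsz=IMG_SIZE,
    project=PROJECT_NAME,
    name=EXPERIMENT_NAME,
)
metrics = model.val()          # prints precision, recall, mAP, etc.
```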
Download my best model weights from here.

**`detect_object.py`:**

```
python detect_object.py --input VIDEO_FILE.mp4 [--output OUTPUT_FILE.mp4] [--weights PATH_TO_BEST.pt] [--conf-threshold 0.25] [--iou-threshold 0.45] [--show-live]
```

Arguments:

- `--input`: (Required) Path to the input video file.
- `--output`: Path to save the annotated video file. Defaults to appending `_annotated` before the file extension of `--input`.
- `--weights`: Path to the YOLO model weights file (`.pt`). Defaults to `best.pt`.
- `--conf-threshold`: Confidence threshold (default: `0.25`).
- `--iou-threshold`: IoU threshold for NMS (default: `0.45`).
- `--show-live`: Display processed frames in a window as the script runs. Press `q` to stop processing early.
## Results and Outputs

- **Training outputs:**
  - The directory `PROJECT_NAME/EXPERIMENT_NAME/weights` contains:
    - `best.pt`: Model weights of the best-performing epoch.
    - `last.pt`: Model weights of the final epoch.
  - Validation metrics and logs are printed in the console.
- **Video inference outputs:**
  - The annotated video file is saved at the specified `--output` location, or with `_annotated` appended to the input file name.
  - The script prints status messages during processing and confirms the output file location upon completion.
  - Optionally, if `--show-live` is used, a window displays annotations in real time.
## Troubleshooting

- **No GPU available:** If `device = 'cpu'` is printed, ensure your environment has GPU support and that CUDA is installed. The model will still run on CPU, but more slowly. A quick check appears below this list.
- **File not found:** If the dataset or input video paths are incorrect, ensure the paths are correct and the files exist.
- **Inconsistent results:** Adjust hyperparameters such as `BATCH_SIZE`, `EPOCHS`, `conf_threshold`, and `iou_threshold` to match your dataset characteristics.
- **Missing dependencies:** If modules like `cv2` or `ultralytics` are not found, install them using `pip install`.
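To check whether PyTorch can see your GPU:

```python
import torch

# True means training and inference can run on the GPU;
# otherwise the scripts fall back to CPU.
print("CUDA available:", torch.cuda.is_available())
print("Device:", "cuda" if torch.cuda.is_available() else "cpu")
```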
Note: The script names and YOLO version references may differ. Ensure the model's `.pt` file is consistent with the YOLO version configured in `MODEL_ARCH`.
By following this README, you can successfully train a YOLOv11 model using the KITTI dataset and then detect objects in any given video file using the trained model.