Traffic Sign Detection with YOLO

This model is a proof of concept designed to detect and classify United States traffic signs in real time to support drivers, Advanced Driver Assistance Systems (ADAS), and autonomous vehicles. The project compares three YOLO object detection architectures: YOLOv8 (baseline), YOLOv11 (architectural improvements), and YOLOv26 (which introduces NMS-free prediction). All models were fine-tuned from pretrained YOLO weights using a custom dataset of 13,921 images across 21 classes. To improve performance in real-world driving conditions, training data included augmentations such as Blur, Motion Blur, and Camera Gain. The models were trained on an NVIDIA RTX 3080 GPU for 100 epochs with a batch size of 32 using an 82/12/6 train/validation/test split. The goal of this project is to evaluate detection performance across YOLO architectures while demonstrating a traffic sign detection system for real-world driving scenarios.


Training Data

Data Collection Methodology

The dataset was created by combining multiple publicly available traffic sign datasets from Roboflow Universe. The LISA dataset served as the baseline for class naming, and the class set was reduced from 47 to 21 based on the most commonly observed traffic signs across the collected datasets.

Annotation Process

The images in the combined datasets were already pre-annotated with tight bounding boxes around the traffic signs. The main modification was standardizing the class names: using the LISA dataset as the baseline, the original 47 classes were reduced to 21 classes representing the most commonly observed U.S. traffic signs, and existing annotations were reviewed and relabeled to match the new label set.

Dataset

Source Datasets:

New Custom Dataset:

  • 8,189 non-augmented images + 5,732 augmented images (13,921 total)
  • 21 Classes

Class Distribution

Class               Train  Valid  Test  Total
stop                 2442    402   198   3042
pedestrianCrossing   1428    254   117   1799
speedLimit35         1110    158    62   1330
speedLimit25         1084    137    71   1292
speedLimit45          496     61    28    585
speedLimit30          470     51    25    546
yield                 308     57    25    390
speedLimit40          302     38    20    360
speedLimit20          312     30    14    356
speedLimit65          276     34     9    319
speedLimit50          212     16    14    242
speedLimit15          210     16     4    230
speedLimit55          160      4     4    168
speedLimit70          114      6     2    122
doNotEnter             94      5     2    101
speedLimit80           82      4     0     86
speedLimit60           78      4     3     85
noLeftTurn             68      9     4     81
speedLimit75           66      0     0     66
noRightTurn            46      3     0     49
speedLimit85           46      3     0     49

Dataset Split

The dataset was divided into training, validation, and test sets using an 82% / 12% / 6% split:

  • Train: 11,415 images
  • Validation: 1,670 images
  • Test: 836 images
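The split sizes above follow from applying the 82%/12% ratios to the 13,921 total images and assigning the remainder to the test set. A minimal sketch of that arithmetic (the helper name and rounding scheme are illustrative assumptions, not taken from the project):

```python
def split_counts(n_images: int, train_frac: float = 0.82, val_frac: float = 0.12):
    """Allocate train/val counts by truncation; the remainder becomes the test set."""
    train = int(n_images * train_frac)
    val = int(n_images * val_frac)
    test = n_images - train - val
    return train, val, test

print(split_counts(13921))  # (11415, 1670, 836) -- matches the counts above
```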

Augmentation

  • Blur (2x)
  • Motion Blur (100px)
  • Camera Gain (0.05)

Augmentations were applied to simulate real-world driving conditions such as motion and camera exposure changes that may occur in ADAS or autonomous driving systems.
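These augmentations are applied during dataset generation, but conceptually they are simple pixel-level operations. A toy sketch on 1-D rows of grayscale values (the function names and the exact gain formula are illustrative assumptions, not the tool's implementation):

```python
def apply_gain(row, gain=0.05):
    """Scale intensities by (1 + gain), clipping to the 8-bit range."""
    return [min(255, round(p * (1 + gain))) for p in row]

def motion_blur(row, k=3):
    """Average each pixel with its k-1 neighbors to the right (valid positions only)."""
    return [round(sum(row[i:i + k]) / k) for i in range(len(row) - k + 1)]

print(apply_gain([100, 200, 250]))         # [105, 210, 255] (250 clips at 255)
print(motion_blur([0, 100, 200, 0], k=2))  # [50, 150, 100]
```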


Training Configuration

  • Hardware: NVIDIA RTX 3080 10GB
  • Architecture: YOLOv8s, YOLOv11s, YOLOv26s
  • Batch Size: 32
  • Epochs: 100
  • Imgsz: 640
  • Patience: 50
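With the Ultralytics API, the configuration above corresponds roughly to the following call (the dataset YAML path is a hypothetical placeholder, and pretrained weight filenames vary by release):

```python
from ultralytics import YOLO

# Load pretrained small-variant weights (e.g. yolov8s.pt for YOLOv8s);
# repeat with the corresponding weights for the other architectures.
model = YOLO("yolov8s.pt")

# Fine-tune with the settings listed above.
model.train(
    data="traffic_signs.yaml",  # hypothetical dataset config path
    epochs=100,
    batch=32,
    imgsz=640,
    patience=50,  # early stopping if validation does not improve
)
```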

Evaluation Results

Model Performance Comparison

The table below shows the average class performance metrics for each YOLO model. All three models perform similarly across evaluation metrics, with only small differences between architectures.

Model      Precision (P)  Recall (R)  mAP50  mAP50-95
YOLOv8s    0.921          0.938       0.945  0.772
YOLOv11s   0.894          0.973       0.948  0.777
YOLOv26s   0.919          0.922       0.951  0.781
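As a rough cross-check on the table, the single-point F1 implied by each model's average precision and recall can be computed directly (this is just the harmonic mean of the two tabled values, not the peak of the F1-confidence curve, so it only approximates the curves discussed later):

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Precision/recall pairs from the comparison table above.
for name, (p, r) in {
    "YOLOv8s": (0.921, 0.938),
    "YOLOv11s": (0.894, 0.973),
    "YOLOv26s": (0.919, 0.922),
}.items():
    print(f"{name}: {f1(p, r):.3f}")  # all land near 0.92-0.93
```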

Model Comparison

The three YOLO models show very similar overall performance, with only small differences across evaluation metrics. All models achieve high precision, recall, and mAP, indicating strong detection capability for the selected traffic sign classes. YOLOv11 performs slightly better overall, while YOLOv26 achieves nearly identical performance despite using an NMS-free prediction approach. YOLOv8 performs marginally lower but remains within a small margin of error.

F1 Curve

F1_Curve

The F1-confidence curves show strong balance between precision and recall across all three models. The peak F1 score for all classes reaches approximately 0.92, indicating high overall detection performance, with all three models performing very similarly. Most classes maintain high F1 scores across confidence thresholds. However, the speedLimit55 (green) and speedLimit20 (brown) classes show noticeably lower curves. This likely reflects fewer training examples for these classes and the visual similarity between speed limit signs, where only the number differs, making them more difficult for the model to distinguish.

Confusion Matrix

matrix

The confusion matrices show strong classification performance across all three models, with most predictions falling along the diagonal. This indicates that the majority of traffic sign classes are correctly classified, with very few misclassifications. The three models display very similar patterns, suggesting consistent class recognition across architectures. Minor confusion occurs between some speed limit classes, where predictions occasionally fall into adjacent speed categories.
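Reading the diagonal works the same way numerically: with rows as true classes and columns as predictions, per-class recall is the diagonal entry divided by the row sum. A toy two-class sketch with invented counts, purely for illustration:

```python
def per_class_recall(cm):
    """cm[i][j] = count of true class i predicted as class j."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]

# Hypothetical counts for two easily confused classes,
# e.g. speedLimit50 vs speedLimit55.
cm = [[95, 5],
      [10, 90]]
print(per_class_recall(cm))  # [0.95, 0.9]
```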

Sample Images

images

The sample predictions demonstrate that all three models can accurately detect and localize traffic signs across a variety of road environments. Bounding boxes are placed correctly around most signs, and detection performance is generally consistent across the three models, although YOLOv26 occasionally struggles to detect the speedLimit55 sign compared to the other architectures. Overall, the predictions show strong and reliable detection performance across different road scenes and lighting conditions.

Real Time Video

Upscaling Comparison (1080p, 1440p, 2160p)

The upscaling video uses a 0.25 confidence threshold on YOLOv26 and shows that increasing the image resolution can help the model detect smaller or distant traffic signs by making important visual features more visible. However, some false positives or misclassifications may still occur when the original image lacks clear detail.
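The intuition is that upscaling gives a distant sign more pixels for the detector to work with, although interpolation cannot add detail the original frame lacks. A toy nearest-neighbor upscale illustrates the area gain (real pipelines would use interpolation such as bilinear; the function name is an assumption):

```python
def upscale_nn(img, factor):
    """Nearest-neighbor upscale of a 2-D grid of pixel values."""
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

# A 1x2 patch becomes 2x4: the "sign" pixel (9) now covers 4x the area.
print(upscale_nn([[9, 0]], 2))  # [[9, 9, 0, 0], [9, 9, 0, 0]]
```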

v11 (TTA) vs v26 (non-TTA)

This video compares detection performance between YOLOv11 using Test Time Augmentation (TTA) and YOLOv26 without TTA. The TTA comparison uses a 0.30 confidence threshold at 2160p. While TTA evaluates multiple augmented versions of each frame to improve detection robustness, it also introduced additional false positives in this example, occasionally predicting traffic signs on background objects.
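Conceptually, TTA runs the model on several augmented views of each frame and merges the outputs (in Ultralytics it is enabled with `augment=True` at predict time). A simplified classification-score sketch, where the only augmentation is a horizontal flip and merging is a plain average (all names here are illustrative, not the library's internals):

```python
def tta_scores(predict, image):
    """Average class scores over the original image and its horizontal flip."""
    original = predict(image)
    flipped = predict(image[::-1])  # horizontal flip of a 1-D "image"
    return [(a + b) / 2 for a, b in zip(original, flipped)]

# Dummy model whose scores depend on the first and last pixel.
predict = lambda img: [img[0] / 255, img[-1] / 255]
print(tta_scores(predict, [255, 0]))  # [0.5, 0.5]
```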


Limitations

Known Failure Cases

The model may struggle when traffic signs are small, distant, or partially occluded within the frame. In some scenarios, the model may also produce false positives by incorrectly detecting traffic signs on background objects such as trees or other environmental features.

Poor Performing Classes

Certain speed limit classes show slightly lower performance compared to other traffic signs. In particular, speedLimit55 and speedLimit20 demonstrate lower F1 scores and occasional misclassifications. This may be due to fewer training examples as well as the visual similarity between speed limit signs, where only the number differs.

Data Biases

The dataset is composed of multiple publicly available traffic sign datasets and primarily contains images of U.S. traffic signs. As a result, the model may be biased toward U.S. signage and may not generalize well to traffic signs from other countries or regions.

Environmental Limitations

Model performance may degrade under challenging environmental conditions such as poor lighting. Background objects and natural features in the environment can also occasionally trigger false positives.

Sample Size Limitations

Although the dataset contains over 10,000 images, some traffic sign classes contain significantly fewer training samples than others. This class imbalance can affect model performance and contribute to lower detection accuracy for certain classes.
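The imbalance is large: per the class distribution table, the most frequent class (stop, 3,042 images) outnumbers the rarest (noRightTurn and speedLimit85, 49 each) by roughly 62x. One common mitigation, not reported as used in this project, is inverse-frequency class weighting:

```python
# Totals taken from the class distribution table (subset shown).
totals = {"stop": 3042, "pedestrianCrossing": 1799,
          "speedLimit55": 168, "speedLimit85": 49}

print(round(totals["stop"] / totals["speedLimit85"]))  # ~62x imbalance

# Inverse-frequency weights, normalized so the most common class has weight 1.0;
# rare classes are upweighted proportionally during loss computation.
weights = {cls: totals["stop"] / n for cls, n in totals.items()}
```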

Inappropriate Use Cases

This model should not be used for safety-critical real-world driving systems without extensive validation and testing. The model is intended as a proof of concept for traffic sign detection and may not meet the reliability standards required for deployment in autonomous vehicles or production ADAS systems.
