Model Description
Pills can be hard to identify at a glance, so they are sometimes mixed up with one another. This is dangerous because taking the wrong pill by accident can have serious health effects. A detection model that distinguishes between different pills can help identify each one and add a layer of safety.
Dataset & Procedure
This model uses the dataset “Pill Detection Computer Vision Model” from Roboflow Universe by Mohamed Attia, who also curated the photos. The dataset can be found here. In the original dataset, images were auto-oriented and resized (stretched) to 416x416. The images show pills of various brands and colors against different backgrounds. The dataset contains 496 images in total: 351 for training, 98 for validation, and 47 for testing.
Every annotation in the dataset was manually reviewed to confirm its quality. No annotations were modified, because all of them were found to be accurate.
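As a quick sanity check on the split sizes quoted above, the short sketch below computes each split's share of the 496 images. The names `SPLITS` and `split_shares` are illustrative only and are not part of the dataset or of any library.

```python
# Image counts per split, as quoted in the dataset description.
SPLITS = {"train": 351, "valid": 98, "test": 47}


def split_shares(splits):
    """Return each split's fraction of the total image count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


shares = split_shares(SPLITS)
for name, share in shares.items():
    print(f"{name}: {SPLITS[name]} images ({share:.1%})")
```

This works out to roughly a 71/20/9 split between training, validation, and test images.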
Epochs: 350 configured, but training stopped early at epoch 123 because results had started to plateau
Batches: 65
Preprocessing: Resize to 640x640
Patience: 50
Training Framework: Ultralytics
Hardware: Google Colab with T4 GPU
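The settings above could be passed to Ultralytics roughly as in the sketch below. This is not the exact training script. The checkpoint name `yolov8n.pt` and the `data.yaml` path are assumptions, since the model variant and dataset config file are not stated here, and "Batches: 65" is assumed to mean a batch size of 65.

```python
# Training arguments taken from the settings listed above.
TRAIN_ARGS = {
    "data": "data.yaml",  # hypothetical path to the Roboflow export's YAML file
    "epochs": 350,        # upper bound on the number of epochs
    "patience": 50,       # stop early after 50 epochs without improvement
    "batch": 65,          # assumption: "Batches: 65" refers to batch size
    "imgsz": 640,         # images are resized to 640x640 for training
}


def train(weights="yolov8n.pt"):
    """Start a training run. The default checkpoint is an assumption."""
    # Imported here so the settings can be inspected without Ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**TRAIN_ARGS)
```

Calling `train()` on a GPU runtime such as the Colab T4 would start a run. With `patience` set to 50, Ultralytics ends training once validation results have not improved for 50 epochs, which may be why the run ended well before epoch 350.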
| Class Name | Number of Annotations |
|---|---|
| Blue | 59 |
| Cipro 500 | 111 |
| Ibuphil 600 mg | 2 |
| Ibuphil Cold 400-60 | 71 |
| Pink | 58 |
| Red | 60 |
| White | 60 |
| Xyzall 5mg | 71 |
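To make the class distribution in the table easier to read, the sketch below computes each class's share of all annotations and the ratio between the largest and smallest class. The counts are copied from the table. The helper names are illustrative only.

```python
# Annotation counts per class, copied from the table above.
ANNOTATIONS = {
    "Blue": 59,
    "Cipro 500": 111,
    "Ibuphil 600 mg": 2,
    "Ibuphil Cold 400-60": 71,
    "Pink": 58,
    "Red": 60,
    "White": 60,
    "Xyzall 5mg": 71,
}


def class_shares(counts):
    """Return each class's fraction of the total annotation count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Return the largest class count divided by the smallest."""
    return max(counts.values()) / min(counts.values())


# Print classes from most to least annotated.
for name, share in sorted(class_shares(ANNOTATIONS).items(), key=lambda kv: -kv[1]):
    print(f"{name:<22}{ANNOTATIONS[name]:>4}  {share:.1%}")
print(f"largest/smallest class ratio: {imbalance_ratio(ANNOTATIONS):.1f}")
```

The table holds 492 annotations in total. Cipro 500 accounts for about 23% of them, while Ibuphil 600 mg has only 2, so the largest class has 55.5 times as many annotations as the smallest.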
Evaluation Results
mAP50: 0.7769
mAP50-95: 0.6273
Precision: 0.5778
Recall: 0.9691
The model has very high recall but low precision. The original goal was to reach around 80% accuracy, but the model missed that target by a large margin, and further adjustments did not appear to make much of a difference. This is likely due to issues with the dataset, which are discussed in the "Limitations and Biases" section. High recall combined with low precision means that the model will likely detect any pills in an image but will also produce many false positives, which is undesirable.
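The effect of the precision and recall figures can be made concrete with a little arithmetic. The sketch below derives an F1 score and the expected number of false positives per correct detection. Neither value is reported in the evaluation itself. Both are computed here on the assumption that the reported precision and recall describe a single operating point.

```python
PRECISION = 0.5778  # reported precision
RECALL = 0.9691     # reported recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def false_positives_per_true_positive(precision):
    """From precision = TP / (TP + FP), it follows that FP / TP = (1 - P) / P."""
    return (1 - precision) / precision


f1 = f1_score(PRECISION, RECALL)
fp_per_tp = false_positives_per_true_positive(PRECISION)
print(f"F1: {f1:.4f}")
print(f"False positives per true positive: {fp_per_tp:.2f}")
```

Under that assumption, F1 comes out at about 0.72, and there are about 0.73 false detections for every correct one, or roughly 7 false positives for every 10 true positives.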
Limitations and Biases
Several issues limit this model. The dataset has significant class imbalance, which makes some pill types difficult to detect; for example, Ibuphil 600 mg has only 2 annotations while Cipro 500 has 111. It also mixes general color categories (such as Blue or Pink) with exact branded pills, which makes the labels inconsistent and accuracy harder to achieve. Extending the dataset is difficult as well, because it covers specific pill types and those pills are not necessarily easy to obtain for photographing additional training images. At under 500 images, the dataset is also small, which limits the data available for training. All of these issues likely contribute to the difficulty in reaching higher accuracy. Anyone trying to improve on this premise may be better off creating a completely new dataset manually rather than relying on the one used here.
With these limitations in mind, the model does not work as well as intended for pill detection and sorting, and it may be better suited to other purposes. It could potentially be used to count pills or to detect whether a pill is present in an image, but false positives remain a risk there too, since they would skew the results and defeat the purpose of the model.

