Description:

An object detection model that aims to detect healthy vs. diseased sea stars (diseased from Sea Star Wasting Disease, SSWD). Trained on images mostly taken by volunteer citizen scientists on various beaches.

Sea stars are important keystone species in many ecosystems, and their loss during the initial wave of SSWD last decade had dramatic consequences (for example, the increased prevalence of urchin barrens and decreased kelp forest coverage due to the lack of sea stars to keep urchins in check). The easier it is to track the prevalence of sea stars and SSWD across ecosystems - and potentially to track the difference in healthy vs. diseased sea stars across different species - the better.

Note: as it currently stands, this model is... not very good. This is likely because it was trained on too few images, and because the distinction between the "healthy" and "diseased" classes was not very clear. It is only somewhat okay at detecting sea stars and classifying them as diseased/healthy.

Photos were annotated in RoboFlow, and the model was trained using YOLOv11.

See this RoboFlow project for the version of the dataset with many, many classes (noting condition, species, juvenile status, etc., plus quite a few tags).

See this RoboFlow project instead for the version with all classes collapsed into "diseased" and "healthy". (There is also a "brooding" class, but it only applies to two images, which were removed from the dataset.) This is what was used to create the model.


Dataset:

This model was trained on a collection of 374 images sourced from MARINe, mainly from their Sea Star Wasting Disease (SSWD) surveys. The images were acquired via a data request directly to the MARINe website. Special thanks to Rani Gaddam for collecting the images and sending them to me.

Most images were of live sea stars in the intertidal zone, above the tide. Some were taken underwater, showed dead sea stars, or were taken in lab conditions.
Most photos showed individual sea stars, while some showed groups. A few images showed very large groups of diseased sea stars in a lab setting; however, only two images tagged as "Large_Group" were fully annotated and added to the dataset.

Some images could not be uploaded to RoboFlow due to size limits, or because RoboFlow seemingly randomly considered them duplicates and prevented upload. Thus, only 356 images were uploaded (with only 303 annotated, and 301 used in the current final model).

There appeared to be little useful metadata in the images themselves, but many image files had names like "p_ochraceus_diseased_dav_2013_1003_mr.JPG", listing the species name, disease status, and date (sometimes with location and/or the name of the surveyor or image source).
However, many did not list disease status or species at all. Thus, while things like species and disease status were noted in the classes (see the annotations section), many additional notes - like whether a human was present, or whether the photo was taken in situ or in the lab - were recorded in the image tags in RoboFlow. These additional notes were not used in the current version of the model, but could likely be used to refine it in the future.

Annotations:

The original version of the dataset in RoboFlow has 48 classes (please follow the link and go to Classes & Tags to see all of them). This version was made just to set up annotations based on what was known from file names (species, disease status), and to extrapolate from there.
Since many images did not have species or disease status noted, some were specifically noted as juveniles while others weren't, and some had special notes in their titles (like signs of regrowth), this quickly became a complicated task.
Thus, many classes are very similar to others, just with "probably" in front of an attribute. For example, there is a "Healthy p_ochraceus" class with 46 annotations, a "probably_healthy p_ochraceus" class with 11 annotations, a "maybe_healthy p_ochraceus juvenile" class (note the difference between "probably_healthy" and "maybe_healthy") with 6 annotations, a "probably_healthy looks_diseased regenerating p_ochraceus" class with 3 annotations... etc.

18 classes here have only 1 annotation. Many of those classes are very similar, but not quite the same, and the distinction may matter depending on what you are focusing on, or on what turns out to impact the training of the model (whether an individual is a juvenile, whether we're certain of the species, whether it's lost an arm, whether it's showing signs of regrowth...). Hence the version where classes have been left largely untouched.

Only 7 classes have more than 10 annotations. Only 3 have more than 20, and all of those are of the Ochre Star (diseased p_ochraceus has 198, healthy p_ochraceus has 46, and diseased p_ochraceus arm has 26).

The other version has all classes collapsed into just "healthy" (133 annotations) and "diseased" (295 annotations). There is also a "brooding" class (2 annotations), which has been marked as unannotated so that it is excluded from the dataset, for fear of confusing the model too much.
However, note that the classes were collapsed with little discretion, simply based on whether an annotation was originally noted as "maybe" or "probably" diseased or healthy, rarely double-checking the actual images to see whether something visually belonged in the other category - so the distinction between diseased and healthy likely isn't very reliable, at least in this version of the dataset.
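The collapse was done by hand in the RoboFlow UI, but for future reruns it could also be done programmatically over YOLO-format label files. A minimal sketch under that assumption - the class names and the keyword rule below are illustrative, not the actual 48-class list:

```python
# Hypothetical sketch: collapse fine-grained class names into just
# "healthy" (0) and "diseased" (1), dropping "brooding", keyed on
# whichever of the two keywords the manual collapse went by.

def collapse_class(name: str):
    """Map an original class name to a collapsed class id (0=healthy,
    1=diseased), or None to drop the annotation entirely."""
    lowered = name.lower()
    if "brooding" in lowered:
        return None  # only 2 annotations; excluded from the dataset
    if "healthy" in lowered:   # also catches probably_/maybe_healthy
        return 0
    if "diseased" in lowered:  # also catches probably_/maybe_diseased
        return 1
    return None

def remap_label_line(line: str, names: list):
    """Rewrite one 'class x y w h' YOLO label line, or return None to drop it."""
    cls, *coords = line.split()
    new_id = collapse_class(names[int(cls)])
    if new_id is None:
        return None
    return " ".join([str(new_id)] + coords)
```

Note that because "healthy" is checked first, a class like "probably_healthy looks_diseased" collapses to healthy, matching the collapse-by-prefix behavior described above.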

In RoboFlow:

The second version of the dataset, with all classes collapsed into "diseased" and "healthy" classes, was used.

Pre-processing (applied to each image):

  • Auto-orientation of pixel data (with EXIF-orientation stripping)
  • Resize to 512x512 (Stretch)

Augmentation (applied to create 3 versions of each source image):

  • Randomly crop between 0 and 35 percent of the image
  • Random Gaussian blur of between 0 and 1 pixels
  • Salt and pepper noise was applied to 0.3 percent of pixels
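For reference, the salt-and-pepper step can be approximated outside RoboFlow. A rough numpy sketch - RoboFlow's exact sampling is not documented here, so details like the RNG and collision handling are assumptions:

```python
import numpy as np

def salt_and_pepper(img: np.ndarray, fraction: float = 0.003, seed: int = 0) -> np.ndarray:
    """Set roughly `fraction` of pixels (0.3 percent by default) to pure
    black or white, at positions chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    h, w = img.shape[:2]
    n = int(round(h * w * fraction))   # number of pixels to corrupt
    ys = rng.integers(0, h, n)
    xs = rng.integers(0, w, n)
    vals = rng.choice(np.array([0, 255], dtype=img.dtype), size=n)
    if img.ndim == 3:
        out[ys, xs] = vals[:, None]    # same value across all channels
    else:
        out[ys, xs] = vals
    return out
```

On a 512x512 training image this touches about 786 pixels per augmented copy.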

Training:

See args.yaml.
The code was based on chapter 15 of the OceanCV textbook (credit to Ada Carter and Katie Bigham, the teachers of the course this project is part of).
Batch size and image size were the only things changed from the original code.

  • 100 epochs (in hindsight, likely could have run for longer).
  • Patience: 50
  • Batch size: 32
  • Image size: 512
  • Etc.
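Put together, the training call presumably looked something like the following sketch, assuming the Ultralytics `YOLO` API; the checkpoint name and dataset YAML path are placeholders, and args.yaml has the actual full argument list:

```python
# Sketch only — mirrors the settings listed above, not the exact course script.
train_args = dict(
    data="data.yaml",  # placeholder path to the dataset YAML
    epochs=100,
    patience=50,
    batch=32,          # changed from the original course code
    imgsz=512,         # changed from the original course code
)

def train():
    """Launch a training run (requires `pip install ultralytics`)."""
    from ultralytics import YOLO
    model = YOLO("yolo11n.pt")  # placeholder base checkpoint
    return model.train(**train_args)
```

Calling `train()` would launch the run; it is deliberately left uncalled here.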

Performance:

Not... great. Could definitely be worse, and could definitely be better!
It's somewhat okay with diseased sea stars, and not good with healthy sea stars.
There is a surprisingly high amount of confusion between the background and sea stars. This mainly presented as sea stars being missed entirely (usually the juveniles), and occasionally as false positives - in one example, a rock was marked as a diseased sea star.

See the results_infographics_and_such folder for more details and visuals.


Notes:

  • Surprisingly, the model seemed to do pretty well discerning diseased vs. healthy sea stars, given the conditions, but struggled discerning sea stars from the background.
  • It seemed not to be confused by the presence of humans. Perhaps some of the images that were excluded from the dataset could be added back in, and that extra data could aid in training, if it turns out that humans and hands and such are largely irrelevant.

Why is it not good? What next?

  • Didn't run for enough epochs - underfitted?

  • As previously mentioned, in this run, individuals were marked as “diseased” or “healthy” with little discretion.

    • Some "diseased" labels came only from file names, which include mildly and moderately diseased stars - perhaps it's hard for the model to discern healthy from diseased sea stars when they look so similar but are grouped like this?
  • Both in situ and lab stars were included

    • Wide variety of backgrounds for stars to be confused with?
  • Both juveniles and adults were included

  • Many repeat images – got too used to a small dataset?

  • Perhaps the limited number of samples for some species led to confusion?

  • Didn’t fiddle with IOU enough?

  • Other factors?


  • Classify based on disease being visually prevalent or not

    • Test if having 2 categories, "severely diseased" and "healthy" (instead of "diseased" and "healthy"), where mildly diseased sea stars are excluded from the data (or perhaps even put under the 'healthy' category?) would be more accurate?
    • Test if more than 2 categories would be useful (healthy, mild, moderate, severe, etc.)
    • (Though, it is of note that this project was mainly a test to see if SSWD - which is mainly visual, but whose early stages can also in part be identified by behavioral changes like lethargy and curling up - could even be properly detected by a model like this. If a model were trained on how diseased a sea star looks (e.g. comparing healthy vs. severely diseased sea stars), and not how diseased it actually is, that may make it less helpful for determining the actual prevalence of SSWD.)
  • Gather more images

    • Especially of severely diseased stars, ones which are visually very diseased?
  • Finish annotating the images of the very large groups

  • Exclude images from lab environment, perhaps?

  • Test to see if there is confusion from different species

    • Maybe test if making different classes for each species would result in a better model
      • May cause additional problems due to Ochre stars comprising the majority of the data. May need to source additional data of other species, both healthy and diseased.
  • Curious if a model just based on the ochre star images, which were the overwhelming majority, would be more accurate...

  • Test to see if there is confusion from adults vs juveniles?

  • Fiddle with IOU
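For context on the IoU knob mentioned above: at predict time, the IoU threshold controls non-max suppression, i.e. how much two predicted boxes must overlap before the lower-confidence one is dropped. What IoU measures can be written as a minimal pure-Python function over (x1, y1, x2, y2) boxes:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by 1 pixel in each direction overlap with IoU 1/7 ≈ 0.14, well under typical NMS thresholds of 0.45-0.7, so both would be kept as separate detections.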

Model Use Case:

In the model's current state, it appears it wouldn't really be ideal for anything - not even to see if a sea star is there, let alone if it's diseased or not.
The model, currently, would likely be best used by running it on a larger set of sea star images, then cleaning up its annotations (as opposed to making new annotations from clean images, which would be slower) and training a new model from there. That is likely what should be done if this project is redone with more images.
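The box bookkeeping for that model-assisted labeling loop is simple arithmetic. A sketch of converting a predicted pixel-space box back into a YOLO-format label line a human could then correct (the prediction step itself, via the trained model, is omitted):

```python
def to_yolo_line(cls_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box into a normalized
    'class cx cy w h' YOLO label line for later re-annotation."""
    cx = (x1 + x2) / 2 / img_w   # box center, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h   # box center, as a fraction of image height
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

One such line per predicted box, written to a .txt file next to each image, gives annotators a pre-filled starting point instead of a blank canvas.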

In a theoretical world where this model had more images, clearer distinctions between the "diseased" and "healthy" classes (or alternatives), and a longer training run - to the point where it could reasonably accurately determine whether something is a sea star and whether it is diseased - perhaps it could be used similarly to the above: fed a large set of images of sea stars on a beach (likely from volunteer citizen scientists - perhaps data from iNaturalist, for example, could be used and/or tested? - or from something like a low-flying drone over a beach), with its annotations confirmed and refined from there.
But of course that's all theoretical for now.

Sources and Credit:

A very special thank you to Ada Carter and Katie Bigham, as mentioned earlier. You both were wonderful teachers of this course, and I'm glad I took it! I learned a lot :)

Special thanks again to Rani Gaddam for collecting the images and sending them to me.

“Sea Star Wasting Disease”. Multi-Agency Rocky Intertidal Network, University of California Santa Cruz. seastarwasting.org. Last accessed March 2026.

“This study utilized data collected by the Multi-Agency Rocky Intertidal Network (MARINe): a long-term ecological consortium funded and supported by many groups. Please see the complete list of MARINe partners responsible for monitoring and funding these data. Special recognition should go to the agencies who have provided the majority of continuous funding for the project over several decades: Bureau of Ocean Energy Management, The National Park Service, The California Ocean Protection Council, Partnership for Interdisciplinary Studies of Coastal Oceans, and US Navy (Navy Marine Ecology Consortium).”
