🧠 GRPO Adapter Models for Medical Reasoning (LLaMA 3.1 8B)

This repository hosts four adapter models trained with Group Relative Policy Optimization (GRPO) on top of LLaMA 3.1 8B for medical imaging appropriateness classification. The models were trained to reproduce the expert clinical reasoning published by the American College of Radiology (ACR).
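As a rough illustration of the "group relative" part of GRPO: the method samples several completions for the same prompt and scores each one against the others in that group, rather than against a learned value function. The sketch below shows that normalization only. The function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions, not details taken from this repository's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` (an assumed value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, two of them rewarded.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.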


🔬 Model Variants and Reward Designs

| Variant Name | Reward Type(s) | Description |
|---|---|---|
| Baseline | ✅ Answer (binary); ✅ Format (tag-based) | Standard RL setup: rewards correct label and properly formatted output using <think> and <answer> tags. |
| Citations | ✅ Answer; ✅ Format; ➕ External Context | Adds condensed medical evidence (abstracts from ACR-cited PubMed studies) to the context, testing whether grounding in real evidence improves performance. |
| LLM Eval | ✅ Answer; ✅ Format; ✅ LLM-based reasoning alignment; ➕ External Context | Uses Qwen1.5-1.8B to score the similarity of generated and gold reasoning, encouraging factually aligned justifications. |
| Custom Embedding | ✅ Answer × Reasoning Similarity; ✅ Format; ➕ External Context | Novel reward using cosine similarity between embedding traces. Only grants reward if final answer is correct and reasoning closely aligns with gold trace structure. |
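The reward designs above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the training code behind these adapters: the exact tag layout, the function names, the clipping of negative similarity to zero, and the use of a plain product for "Answer × Reasoning Similarity" are all assumptions. The embedding model is not named in this card, so the vectors are passed in as plain lists.

```python
import math
import re

# Assumed output layout: <think>reasoning</think><answer>label</answer>
_OUTPUT = re.compile(
    r"^\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Tag-based format reward: 1.0 if both tags appear in the assumed order."""
    return 1.0 if _OUTPUT.match(completion) else 0.0


def answer_reward(completion: str, gold_label: str) -> float:
    """Binary answer reward: 1.0 if the <answer> text equals the gold label."""
    m = _OUTPUT.match(completion)
    if m is None:
        return 0.0
    predicted = m.group(2).strip().lower()
    return 1.0 if predicted == gold_label.strip().lower() else 0.0


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def gated_similarity_reward(completion, gold_label, pred_vec, gold_vec) -> float:
    """Answer x similarity: zero unless the label is correct (assumed form)."""
    similarity = max(0.0, cosine(pred_vec, gold_vec))  # clip negatives (assumption)
    return answer_reward(completion, gold_label) * similarity
```

Because the answer term multiplies the similarity term, a well-aligned justification attached to a wrong label earns nothing, which matches the "only grants reward if final answer is correct" behavior described for the Custom Embedding variant.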

📁 Files

  • baseline/adapter_model.safetensors
  • citations/adapter_model.safetensors
  • llm-eval/adapter_model.safetensors
  • custom_embedding/adapter_model.safetensors

Each file contains an adapter for the LLaMA 3.1 8B model, trained on ~1800 variant/procedure pairs across ~30 medical conditions, using custom RL rewards.
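One plausible way to load a single variant is with the `peft` and `transformers` libraries, pointing `PeftModel.from_pretrained` at the matching subfolder. This is a sketch under assumptions: `repo_id` is a placeholder for this repository's Hub ID, `base_model_id` must be whichever LLaMA 3.1 8B checkpoint the adapters were trained on (this card does not say whether that is the base or instruct model), and each folder is assumed to also contain the `adapter_config.json` that PEFT needs. The helper names are hypothetical.

```python
# Folder names are taken from the file listing above.
ADAPTER_FOLDERS = ("baseline", "citations", "llm-eval", "custom_embedding")


def adapter_subfolder(variant: str) -> str:
    """Validate a variant name and return its folder inside the repository."""
    if variant not in ADAPTER_FOLDERS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {ADAPTER_FOLDERS}"
        )
    return variant


def load_adapter(variant: str, repo_id: str, base_model_id: str):
    """Load the base model and attach one adapter (hypothetical helper).

    Imports are local so the module can be imported without the heavy
    dependencies installed.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(
        base, repo_id, subfolder=adapter_subfolder(variant)
    )
    model.eval()
    return tokenizer, model


# Example call (placeholders, not real identifiers from this card):
# tokenizer, model = load_adapter(
#     "baseline", repo_id="<this-repo-id>", base_model_id="meta-llama/Llama-3.1-8B"
# )
```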


🩺 About the Task

The task is to recommend whether a specific imaging procedure is:

  • Usually Appropriate
  • May Be Appropriate
  • Usually Not Appropriate

The agent is trained not only to predict the correct label but also to justify it step by step, mimicking the ACR’s clinical reasoning process and referencing relevant medical studies.
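When evaluating generations, it can help to map the model's free-text answer onto the three categories above. The sketch below assumes the label appears inside an <answer> tag and that matching is case- and whitespace-insensitive. The enum and `parse_label` are hypothetical helpers, not part of this repository.

```python
import re
from enum import Enum
from typing import Optional


class Appropriateness(Enum):
    USUALLY_APPROPRIATE = "Usually Appropriate"
    MAY_BE_APPROPRIATE = "May Be Appropriate"
    USUALLY_NOT_APPROPRIATE = "Usually Not Appropriate"


def _normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.split()).lower()


_BY_NAME = {_normalize(label.value): label for label in Appropriateness}
_ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_label(completion: str) -> Optional[Appropriateness]:
    """Return the predicted category, or None if it is missing or unrecognized."""
    m = _ANSWER.search(completion)
    if m is None:
        return None
    return _BY_NAME.get(_normalize(m.group(1)))
```

Returning `None` for anything outside the three categories keeps malformed outputs from being silently counted as a valid prediction.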


📬 Collaboration & Citation

Interested in medical AI, reinforcement learning, or clinical reasoning? Let's connect!
This work is part of a larger research effort on interpretable LLM agents in healthcare. Please cite or reach out if using these adapters in your work. 🙌
