🧠 GRPO Adapter Models for Medical Reasoning (LLaMA 3.1 8B)

This repository hosts four adapter models trained with Group Relative Policy Optimization (GRPO) on top of LLaMA 3.1 8B for medical imaging appropriateness classification. The models are trained to reproduce, and align with, the expert clinical reasoning provided by the American College of Radiology (ACR).

🔬 Model Variants and Reward Designs

| Variant | Reward type(s) | Description |
|---|---|---|
| Baseline | ✅ Answer (binary), ✅ Format (tag-based) | Standard RL setup: rewards a correct label and properly formatted output using `<think>` and `<answer>` tags. |
| Citations | ✅ Answer, ✅ Format, ➕ External context | Adds condensed medical evidence (abstracts of ACR-cited PubMed studies) to the context, to test whether grounding in real evidence improves performance. |
| LLM Eval | ✅ Answer, ✅ Format, ✅ LLM-based reasoning alignment, ➕ External context | Uses Qwen1.5-1.8B to score the similarity between the generated and gold reasoning, encouraging factually aligned justifications. |
| Custom Embedding | ✅ Answer × reasoning similarity, ✅ Format, ➕ External context | Novel reward based on the cosine similarity between embeddings of the generated and gold reasoning traces. Reward is granted only when the final answer is correct and the reasoning closely aligns with the structure of the gold trace. |
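
The reward designs in the table can be sketched as plain Python functions. This is a minimal illustration under stated assumptions, not the training code behind these adapters: the tag regex, the case-insensitive exact label match, the clamping of negative similarities to zero, and the use of precomputed embedding vectors (from an unspecified embedding model) are all assumptions, and every function name here is hypothetical.

```python
import math
import re
from typing import Optional, Sequence

# Assumed output layout: <think>...</think> followed by <answer>...</answer>.
FORMAT_PATTERN = re.compile(
    r"^\s*<think>.+?</think>\s*<answer>.+?</answer>\s*$", re.DOTALL
)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """Tag-based format reward: 1.0 if the <think>/<answer> layout is used, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion) else 0.0


def extract_answer(completion: str) -> Optional[str]:
    """Return the stripped text inside the first <answer> tag, or None if absent."""
    match = ANSWER_PATTERN.search(completion)
    return match.group(1).strip() if match else None


def answer_reward(completion: str, gold_label: str) -> float:
    """Binary answer reward (assumption: case-insensitive exact match)."""
    predicted = extract_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted.lower() == gold_label.strip().lower() else 0.0


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_reward(
    completion: str,
    gold_label: str,
    generated_embedding: Sequence[float],
    gold_embedding: Sequence[float],
) -> float:
    """Answer x reasoning similarity: zero unless the final answer is correct.

    Negative similarities are clamped to 0.0 (an assumption for this sketch).
    """
    similarity = max(0.0, cosine_similarity(generated_embedding, gold_embedding))
    return answer_reward(completion, gold_label) * similarity
```

Multiplying the binary answer reward by the similarity reproduces the gating described for the Custom Embedding variant: a wrong answer earns nothing, however similar its reasoning is to the gold trace.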

📁 Files

baseline\adapter_model.safetensors
citations\adapter_model.safetensors
llm-eval\adapter_model.safetensors
custom_embedding\adapter_model.safetensors

Each file contains an adapter for the LLaMA 3.1 8B model, trained on ~1800 variant/procedure pairs across ~30 medical conditions with the custom RL rewards described above.
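
A minimal loading sketch with Hugging Face `transformers` and `peft`, assuming each subfolder also contains the `adapter_config.json` that PEFT expects next to `adapter_model.safetensors`. The base checkpoint id (the card says only "LLaMA 3.1 8B", so the Instruct variant here is a guess), the adapter repo id placeholder, and the helper `load_variant` are hypothetical; substitute the real values before use.

```python
# Hypothetical identifiers: replace with the actual base model and this repository's id.
BASE_MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # assumption: exact base checkpoint not stated
ADAPTER_REPO_ID = "<this-repo-id>"  # placeholder, not a real repo id

# Variant name -> subfolder holding adapter_model.safetensors (from the file list above).
ADAPTER_SUBFOLDERS = {
    "baseline": "baseline",
    "citations": "citations",
    "llm_eval": "llm-eval",
    "custom_embedding": "custom_embedding",
}


def load_variant(variant: str):
    """Load the base model and attach one GRPO adapter; returns (tokenizer, model)."""
    if variant not in ADAPTER_SUBFOLDERS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(ADAPTER_SUBFOLDERS)}"
        )
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    # device_map="auto" requires the accelerate package.
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    model = PeftModel.from_pretrained(
        base, ADAPTER_REPO_ID, subfolder=ADAPTER_SUBFOLDERS[variant]
    )
    model.eval()
    return tokenizer, model
```

Usage, once the placeholders are filled in: `tokenizer, model = load_variant("citations")`. Loading one adapter at a time keeps the comparison between variants simple, since all four share the same base weights.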

🩺 About the Task

The task is to recommend whether a specific imaging procedure is:

Usually Appropriate
May Be Appropriate
Usually Not Appropriate

The agent is trained not only to predict the correct label but also to justify it step by step, mimicking the ACR's clinical reasoning process and referencing relevant medical studies.
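
To use a prediction downstream, a completion can be split into its step-by-step justification and one of the three labels. This sketch assumes completions follow the `<think>`/`<answer>` tag format described in the reward table; the `Appropriateness` enum, the `Prediction` tuple, and `parse_completion` are hypothetical names, and the whitespace and case normalization is an assumption.

```python
import re
from enum import Enum
from typing import NamedTuple, Optional


class Appropriateness(Enum):
    USUALLY_APPROPRIATE = "Usually Appropriate"
    MAY_BE_APPROPRIATE = "May Be Appropriate"
    USUALLY_NOT_APPROPRIATE = "Usually Not Appropriate"


class Prediction(NamedTuple):
    reasoning: str  # text inside <think>, or "" if missing
    label: Optional[Appropriateness]  # None if the answer is missing or unrecognized


THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

# Lower-cased label text -> enum member, for tolerant matching.
_LABELS = {member.value.lower(): member for member in Appropriateness}


def parse_completion(text: str) -> Prediction:
    """Split a generated completion into reasoning and an appropriateness label."""
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    reasoning = think.group(1).strip() if think else ""
    label = None
    if answer:
        # Collapse internal whitespace and ignore case before matching the categories.
        key = " ".join(answer.group(1).split()).lower()
        label = _LABELS.get(key)
    return Prediction(reasoning, label)
```

Returning `None` for an unrecognized label, rather than guessing the closest category, keeps malformed outputs visible when evaluating the adapters.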

📬 Collaboration & Citation

Interested in medical AI, reinforcement learning, or clinical reasoning? Let's connect!
This work is part of a larger research effort on interpretable LLM agents in healthcare. Please cite this work or reach out if you use these adapters. 🙌