GRPO Adapter Models for Medical Reasoning (LLaMA 3.1 8B)
This repository hosts four adapter models trained with Group Relative Policy Optimization (GRPO) on top of LLaMA 3.1 8B, targeting the task of medical imaging appropriateness classification. These models were trained to replicate and align with expert clinical reasoning provided by the American College of Radiology (ACR).
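For readers new to GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates that normalization step only. The function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions, not details taken from this repository's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation plus a small eps for stability;
    actual trainers may use a different estimator.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two earned the reward, two did not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean get a positive advantage and are reinforced; those below the mean are discouraged.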
Model Variants and Reward Designs
| Variant Name | Reward Type(s) | Description |
|---|---|---|
| Baseline | Answer (binary), Format (tag-based) | Standard RL setup: rewards the correct label and properly formatted output using `<think>` and `<answer>` tags. |
| Citations | Answer, Format, External Context | Adds condensed medical evidence (abstracts from ACR-cited PubMed studies) to the context, testing whether grounding in real evidence improves performance. |
| LLM Eval | Answer, Format, LLM-based reasoning alignment, External Context | Uses Qwen1.5-1.8B to score the similarity of generated and gold reasoning, encouraging factually aligned justifications. |
| Custom Embedding | Answer × Reasoning Similarity, Format, External Context | Novel reward using cosine similarity between embedding traces. Only grants reward if the final answer is correct and the reasoning closely aligns with the gold trace structure. |
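
The reward components in the table can be sketched roughly as follows. This is a minimal illustration, not the training code: the regular expression, the function names, and the clipping of negative similarities to zero are assumptions, and the embeddings are passed in as plain vectors because the embedding model is not specified here.

```python
import math
import re

# Assumed output shape: a <think> block followed by an <answer> block, nothing else.
PATTERN = re.compile(
    r"^\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def format_reward(completion):
    """1.0 if the completion uses the expected tag layout, else 0.0."""
    return 1.0 if PATTERN.match(completion) else 0.0


def answer_reward(completion, gold_label):
    """Binary reward: 1.0 only when the tagged answer equals the gold label."""
    match = PATTERN.match(completion)
    if match is None:
        return 0.0
    predicted = match.group(2).strip().lower()
    return 1.0 if predicted == gold_label.strip().lower() else 0.0


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gated_similarity_reward(completion, gold_label, pred_emb, gold_emb):
    """Answer x Reasoning Similarity: similarity only counts if the label is right."""
    similarity = max(0.0, cosine_similarity(pred_emb, gold_emb))  # assumed clipping
    return answer_reward(completion, gold_label) * similarity
```

Multiplying the two terms, rather than adding them, means a well-argued but wrong answer earns nothing from the similarity component.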
Files
- `baseline\adapter_model.safetensors`
- `citations\adapter_model.safetensors`
- `llm-eval\adapter_model.safetensors`
- `custom_embedding\adapter_model.safetensors`
Each file contains an adapter for the LLaMA 3.1 8B model, trained on ~1800 variant/procedure pairs across ~30 medical conditions, using custom RL rewards.
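
One possible way to attach an adapter to the base model with the `peft` and `transformers` libraries is sketched below. Several things here are assumptions: the base checkpoint ID `meta-llama/Llama-3.1-8B` (the exact checkpoint is not stated above), the `local_repo` argument pointing at a local copy of this repository, and the presence of an `adapter_config.json` next to each `adapter_model.safetensors`, which `peft` needs in order to load an adapter.

```python
from pathlib import Path

VARIANTS = ("baseline", "citations", "llm-eval", "custom_embedding")

# Assumption: the exact base checkpoint is not stated in this README.
BASE_MODEL_ID = "meta-llama/Llama-3.1-8B"


def adapter_dir(local_repo, variant):
    """Return the folder holding one variant's adapter inside a local copy."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return Path(local_repo) / variant


def load_adapter(local_repo, variant):
    """Load the base model and wrap it with the chosen adapter."""
    # Imported here so the path helper above works without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, str(adapter_dir(local_repo, variant)))
    return tokenizer, model
```

Calling `load_adapter("path/to/local/copy", "citations")` would then return a tokenizer and an adapter-wrapped model, provided the assumptions above hold.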
About the Task
The task is to recommend whether a specific imaging procedure is:
- Usually Appropriate
- May Be Appropriate
- Usually Not Appropriate
The agent is trained not only to predict the correct label but also to justify it step by step, mimicking the ACR's clinical reasoning process and referencing relevant medical studies.
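
At inference time, a generated response can be split into its justification and one of the three labels above. The sketch below assumes the `<think>` / `<answer>` tag layout described for the Baseline variant; the `Appropriateness` enum, the `Prediction` tuple, and `parse_prediction` are hypothetical names introduced for illustration.

```python
import re
from enum import Enum
from typing import NamedTuple, Optional


class Appropriateness(Enum):
    USUALLY_APPROPRIATE = "Usually Appropriate"
    MAY_BE_APPROPRIATE = "May Be Appropriate"
    USUALLY_NOT_APPROPRIATE = "Usually Not Appropriate"


class Prediction(NamedTuple):
    reasoning: str
    label: Optional[Appropriateness]  # None when no valid label is found


_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_prediction(text: str) -> Prediction:
    """Extract the step-by-step reasoning and map the answer to a known label."""
    think = _THINK.search(text)
    answer = _ANSWER.search(text)
    reasoning = think.group(1).strip() if think else ""
    label = None
    if answer:
        # Collapse whitespace and ignore case before comparing to the labels.
        normalized = " ".join(answer.group(1).split()).lower()
        for candidate in Appropriateness:
            if candidate.value.lower() == normalized:
                label = candidate
                break
    return Prediction(reasoning, label)
```

Returning `None` for an unrecognized answer keeps malformed generations distinguishable from valid predictions during evaluation.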
Collaboration & Citation
Interested in medical AI, reinforcement learning, or clinical reasoning? Let's connect!
This work is part of a larger research effort on interpretable LLM agents in healthcare. Please cite this work or reach out if you use these adapters.