This model is the result of applying Direct Preference Optimization (DPO) to petkopetkov/Qwen2.5-0.5B-Instruct-med-diagnosis, using the relatively small dataset nuriyev/medical-question-answering-rl-labeled-qwen-0.5B-binarized_v2, available on Hugging Face. It was evaluated by qualitative ranking and, in some cases, slightly outperforms the original petkopetkov/Qwen2.5-0.5B-Instruct-med-diagnosis.
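For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines of plain Python. This is only an illustration of the standard DPO loss, not the training code used for this model: the function names, the `beta` value of 0.1, and the example log-probabilities are all assumptions made for the sketch.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from this model's training setup
) -> float:
    """Standard DPO loss for one (prompt, chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -log_sigmoid(chosen_reward - rejected_reward)


# Hypothetical log-probabilities, for illustration only.
# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy now prefers the chosen answer more than the reference does: lower loss.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss shrinks as the policy raises the likelihood of the chosen answer relative to the rejected one, measured against the reference model, which is why a binarized (chosen/rejected) dataset is sufficient for this kind of training.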

The web interface at https://github.com/MahammadNuriyev62/doctor-llm was developed to test the model, make it publicly available, and collect user feedback for further RLHF.

Usage

pip install -U transformers accelerate torch

Run with the pipeline API

from transformers import pipeline
import torch

system_prompt = (
    "You are a medical assistant trained to provide general health information. "
    "Follow these rules:\n"
    "1. Only answer the question asked.\n"
    "2. Do not deviate from medical facts.\n"
    "3. Be concise and accurate."
)

prompt = "What is contact dermatitis, and what are some of the typical symptoms associated with this condition, including the type of hypersensitivity reaction that causes it?"

chat = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]

pipe = pipeline(
    task="text-generation",
    model="nuriyev/Qwen2.5-0.5B-Instruct-medical-dpo",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_new_tokens=1024,
)

response = pipe(chat)

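# For chat input, "generated_text" holds the full message list;
# the last entry is the assistant's reply.
print(response[0]["generated_text"][-1]["content"])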
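When the input is a list of chat messages, the text-generation pipeline returns the conversation with the newly generated assistant message appended, so the answer is the last entry rather than the first. A small helper can make this explicit. The function name `last_assistant_reply` and the `mock_response` below are hypothetical, written by hand only to mirror the shape of the pipeline output so the sketch runs without downloading the model.

```python
def last_assistant_reply(pipeline_output):
    """Return the text of the final assistant message in a pipeline result.

    Handles both plain-string output (string prompt) and chat-style output
    (list of {"role": ..., "content": ...} dicts).
    """
    generated = pipeline_output[0]["generated_text"]
    if isinstance(generated, str):
        return generated
    # Walk backwards so the most recent assistant turn is found first.
    for message in reversed(generated):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Hand-written stand-in for a real pipeline result (not actual model output).
mock_response = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a medical assistant."},
            {"role": "user", "content": "What is contact dermatitis?"},
            {"role": "assistant", "content": "<model answer goes here>"},
        ]
    }
]

print(last_assistant_reply(mock_response))
```

With a real run, `last_assistant_reply(response)` would be called on the value returned by `pipe(chat)`.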