Uploaded model
- License: apache-2.0
- Finetuned from model: unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
A Biomedical Snippet Extraction Model for Question Answering
Usage
```python
import re

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Initialize the vLLM engine with LoRA support
model_path = "unsloth/meta-llama-3.1-8b-instruct-bnb-4bit"
lora_path = "sag-uniroma2/llama3.1_adapter_biorag_snippet_extraction"
lora_adapter_id = 1

llm = LLM(
    model=model_path,
    enable_lora=True,
    max_loras=1,
    max_lora_rank=64,
    gpu_memory_utilization=0.85,
    trust_remote_code=True,
    disable_custom_all_reduce=True,
    enforce_eager=True,
)

# Set up the LoRA request
lora_request_obj = LoRARequest(
    lora_name=str(lora_adapter_id),
    lora_int_id=lora_adapter_id,
    lora_path=lora_path,
)

# Define sampling parameters (greedy decoding)
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# Define the instruction
instruction = """You are an expert biomedical researcher skilled in extracting relevant information from scientific literature.
Your task is to identify and extract key snippets from a given PubMed abstract or title that provide useful information to answer a specific biomedical question.
Instructions:
- Understand the question: Carefully analyze the biomedical question to grasp its key concepts, entities, and relationships.
- Analyze the document: Read the provided title or abstract carefully, identifying sentences or phrases that contain relevant information.
- Extract the snippet: If a portion of the text is relevant, extract it exactly as it appears in the original text and enclose it within the tags [BS] and [ES].
- Handle irrelevant cases: If the document does not contain any relevant information, return only [BS] [ES] with no content inside.
- Be precise: Ensure that extracted snippets are complete, self-contained, and directly relevant, without modifying or adding words."""

# Prepare the input
question = "YOUR_BIOMEDICAL_QUESTION_HERE"
document_text = "PUBMED_ABSTRACTS_HERE"
prompt = f"{instruction}\n\n# Question: {question}\n# Abstract/Title: {document_text}\n# Snippets:"

# Generate the snippet extraction
outputs = llm.generate(
    [prompt],
    sampling_params,
    lora_request=lora_request_obj,
)

# Parse the generated text
generated_text = outputs[0].outputs[0].text
snippet = generated_text.strip()

# Strip any trailing EOS tokens
common_eos_tokens = ["<|eot_id|>", "</s>", "<|endoftext|>"]
for eos in common_eos_tokens:
    if snippet.endswith(eos):
        snippet = snippet[:-len(eos)].strip()

# Extract content between the [BS] ... [ES] tags
extracted_snippets = re.findall(r'\[BS\](.*?)\[ES\]', snippet, re.DOTALL)
for snippet_content in extracted_snippets:
    clean_snippet = snippet_content.strip()
    if clean_snippet:
        print(f"Extracted snippet: {clean_snippet}")
```
Description
This Snippet Extraction Module is a fine-tuned language model served with the vLLM inference engine, designed to automatically extract relevant snippets from PubMed abstracts and titles in response to biomedical questions. It applies parameter-efficient LoRA (Low-Rank Adaptation) fine-tuning to Llama-3.1-8B-Instruct as the base model.
Model Details
- Base Model: Llama-3.1-8B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 64
- Inference Engine: vLLM (for optimized generation)
Key Features
- ✅ Biomedical Domain Expertise: Fine-tuned on BioASQ biomedical question-answering dataset
- ✅ Exact Span Extraction: Extracts text exactly as it appears with [BS] and [ES] tags
- ✅ High-Performance Inference: vLLM engine enables batch processing and fast generation
- ✅ Dual-Document Processing: Independently processes both titles and abstracts for comprehensive extraction
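Because extracted snippets are expected to match the source text verbatim, a simple substring check can validate each extraction before downstream use. The sketch below illustrates this check; the function name `is_exact_span` and the sample abstract are our own, not part of the released code.

```python
def is_exact_span(snippet: str, document: str) -> bool:
    """True if the extracted snippet occurs verbatim in the source text."""
    return snippet in document

# Hypothetical abstract used only to illustrate the check
abstract = ("Metformin lowers hepatic glucose production and improves "
            "insulin sensitivity in peripheral tissues.")

print(is_exact_span("Metformin lowers hepatic glucose production", abstract))  # True
print(is_exact_span("Metformin reduces glucose output", abstract))             # False
```

Snippets that fail this check were paraphrased or hallucinated by the model and should be discarded or flagged.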
Performance
- Tested on BioASQ 13B Phase A test set
- Optimized for precision and recall in biomedical snippet retrieval
- Batch processing capability for efficient document-scale extraction
Use Cases
- Biomedical Question Answering: Extract supporting evidence snippets for QA systems
- Literature Mining: Identify relevant passages in biomedical literature repositories
- Clinical Decision Support: Extract relevant clinical evidence from scientific literature
- Document Summarization: Identify key information-bearing passages in scientific papers
Input Format
The model expects a formatted prompt with:
- instruction: Detailed task definition and extraction guidelines
- question: The biomedical question requiring an answer
- document_text: PubMed abstracts or titles to analyze
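These three parts are concatenated with the fixed `# Question:`, `# Abstract/Title:`, and `# Snippets:` markers shown in the Usage section. A small helper reproducing that template can keep the formatting consistent across calls; the function name `build_prompt` is illustrative.

```python
def build_prompt(instruction: str, question: str, document_text: str) -> str:
    """Assemble the model input using the inference-time prompt template."""
    return (
        f"{instruction}\n\n"
        f"# Question: {question}\n"
        f"# Abstract/Title: {document_text}\n"
        f"# Snippets:"
    )

prompt = build_prompt(
    instruction="Extract relevant snippets.",
    question="What is the mechanism of action of metformin?",
    document_text="Metformin lowers hepatic glucose production.",
)
print(prompt.endswith("# Snippets:"))  # True
```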
Output Format
Extracts snippets enclosed in tags:
- [BS] ... [ES]: Extracted relevant snippet
- [BS] [ES]: Empty tag pair when no relevant information found
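A parser for this output format should handle both cases, returning no snippets when the model emits the empty tag pair. A minimal sketch (the function name `parse_output` is our own):

```python
import re

def parse_output(generation: str) -> list[str]:
    """Return all non-empty snippets enclosed in [BS] ... [ES] tags.

    An empty [BS] [ES] pair (no relevant information) yields an empty list.
    """
    spans = re.findall(r'\[BS\](.*?)\[ES\]', generation, re.DOTALL)
    return [s.strip() for s in spans if s.strip()]

print(parse_output("[BS]Metformin lowers glucose.[ES]"))  # ['Metformin lowers glucose.']
print(parse_output("[BS] [ES]"))                          # []
```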
GitHub
For implementation details, training scripts, and integration guides:
GitHub Repository: LocalBioRAG
GitHub Repository: BioASQ2025-UNITOR
Citation
If you use this model, please cite:
@InProceedings{10.1007/978-3-032-21324-2_31,
author="Borazio, Federico
and Labbate, Francesco
and Croce, Danilo
and Basili, Roberto",
editor="Campos, Ricardo
and Jatowt, Adam
and Lan, Yanyan
and Aliannejadi, Mohammad
and Bauer, Christine
and MacAvaney, Sean
and Anand, Avishek
and Ren, Zhaochun
and Verberne, Suzan
and Bai, Nan
and Mansoury, Masoud",
title="Integrating AI and IR Paradigms for Sustainable and Trustworthy Accurate Access to Large Scale Biomedical Information",
booktitle="Advances in Information Retrieval",
year="2026",
publisher="Springer Nature Switzerland",
address="Cham",
pages="398--412",
isbn="978-3-032-21324-2"
}
@inproceedings{unitor,
title={{UniTor at BioASQ 2025: Modular Biomedical QA with Synthetic Snippets and Multiple Task Answer Generation}},
author={Borazio, Federico and Shcherbakov, Andriy and Croce, Danilo and Basili, Roberto},
year={2025},
booktitle={CLEF 2025 Working Notes},
editor={Faggioli, Guglielmo and Ferro, Nicola and Rosso, Paolo and Spina, Damiano}
}
Disclaimer
This model is fine-tuned for biomedical snippet extraction from PubMed literature. While it performs well on BioASQ data, results may vary on other biomedical datasets or domains. The model is optimized for precision in identifying relevant text spans. Always validate extracted snippets for critical applications in clinical or research settings. For production use, consider the computational requirements: vLLM inference requires adequate GPU memory (recommended ≥24GB for batch processing).
