ft-DeepSeek-R1-0528-Qwen3-8B-5epoch

Reinforced Qwen-8B for Academic Reasoning, Literature Review, and Research Question Generation

Overview

ft-DeepSeek-R1-0528-Qwen3-8B-5epoch is a hybrid model combining:

  • the Qwen3-8B architecture
  • DeepSeek-R1-style reasoning behavior
  • the same MIS/Finance/Econ academic-domain fine-tuning pipeline used in ft-Qwen3-8B-5epoch

This model emphasizes step-wise reasoning quality, gap detection, and research question discovery. It builds on the supervised fine-tuning stage and is designed to work with future reinforcement-learning (RL) refinements using the RQSim reward.


Improvements Over ft-Qwen3-8B-5epoch

  • Stronger reasoning coherence
  • Better multi-step academic argumentation
  • More structured gap identification
  • More consistent ranking/prioritization of potential research questions
  • Cleaner and more interpretable thought process (chain-of-thought-friendly)

Specialization

1. Literature Review Understanding

The model learns from expert-written literature reviews in social science journals, enabling it to mimic academic rhetorical structure:

  • theoretical background
  • problem framing
  • empirical gaps
  • future directions

2. Research Question Generation

The model is trained to generate RQs that mirror actual research questions in target papers by learning relationships between referenced papers and the target paper's RQ.

3. RQSim-aligned Development

The overall project uses RQSim both as an evaluation metric and as a future RL reward:

R = α · RQSim − β · IrrelevancePenalty

This model is intended to be used as the base for future PPO training.
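The reward above can be sketched as a simple weighted combination. Note that the card does not specify values for α and β, nor how IrrelevancePenalty is computed; the defaults and function name below are illustrative assumptions only.

```python
def rqsim_reward(rqsim_score: float, irrelevance_penalty: float,
                 alpha: float = 1.0, beta: float = 0.5) -> float:
    """Scalar RL reward: R = alpha * RQSim - beta * IrrelevancePenalty.

    alpha and beta are illustrative defaults; the actual weights are
    not specified in this card and would be tuned during RL training.
    """
    return alpha * rqsim_score - beta * irrelevance_penalty
```

A generated research question scoring high on RQSim but flagged as off-topic would thus receive a reduced reward, discouraging superficially similar but irrelevant questions.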


Training Data & Methodology

The data pipeline is identical to that of ft-Qwen3-8B-5epoch:

  • curated MIS/Finance/Econ academic papers (pre-2020 training set)
  • reference metadata: abstract, publication year, authors, title
  • target literature review
  • target research question
  • supervised fine-tuning for 5 epochs

Future work (planned): RL training with RQSim as reward.


Suggested Use Cases

  • Literature-based academic idea discovery
  • Identifying conceptual gaps across referenced studies
  • Generating structured, context-aware research questions
  • Drafting literature review subsections
  • Supporting MIS / Finance / Economics academic workflows

Example Prompt

You are given metadata for papers in Literature.
Identify thematic gaps by comparing ideas, methods, and years of publication.
Propose a numbered list of research questions addressing these gaps.
Output only research questions.

Literature:
1. [Title] by [Authors] ([Year]) - Abstract: [...]
2. ...
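The prompt template above can be assembled programmatically from reference metadata. This helper is a minimal sketch; the field names (`title`, `authors`, `year`, `abstract`) are assumptions matching the metadata listed under Training Data, not a schema defined by this model.

```python
def build_prompt(papers: list[dict]) -> str:
    """Assemble the literature-gap prompt from a list of paper metadata dicts.

    Each dict is assumed to carry 'title', 'authors', 'year', and 'abstract'
    keys; adapt the keys to your own metadata schema.
    """
    header = (
        "You are given metadata for papers in Literature.\n"
        "Identify thematic gaps by comparing ideas, methods, and years of publication.\n"
        "Propose a numbered list of research questions addressing these gaps.\n"
        "Output only research questions.\n\n"
        "Literature:\n"
    )
    entries = [
        f"{i}. {p['title']} by {p['authors']} ({p['year']}) - Abstract: {p['abstract']}"
        for i, p in enumerate(papers, start=1)
    ]
    return header + "\n".join(entries)
```

The resulting string can then be passed to the model as a standard chat or completion prompt.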

Limitations

  • Reinforcement learning stage is not included yet.
  • Model is domain-specialized; performance in biomedical/engineering areas not guaranteed.
  • Requires 4-bit quantization for optimal VRAM usage on consumer GPUs.
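A 4-bit load on consumer hardware can be sketched with `BitsAndBytesConfig` from the `transformers` library. This is illustrative only: the quantization settings (NF4, bfloat16 compute) are common choices, not ones prescribed by this card, and the repo id placeholder must be replaced with the model's actual Hub path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit (NF4) quantization settings; tune for your hardware.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the BF16 checkpoint dtype
)

model_id = "ft-DeepSeek-R1-0528-Qwen3-8B-5epoch"  # replace with the full Hub repo id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```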
