ft-DeepSeek-R1-0528-Qwen3-8B-5epoch

Reinforced Qwen-8B for Academic Reasoning, Literature Review, and Research Question Generation

Overview

ft-DeepSeek-R1-0528-Qwen3-8B-5epoch is a hybrid model combining:

  • the Qwen3-8B architecture
  • DeepSeek-R1-style reasoning behavior
  • the same MIS/Finance/Econ academic-domain fine-tuning pipeline used in ft-Qwen3-8B-5epoch

This model emphasizes step-wise reasoning quality, gap detection, and research question discovery. It builds on the supervised fine-tuning stage and is designed to work with future reinforcement-learning (RL) refinements using the RQSim reward.


Improvements Over ft-Qwen3-8B-5epoch

  • Stronger reasoning coherence
  • Better multi-step academic argumentation
  • More structured gap identification
  • More consistent ranking/prioritization of potential research questions
  • Cleaner and more interpretable thought process (chain-of-thought-friendly)

Specialization

1. Literature Review Understanding

The model learns from expert-written literature reviews in social science journals, enabling it to mimic academic rhetorical structure:

  • theoretical background
  • problem framing
  • empirical gaps
  • future directions

2. Research Question Generation

The model is trained to generate RQs that mirror actual research questions in target papers by learning relationships between referenced papers and the target paper's RQ.

3. RQSim-aligned Development

The overall project uses RQSim both as an evaluation metric and as a future RL reward:

R = α · RQSim − β · IrrelevancePenalty

This model is intended to be used as the base for future PPO training.
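The reward above can be sketched as a simple weighted combination. Note that the card does not specify values for α and β, nor how IrrelevancePenalty is computed; the defaults and function name below are illustrative assumptions only.

```python
def rqsim_reward(rqsim_score: float, irrelevance_penalty: float,
                 alpha: float = 1.0, beta: float = 0.5) -> float:
    """Scalar RL reward: R = alpha * RQSim - beta * IrrelevancePenalty.

    alpha and beta are illustrative defaults; the actual weights are
    not specified in this card and would be tuned during RL training.
    """
    return alpha * rqsim_score - beta * irrelevance_penalty
```

A generated research question scoring high on RQSim but flagged as off-topic would thus receive a reduced reward, discouraging superficially similar but irrelevant questions.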


Training Data & Methodology

The data pipeline is identical to that of ft-Qwen3-8B-5epoch:

  • curated MIS/Finance/Econ academic papers (pre-2020 training set)
  • reference metadata: abstract, publication year, authors, title
  • target literature review
  • target research question
  • supervised fine-tuning for 5 epochs

Future work (planned): RL training with RQSim as reward.


Suggested Use Cases

  • Literature-based academic idea discovery
  • Identifying conceptual gaps across referenced studies
  • Generating structured, context-aware research questions
  • Drafting literature review subsections
  • Supporting MIS / Finance / Economics academic workflows

Example Prompt

You are given metadata for papers in Literature.
Identify thematic gaps by comparing ideas, methods, and years of publication.
Propose a numbered list of research questions addressing these gaps.
Output only research questions.

Literature:
1. [Title] by [Authors] ([Year]) - Abstract: [...]
2. ...
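The prompt template above can be assembled programmatically from reference metadata. This helper is a minimal sketch; the field names (`title`, `authors`, `year`, `abstract`) are assumptions matching the metadata listed under Training Data, not a schema defined by this model.

```python
def build_prompt(papers: list[dict]) -> str:
    """Assemble the literature-gap prompt from a list of paper metadata dicts.

    Each dict is assumed to carry 'title', 'authors', 'year', and 'abstract'
    keys; adapt the keys to your own metadata schema.
    """
    header = (
        "You are given metadata for papers in Literature.\n"
        "Identify thematic gaps by comparing ideas, methods, and years of publication.\n"
        "Propose a numbered list of research questions addressing these gaps.\n"
        "Output only research questions.\n\n"
        "Literature:\n"
    )
    entries = [
        f"{i}. {p['title']} by {p['authors']} ({p['year']}) - Abstract: {p['abstract']}"
        for i, p in enumerate(papers, start=1)
    ]
    return header + "\n".join(entries)
```

The resulting string can then be passed to the model as a standard chat or completion prompt.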

Limitations

  • Reinforcement learning stage is not included yet.
  • Model is domain-specialized; performance in biomedical/engineering areas not guaranteed.
  • Requires 4-bit quantization for optimal VRAM usage on consumer GPUs.
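A 4-bit load on consumer hardware can be sketched with `BitsAndBytesConfig` from the `transformers` library. This is illustrative only: the quantization settings (NF4, bfloat16 compute) are common choices, not ones prescribed by this card, and the repo id placeholder must be replaced with the model's actual Hub path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit (NF4) quantization settings; tune for your hardware.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the BF16 checkpoint dtype
)

model_id = "ft-DeepSeek-R1-0528-Qwen3-8B-5epoch"  # replace with the full Hub repo id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```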
