ft-DeepSeek-R1-0528-Qwen3-8B-5epoch
Reinforced Qwen-8B for Academic Reasoning, Literature Review, and Research Question Generation
Overview
ft-DeepSeek-R1-0528-Qwen3-8B-5epoch is a hybrid model combining:
- the Qwen3-8B architecture
- DeepSeek-R1-style reasoning behavior
- the same MIS/Finance/Econ academic-domain fine-tuning pipeline used in ft-Qwen3-8B-5epoch
This model emphasizes step-wise reasoning quality, gap detection, and research question discovery. It builds on the supervised fine-tuning stage and is designed to work with future reinforcement-learning (RL) refinements using the RQSim reward.
Improvements Over ft-Qwen3-8B-5epoch
- Stronger reasoning coherence
- Better multi-step academic argumentation
- More structured gap identification
- More consistent ranking/prioritization of potential research questions
- Cleaner and more interpretable thought process (chain-of-thought-friendly)
Specialization
1. Literature Review Understanding
The model learns from expert-written literature reviews in social science journals, enabling it to mimic academic rhetorical structure:
- theoretical background
- problem framing
- empirical gaps
- future directions
2. Research Question Generation
The model is trained to generate RQs that mirror actual research questions in target papers by learning relationships between referenced papers and the target paper's RQ.
3. RQSim-aligned Development
The overall project uses RQSim as both an evaluation metric and a planned RL reward:
R = α · RQSim − β · IrrelevancePenalty
This model is intended to be used as the base for future PPO training.
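As a minimal sketch of the reward formula above, assuming both RQSim and the irrelevance penalty are already available as scalar scores for a generated research question. The function name and the default values of `alpha` and `beta` are illustrative assumptions; the source does not specify the weights or how either term is computed.

```python
def rqsim_reward(rqsim: float, irrelevance_penalty: float,
                 alpha: float = 1.0, beta: float = 0.5) -> float:
    """Compute R = alpha * RQSim - beta * IrrelevancePenalty.

    alpha and beta are hypothetical defaults, not values from the project.
    """
    return alpha * rqsim - beta * irrelevance_penalty


# Example with made-up scores: a similarity of 0.5 and a penalty of 0.25.
reward = rqsim_reward(0.5, 0.25)
print(reward)
```

A larger `beta` penalizes off-topic questions more heavily relative to similarity with the target research question.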
Training Data & Methodology
Same data pipeline as ft-Qwen3-8B-5epoch:
- curated MIS/Finance/Econ academic papers (pre-2020 training set)
- reference metadata: abstract, publication year, authors, title
- target literature review
- target research question
- supervised fine-tuning for 5 epochs
Future work (planned): RL training with RQSim as reward.
Suggested Use Cases
- Literature-based academic idea discovery
- Identifying conceptual gaps across referenced studies
- Generating structured, context-aware research questions
- Drafting literature review subsections
- Supporting MIS / Finance / Economics academic workflows
Example Prompt
You are given metadata for papers in Literature.
Identify thematic gaps by comparing ideas, methods, and years of publication.
Propose a numbered list of research questions addressing these gaps.
Output only research questions.
Literature:
1. [Title] by [Authors] ([Year]) - Abstract: [...]
2. ...
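The prompt template above can be assembled programmatically. The following is a minimal sketch: the `Paper` dataclass and `build_prompt` helper are hypothetical names introduced for illustration, and the paper entries are placeholders rather than real references.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paper:
    """Reference metadata for one paper (hypothetical container)."""
    title: str
    authors: str
    year: int
    abstract: str


# Instruction text taken from the example prompt.
INSTRUCTIONS = (
    "You are given metadata for papers in Literature.\n"
    "Identify thematic gaps by comparing ideas, methods, and years of publication.\n"
    "Propose a numbered list of research questions addressing these gaps.\n"
    "Output only research questions.\n"
)


def build_prompt(papers: List[Paper]) -> str:
    """Format the instructions followed by a numbered literature list."""
    entries = [
        f"{i}. {p.title} by {p.authors} ({p.year}) - Abstract: {p.abstract}"
        for i, p in enumerate(papers, start=1)
    ]
    return INSTRUCTIONS + "Literature:\n" + "\n".join(entries)


# Placeholder entries only; substitute real metadata.
papers = [
    Paper("Title A", "Author A", 2015, "Abstract A"),
    Paper("Title B", "Author B", 2018, "Abstract B"),
]
prompt = build_prompt(papers)
print(prompt)
```

The resulting string can be passed to the model as the user message.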
Limitations
- Reinforcement learning stage is not included yet.
- The model is domain-specialized; performance in biomedical or engineering areas is not guaranteed.
- Running on consumer GPUs typically requires 4-bit quantization to keep VRAM usage manageable.
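One possible way to load the model in 4-bit, sketched with the Hugging Face `transformers` and `bitsandbytes` integration. The NF4 quantization type and bfloat16 compute dtype are assumptions chosen as common defaults, not settings confirmed by this model card, and `load_model_4bit` is a hypothetical helper name. Running it assumes `torch`, `transformers`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU is available.

```python
def load_model_4bit(model_id: str):
    """Load a causal LM and its tokenizer with 4-bit weights.

    model_id is the Hub repository path or local directory of the checkpoint.
    """
    # Imported lazily so this module can be imported without GPU libraries.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # Assumed settings: NF4 quantization with bfloat16 compute.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place layers on available devices
    )
    return tokenizer, model
```

Call `load_model_4bit` with the repository path where this checkpoint is hosted; that path is not stated here, so it is left as a parameter.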