CriteriaPO/qwen2.5-3b-dpo-ablation-finegrained-all-vanilla-400k • Updated
CriteriaPO/qwen2.5-3b-dpo-ablation-finegrained-500k-vanilla-100k • Text Generation • 3B • Updated • 9
CriteriaPO/qwen2.5-3b-dpo-ablation-finegrained-200k • Text Generation • 3B • Updated • 18
CriteriaPO/qwen2.5-3b-dpo-vanilla-in-finegrained • Text Generation • 3B • Updated • 13
CriteriaPO/qwen2.5-3b-orpo-coarse-2e
CriteriaPO/qwen2.5-3b-orpo-mini-2e • Text Generation • 3B • Updated • 2
CriteriaPO/qwen2.5-3b-dpo-coarse-40-vanilla • Text Generation • 3B • Updated • 1
CriteriaPO/qwen2.5-3b-dpo-finegrained-5-vanilla • Text Generation • 3B • Updated • 5
CriteriaPO/qwen2.5-3b-dpo-finegrained-40-vanilla • Text Generation • 3B • Updated • 4
CriteriaPO/qwen2.5-3b-dpo-finegrained-20-vanilla • Text Generation • 3B • Updated • 3
CriteriaPO/qwen2.5-3b-dpo-finegrained-10-vanilla • Text Generation • 3B • Updated • 4
CriteriaPO/qwen2.5-3b-dpo-coarse-20-vanilla • Text Generation • 3B • Updated • 1
CriteriaPO/qwen2.5-3b-dpo-coarse-10-vanilla • Text Generation • 3B • Updated • 1
CriteriaPO/qwen2.5-3b-dpo-coarse-5-vanilla • Text Generation • 3B • Updated • 1
CriteriaPO/qwen2.5-3b-dpo-mini-40-vanilla • Text Generation • 3B • Updated • 1
CriteriaPO/qwen2.5-3b-dpo-mini-20-vanilla • Text Generation • 3B • Updated • 1
CriteriaPO/qwen2.5-3b-dpo-mini-5-vanilla • Text Generation • 3B • Updated • 5
CriteriaPO/qwen2.5-3b-dpo-mini-10-vanilla • Text Generation • 3B • Updated • 1
CriteriaPO/qwen2.5-3b-dpo-finegrained • Text Generation • 3B • Updated • 219
CriteriaPO/qwen2.5-3b-dpo-coarse • Text Generation • 3B • Updated • 189
CriteriaPO/qwen2.5-3b-dpo-mini • Text Generation • 3B • Updated • 178
CriteriaPO/qwen2.5-3b-dpo-vanilla • Text Generation • 3B • Updated • 309
CriteriaPO/llama3.2-3b-dpo-coarse • Text Generation • 3B • Updated • 154
CriteriaPO/llama3.2-3b-dpo-finegrained • Text Generation • 3B • Updated • 148
CriteriaPO/llama3.2-3b-dpo-vanilla • Text Generation • 3B • Updated • 338
CriteriaPO/llama3.2-3b-dpo-mini • Text Generation • 3B • Updated • 149
CriteriaPO/qwen2.5-3b-sft-10 • Text Generation • 3B • Updated • 969
CriteriaPO/llama3.2-3b-sft-10 • Text Generation • 3B • Updated • 450