AI & ML interests
Alignment, RLHF, LLM
Models 19
SiliangZ/zephyr-7b-dpo-full • 7B • Updated • 1
SiliangZ/mistral-irl-iter2-iterative-dpo • Text Generation • 7B • Updated • 1
SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e7 • Text Classification • 7B • Updated
SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e6 • Text Classification • 7B • Updated
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e7 • Text Classification • 7B • Updated
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e6 • Text Classification • 7B • Updated • 4
SiliangZ/RM_mistral_irl2_initilized_from_sft_lr_5e7_idpo • 7B • Updated • 1
SiliangZ/RM_mistral_irl2_initilized_from_irl1_rm_lr_5e7_idpo • 7B • Updated
SiliangZ/RM_mistral_7b_sft_beta_ultrachat_200k_mistral_sft_temp07_lr_5e7 • 7B • Updated
SiliangZ/mistral-7b-sft-beta-rm-mistral-sft-temp07-lr-5e7-iter1 • Text Generation • 7B • Updated • 4
Datasets 26
SiliangZ/mistral_irl3_rm_data_idpo • Updated • 208k • 3
SiliangZ/mistral_irl3_rm_data_combined_idpo • Updated • 624k • 4
SiliangZ/mistral_irl2_rm_data_combined_idpo • Updated • 416k • 4
SiliangZ/AIHF_Online_RLHF_Iter2 • Updated • 20k • 9
SiliangZ/AIHF_Online_RLHF_Iter1 • Updated • 20k • 9
SiliangZ/ultrafeedback_with_demo_sft_pairs_temp07 • Updated • 269k • 6
SiliangZ/ultrachat_200k_mistral_sft_iter1_iter2_temp1_generations • Updated • 624k • 9
SiliangZ/ultrachat_200k_mistral_sft_temp1_iter1 • Updated • 416k • 9
SiliangZ/ultrachat_200k_mistral_sft_and_mistral_irl1_round1_temp07 • Updated • 416k • 9 • 1
SiliangZ/ultrachat_200k_mistral_sft_temp1 • Updated • 231k • 7