AI & ML interests
Alignment, RLHF, LLM
SiliangZ/zephyr-7b-dpo-full • 7B
SiliangZ/mistral-irl-iter2-iterative-dpo • Text Generation • 7B
SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e7 • Text Classification • 7B
SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e6 • Text Classification • 7B
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e7 • Text Classification • 7B
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e6 • Text Classification • 7B
SiliangZ/RM_mistral_irl2_initilized_from_sft_lr_5e7_idpo • 7B
SiliangZ/RM_mistral_irl2_initilized_from_irl1_rm_lr_5e7_idpo • 7B
SiliangZ/RM_mistral_7b_sft_beta_ultrachat_200k_mistral_sft_temp07_lr_5e7 • 7B
SiliangZ/mistral-7b-sft-beta-rm-mistral-sft-temp07-lr-5e7-iter1 • Text Generation • 7B
SiliangZ/RM_iter1_temp07_and_temp1_ACC_707 • 7B
SiliangZ/IRL_RM_Iter1_temp07_temp1_zephyr_init • 7B
SiliangZ/IRL_RM_Iter1_temp07_temp1 • 7B
SiliangZ/IRL_Iter0_RM_ultrachat_200k_vs_sft_with_spin_iter0_checkpoint_232 • Text Generation • 7B
SiliangZ/zephyr-7b-sft-full
SiliangZ/IRL_Iter0_Policy_Epoch5_RM_Data_SPIN_Iter0 • Text Generation • 7B
SiliangZ/mistral_7b_reward_spin_iter0_data • 7B
SiliangZ/mistral_7b_reward_ultrafeedback_last_checkpoint • 7B
SiliangZ/mistral_7b_reward_preference700k • 7B