Hugging Face
Yiming Tang
tangyiming
AI & ML interests
None yet
Recent Activity
New activity 16 days ago in Qwen/Qwen3-Next-80B-A3B-Instruct:
Megatron Swift DPO training on Qwen/Qwen3-Next-80B-A3B-Instruct always returns NaN loss. Why?
Organizations
None yet
tangyiming's activity
New activity in Qwen/Qwen3-Next-80B-A3B-Instruct, 16 days ago
Megatron Swift DPO training on Qwen/Qwen3-Next-80B-A3B-Instruct always returns NaN loss. Why?
#45 opened 16 days ago by tangyiming