Okay, I found it myself. Please check my answer to make sure I understand it correctly.
The optimizer state holds quantities such as the momentum and variance estimates. It stores the information the optimizer needs to update the parameters, and it persists from one step to the next.
There are also some intermediate values created during the computation. Once the update is done, they are freed.
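To make the distinction concrete, here is a minimal sketch in plain Python. It assumes an Adam-style optimizer (the comment only says "momentum, variance", so Adam is my assumption), and the names `adam_step`, `state`, `m`, and `v` are hypothetical, chosen for illustration. It is not PyTorch's actual implementation.

```python
import math

def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update on plain Python lists (illustrative only)."""
    state["step"] += 1
    t = state["step"]
    for i, g in enumerate(grads):
        # Persistent optimizer state: kept in `state` between steps.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # momentum
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # variance
        # Intermediates: local temporaries, dropped once the parameter is updated.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

params = [1.0, -2.0]
# One momentum and one variance entry per parameter.
state = {"step": 0, "m": [0.0] * len(params), "v": [0.0] * len(params)}
adam_step(params, grads=[0.5, -0.5], state=state)
```

After the call, `state` still holds one momentum and one variance value per parameter, while `m_hat` and `v_hat` no longer exist. In this sketch that is why the optimizer state costs roughly two extra copies of the parameters for the whole run, whereas the intermediates only show up briefly during the step.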
Judy Liang
Judy07