wayyresearch/aetheris
Wayy Research Co.
Tags: Text Generation, PyTorch, 65 languages, mamba, ssm, state-space-model, mixture-of-experts, multilingual, distillation, knowledge-distillation, aya, hybrid-architecture, wayy-research
arXiv: 2312.00752
License: apache-2.0
main / aetheris (8.88 GB)
1 contributor, History: 57 commits
Latest commit: rcgalbo, "Update model: SFT step 3000, loss=3.8530" (8f48dbe, verified), 44 minutes ago
File | Scan status | Size | Last commit | Updated
aetheris | - | - | Sync latest aetheris source code | about 22 hours ago
tokenizer | - | - | Add Aya tokenizer files (avoid gated repo dependency) | about 22 hours ago
.gitattributes | Safe | 1.58 kB | Add Aya tokenizer files (avoid gated repo dependency) | about 22 hours ago
README.md | - | 9.19 kB | Update model card with full architecture and training details | 1 day ago
config.yaml | Safe | 316 Bytes | Full vocab config for SFT model | 1 day ago
pytorch_model.pt | pickle | 2.89 GB (xet) | Update model: SFT step 3000, loss=3.8530 | 44 minutes ago
    Detected Pickle imports (3): "torch._utils._rebuild_tensor_v2", "torch.FloatStorage", "collections.OrderedDict"
stage1_checkpoint.pt | Suspicious, pickle | 1.64 GB (xet) | Stage 1 checkpoint: [Step 50/20000] loss=7.7500 | 5 days ago
    Detected Pickle imports (4): "torch.BFloat16Storage", "collections.OrderedDict", "torch._utils._rebuild_tensor_v2", "torch.FloatStorage"
stage1_metadata.json | Safe | 414 Bytes | Stage 1 checkpoint: [Step 50/20000] loss=7.7500 | 5 days ago
stage2_best.pt | Safe, pickle | 1.44 GB (xet) | Upload final Stage 2 best checkpoint (loss=2.7305, 20K steps) | 4 days ago
    Detected Pickle imports (3): "collections.OrderedDict", "torch.BFloat16Storage", "torch._utils._rebuild_tensor_v2"
stage2_checkpoint.pt | Suspicious, pickle | 1.44 GB (xet) | Stage 2 checkpoint: [Step 18500/20000] loss=3.1250 | 5 days ago
    Detected Pickle imports (3): "collections.OrderedDict", "torch._utils._rebuild_tensor_v2", "torch.BFloat16Storage"
stage2_final.pt | Safe, pickle | 1.44 GB (xet) | Upload Stage 2 final checkpoint (step 20000) | 4 days ago
    Detected Pickle imports (3): "collections.OrderedDict", "torch._utils._rebuild_tensor_v2", "torch.BFloat16Storage"
stage2_metadata.json | Safe | 263 Bytes | Update Stage 2 metadata: COMPLETE, best loss=2.7305 | 4 days ago
student_config.yaml | Safe | 668 Bytes | Stage 1 initial: step 1000, loss=0.29, cka=0.60 | 5 days ago
training_config.yaml | Safe | 2.74 kB | Stage 1 initial: step 1000, loss=0.29, cka=0.60 | 5 days ago
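
The "Detected Pickle imports" entries in the listing above name the globals each `.pt` file would import when unpickled. As a rough illustration of how such a list can be produced without executing the pickle, the sketch below walks the opcode stream with the standard-library `pickletools` module. It is not the scanner the Hub uses: `list_pickle_imports` and `ALLOWED_IMPORTS` are hypothetical names introduced here, the `STACK_GLOBAL` handling is a simple heuristic, and the function expects raw pickle bytes (a checkpoint stored as a zip archive would first need its embedded pickle member extracted).

```python
import collections
import pickle
import pickletools

# Hypothetical allowlist for illustration, copied from the imports
# reported for the checkpoints in the listing above.
ALLOWED_IMPORTS = {
    "collections.OrderedDict",
    "torch._utils._rebuild_tensor_v2",
    "torch.FloatStorage",
    "torch.BFloat16Storage",
}


def list_pickle_imports(data: bytes) -> set:
    """Return the 'module.name' globals referenced by a pickle stream.

    The stream is only parsed opcode by opcode; nothing is unpickled.
    """
    imports = set()
    recent_strings = []  # string arguments seen so far, for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols: the argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            imports.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two strings pushed just
            # before this opcode. Heuristic: take the last two strings seen
            # (this misses strings fetched back from the memo).
            if len(recent_strings) >= 2:
                imports.add(f"{recent_strings[-2]}.{recent_strings[-1]}")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return imports


if __name__ == "__main__":
    # Small stand-in pickle; a real checkpoint would be read from disk.
    demo = pickle.dumps(collections.OrderedDict(step=3000), protocol=4)
    found = list_pickle_imports(demo)
    print("imports:", sorted(found))
    print("not in allowlist:", sorted(found - ALLOWED_IMPORTS))
```

Anything outside the allowlist deserves a closer look before loading. When loading such checkpoints with PyTorch, passing `weights_only=True` to `torch.load` restricts unpickling to tensor-related types and is the more conservative choice for files from an untrusted source.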