Small Language Models for Kazakh: models, tokenizers, and datasets for Kazakh language modeling.
Saken Tukenov (stukenov)
Soz: Kazakh Language Models from Scratch
Building foundational language models for Kazakh — models, tokenizers, and training corpora.
- stukenov/sozkz-corpus-balanced-kk-gpt2-v1
  Viewer • Updated • 480k • 19
- stukenov/sozkz-corpus-tokenized-kk-llama50m-v1
  Viewer • Updated • 5.9M • 22
- stukenov/sozkz-core-llama-30m-kk-base-v1
  Text Generation • 33.5M • Updated • 3
- stukenov/sozkz-core-llama-50m-kk-base-v1
  Text Generation • 50.3M • Updated • 7
Kazakh SLM
Small Language Models for Kazakh: models, tokenizers, and datasets for Kazakh language modeling.
Kazakh GEC: Grammar Error Correction
Kazakh grammatical error correction — 13 progressive training runs on mT5-small and mT5-base.