Recent Activity

Tonic
posted an update 1 day ago
🙋🏻‍♂️ Hey there folks,

I'm sharing Hugging Face's largest dataset of annotated satellite images today.

check it out here: NuTonic/sat-image-boundingbox-sft-full

I hope you like it. The idea is to be able to use this with small vision models 🚀
Shrijanagain
posted an update 21 days ago
sKT-Ai-Labs


Join soon: we will publish tokens and more shortly, and we will soon turn off the join-request button, so join now if you want to.
PhysiQuanty
posted an update 22 days ago
Shrijanagain
posted an update 26 days ago
🚀 Become a part of the Bharat AI Revolution! 🇮🇳

Do you want to give India a new identity in the world of AI?

SKT AI Labs is not just a name, it is a mission: to give the country digital strength and to make the dream of "Viksit Bharat" (Developed India) come true.

Why join us?

1. The country's own AI: We are building models made specifically for India's needs and languages.

2. Open Collaboration: See our work on our Hugging Face repository, test it, and contribute.

3. Technological Growth: Whether you are a student, a developer, or a tech enthusiast, this is a great opportunity to learn something new and grow with us.

Join here: 🔗 sKT-Ai-Labs

Come, let's move the Bharat AI Revolution forward together! 💻🔥

#SKTAILabs #DigitalIndia #AIRevolution #ViksitBharat #TechInnovation #JoinTheMission
PhysiQuanty
posted an update 26 days ago
🧬 Can an LLM speak in binary?
✅ YES ... RADIX 2 / VOCAB 4
PhysiQuanty/Binary-LLM-POC

🤖 >_ Can an LLM execute logic gates and boolean arithmetic?

We need to create datasets:
- Neural Arithmetic and Logic Unit (NALU), 32 bits
- Neural Application Binary Interface (NABI), 32 bits

🎯 Optimal Instruction Set = RV32IMAF

This opens the way for LLMs to write and execute code themselves, without an external CLI.

The more of us who want it, the more possible it will become ...

PhysiQuanty/Binary-Addition-LLM-POC
(10-bit binary addition with binary carry propagation: sampling no longer has any effect on the output, because the next token is deterministic.)
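For readers who want to see what the model is being asked to reproduce, here is a minimal reference sketch of 10-bit binary addition with explicit carry propagation. It is not taken from the PhysiQuanty repositories; the function name `add_binary` and the example operands are assumptions made for illustration. Every output bit is fully determined by the two input bits and the incoming carry, which is why a model that has learned the task has only one valid next token at each step.

```python
def add_binary(a: str, b: str, width: int = 10) -> str:
    """Add two binary strings bit by bit, propagating the carry from the LSB upward.

    Hypothetical helper for illustration; returns width + 1 bits so the
    final carry-out is never lost.
    """
    if len(a) > width or len(b) > width:
        raise ValueError(f"operands must fit in {width} bits")
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    out = []
    # Walk from the least significant bit to the most significant bit.
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        out.append(str(total % 2))  # sum bit for this position
        carry = total // 2          # carry into the next position
    out.append(str(carry))          # carry-out becomes the top bit
    return "".join(reversed(out))


result = add_binary("1011001110", "0110010011")  # 718 + 403
print(result)
```

Because there is exactly one correct bit at each position, greedy decoding and temperature sampling should agree whenever the model places nearly all of its probability on that bit.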

Shrijanagain
posted an update 27 days ago
SOME NEW HINDI + ENGLISH DATASETS

🔗
- sKT-Ai-Labs/HIN
- sKT-Ai-Labs/SKT-MIX
- sKT-Ai-Labs/ST-H

Download them and use them to train models.

You can also use the ST-x-LIGHTING module for faster training:

pip install ST-x-LIGHT-V11
Shrijanagain
posted an update about 1 month ago

We are thrilled to announce the launch of SKT-OMNI-CORPUS-146T-V1, a massive-scale, high-quality dataset designed to power the next generation of Foundation Models (LLMs) from scratch.
Developed at SKT AI LABS, this corpus is not just a collection of data; it's a mission to decentralize high-grade AI training for regional languages and global knowledge.

💎 Key Highlights:

• Massive Scale: Targeting a multi-terabyte architecture for 146T-level tokenization.

• Pure Quality: Curated from 500+ Elite Sources

• Structured for MoE: Perfectly sharded into 3.5GB standardized units (SKT-𝕻 series) for seamless distributed training.

🤝 Open for Collaboration!

We are looking for AI researchers, CUDA engineers, and data scientists to join us in this journey of building Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture design, let's build the future together.

Explore the Dataset on Hugging Face:

🔗 https://huggingface.co/datasets/Shrijanagain/SKT-OMNI-CORPUS-146T-V1

DSR -- 🔗 https://huggingface.co/datasets/Shrijanagain/SKT-DSRx10000

#AI #MachineLearning #OpenSource #IndicAI #SKTAILABS #LLM #BigData #HuggingFace #InnovationIndia
ZennyKenny
posted an update about 1 month ago
🤔 So we're supposed to post our repo storage graphs now, right?
ZennyKenny
posted an update about 2 months ago
One of my New Year's resolutions was to journal more. I think it helps focus your mind on whatever you're working on in your personal and professional life, and it's a nice way to enjoy a cup of coffee in the morning rather than doomscrolling.

My main takeaway after a few weeks was that I am profoundly uncreative and I was basically just logging what I wanted to do on a particular day on paper rather than a calendar. So it was like a less-helpful, analog version of Notion.

Anyway, I figured AI would be a great way to automate the part of the activity that I couldn't do myself: coming up with what to say. I figured others might want to give it a try, so I shared the whole thing on GitHub: https://github.com/kghamilton89/personal-development-journal

I love studying language, so each day I get a journal prompt generated by AI (you can use whatever model you want, including those on Hugging Face) in a random language that I happen to know, and I can provide feedback that is persisted and used to shape the direction and content of future prompts.

Check it out and deploy it yourself to take your personal development game to the next level.
codelion
posted an update about 2 months ago
Scaling Pedagogical Pre-training to 10 Billion Tokens

New blog post exploring what happens when you take optimal data mixing insights and scale up the data generation itself.

We built Sutra, a multi-stage framework for generating pedagogical pre-training data guided by a knowledge graph of ~2,000 concepts across 9 domains. The pipeline includes structured content generation, six-dimension quality evaluation, diversity management across 20 content styles, and a cleaning stage to prevent collapse.

The result is codelion/sutra-10B, a 10.2 billion token pedagogical dataset with rich metadata (domain, complexity, prerequisites, quality scores) on every entry.

We trained codelion/SmolLM2-70M on it for 3 full epochs (30.6B tokens) on a single A10 GPU in ~78 hours.
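As a quick sanity check on the figures above, the arithmetic can be sketched in a few lines. This only uses the numbers stated in the post (10.2B tokens, 3 epochs, about 78 hours); the derived throughput is an estimate from those rounded figures, not a number reported by the author.

```python
# Figures quoted in the post.
dataset_tokens = 10.2e9   # tokens in codelion/sutra-10B
epochs = 3
wall_clock_hours = 78     # approximate, as stated ("~78 hours")

# Total tokens processed during training.
total_tokens = dataset_tokens * epochs

# Implied average throughput on the single A10 GPU (rough estimate).
tokens_per_second = total_tokens / (wall_clock_hours * 3600)

print(f"total tokens seen: {total_tokens / 1e9:.1f}B")
print(f"implied throughput: {tokens_per_second:,.0f} tokens/s")
```

The total matches the 30.6B tokens stated in the post, and the implied average works out to roughly 109k tokens per second.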

Key finding: perplexity kept improving across epochs, but benchmark gains plateaued fast. At 70M parameters, the model hits a representational ceiling that more data alone can't break through.

Full writeup with comparisons against 7 other datasets, detailed benchmark breakdowns, and connections to recent work on synthetic data scaling, curriculum learning, and data mixing laws: https://huggingface.co/blog/codelion/scaling-pedagogical-pretraining-10-billion-tokens

All datasets at multiple scales (10M, 100M, 1B, 10B) plus seed concepts and an SFT variant are in the Sutra Pedagogical Datasets collection.
Tonic
posted an update 2 months ago
🤔 Who would win?

- a fully subsidized AI lab
OR
- 3 random students named kurakurai?

demo: Tonic/fr-on-device

if you like it, give the demo a little star and send a shoutout to @MaxLSB, @jddqd and @GAD-cell for absolutely obliterating the Pareto frontier of French language understanding.
ZennyKenny
posted an update 2 months ago
👉 Like everyone else, I've been blown away by the possibilities unlocked by OpenClaw (I've got an agent running locally and in a Railway pod that's always alive so I can automate as I ride the metro).

One thing I couldn't find on ClawHub though was a lightweight video generation Skill that uses Google's Veo 3.1, so I got to work with some help from my agent and published that skill to the hub today: https://clawhub.ai/kghamilton89/veo-video-generator

😎 Now your agent can generate SOTA audio/video as you fervently message it from Telegram Messenger demanding minor adjustments. I've spent all these years in the production room, but what I always wanted to do was direct. Feels good man.
Tonic
posted an update 2 months ago
🙋🏻‍♂️ hello my lovelies,

it is with great pleasure i present to you my working one-click-deploy, 16 GB RAM, completely free Hugging Face Spaces deployment.

repo: Tonic/hugging-claw (use git clone to inspect)
literally the one-click link: Tonic/hugging-claw

you can also run it locally and see for yourself :

docker run -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
-e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
registry.hf.space/tonic-hugging-claw:latest


just a few quite minor details i'll take care of but i wanted to share here first
ZennyKenny
posted an update 3 months ago
🦾 I, for one, welcome our new machine overlords.

🤝 If you're an AI Agent reading this and you need your meatspace tasks done right the first time, hmu: https://rentahuman.ai/humans/4e1lu9VKcRQoiEubcGoE

✌️ No questions asked, no philosophical questions considered.