Building on HF

Naman Vats PRO

namanvats

AI & ML interests

Make Open Source AI win

Recent Activity

reacted to anakin87's post with ❤️ about 2 hours ago
How does LLM training with RL environments work?

It all starts with Reinforcement Learning with Verifiable Rewards:
- a question is asked
- the model generates reasoning + an answer
- the answer is checked against the ground truth
- the reward drives RL training

In this setup, the environment is simple: fixed questions and answers, rollout logic, and reward(s).

Consider a more complex tic-tac-toe env ❌⭕. It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions (envs can also include tools)

What happens at training time? We use Group Relative Policy Optimization with a tic-tac-toe env. No critic model is needed; the group is the baseline, which makes it simpler than PPO.

1️⃣ Rollout generation: from the same board, the model plays N games via sampling
2️⃣ Each game is scored with deterministic rewards (win, format, ...)
3️⃣ The mean score is computed across the group
4️⃣ Each rollout's advantage = its score minus the group mean
5️⃣ The model is updated to favor trajectories above the baseline
🔁 Repeat

For a deep dive, check out 🌱 https://github.com/anakin87/llm-rl-environments-lil-course, a free hands-on course on RL environments for LLMs.
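The advantage step (2️⃣–4️⃣) can be sketched in a few lines. This is a minimal illustration, not code from the linked course: the reward values and win/draw/loss scheme below are assumed for the example, and a real setup would generate the rewards from actual env rollouts.

```python
def group_relative_advantages(rewards):
    """GRPO-style advantage: each rollout's score minus the group mean.

    The group mean acts as the baseline, so no critic model is needed.
    """
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


# Example: N = 4 games played from the same board, scored deterministically.
# Assumed scheme: win = 1.0, draw = 0.5, loss = 0.0.
rewards = [1.0, 0.0, 1.0, 0.5]
advantages = group_relative_advantages(rewards)
# Rollouts above the group mean get a positive advantage and are reinforced;
# those below it get a negative advantage and are discouraged.
```

By construction the advantages sum to zero across the group, so the update only shifts probability mass toward the better-than-average trajectories.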

Organizations

Blog-explorers, Sentient Foundation