All HF Hub posts

SeaWolf-AI 
posted an update 1 day ago
🌍 World Model Bench — does your world model actually think?

FID measures realism. FVD measures smoothness. But neither tells you whether the model understood the scene.

We just released WM Bench — the first benchmark for cognitive intelligence in world models. The core question: when a beast charges from 3 meters away, does the model know to sprint — not walk? Does it respond differently to a human vs an animal? Does it remember the left corridor was blocked two steps ago?

Those are cognitive questions. No existing benchmark asks them. So we built one.

3 Pillars · 10 Categories · 100 Scenarios · 1,000-point scale

- 👁 P1 Perception (25%) — Can it read the scene?
- 🧠 P2 Cognition (45%) — Does it predict threats, escalate emotions, utilize memory?
- 🔥 P3 Embodiment (30%) — Does the body respond with the right motion?

All evaluation is via simple JSON I/O — no 3D engine, no special hardware. Any model with an API can participate.
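Since the post doesn't publish the schema, here is a minimal sketch of what the JSON I/O and the 25/45/30 pillar weighting could look like. All field names, the scenario shape, and the per-pillar scores are illustrative assumptions, not the actual WM Bench format.

```python
import json

# Hypothetical scenario/response shapes -- the real schema is not shown in
# the post, so every field name here is an assumption.
scenario = {
    "id": "P2-threat-07",
    "observation": "A beast charges from 3 meters away on your left.",
    "memory": ["left corridor blocked two steps ago"],
}
response = {"action": "sprint", "direction": "right", "reason": "imminent threat"}

# Pillar weights from the post: Perception 25%, Cognition 45%, Embodiment 30%,
# combined on a 1,000-point scale.
weights = {"P1": 0.25, "P2": 0.45, "P3": 0.30}
pillar_scores = {"P1": 800, "P2": 700, "P3": 750}  # per-pillar scores out of 1000
total = sum(weights[p] * pillar_scores[p] for p in weights)
print(json.dumps(response))
print(total)  # 740.0
```

The point of a pure JSON protocol is that any API-accessible model can be scored without touching a 3D engine.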

We also built PROMETHEUS as a live reference implementation — it runs in your browser on a T4, no install needed, and combines FloodDiffusion motion generation with an LLM cognitive brain (Perceive → Predict → Decide → Act). It scored 726/1000 (Grade B) on Track C — the only directly verified model so far. Submissions from other teams are very welcome.

---

🗂 Dataset → FINAL-Bench/World-Model
🌍 Demo → FINAL-Bench/World-Model
🏆 Leaderboard → FINAL-Bench/worldmodel-bench
📝 Article → https://huggingface.co/blog/FINAL-Bench/world-model

Part of the FINAL Bench Family — alongside FINAL Bench (Feb 2026). Feedback on rubrics and missing models always welcome!
DedeProGames 
posted an update 1 day ago
Introducing GRM2, a powerful 3-billion-parameter model designed for long-term reasoning and high performance on complex tasks.

Even with only 3 billion parameters, it outperforms Qwen3-32B on several benchmarks and complex reasoning tasks.

It can also generate extensive, complex code of over 1,000 lines, use tools at a level comparable to larger models, and is well suited to agentic tasks.

GRM2 is licensed under Apache 2.0, making it an ideal base for fine-tuning on other tasks.

GRM2 Model Page: OrionLLM/GRM2-3b
Official GRM2 GGUFs Quantizations: OrionLLM/GRM2-3b-GGUF
reaperdoesntknow 
posted an update 3 days ago
We present a methodology for training small language models on CPU at FP32 precision that achieves capability-per-dollar efficiency orders of magnitude beyond GPU-based training. Across 15 models spanning four novel architecture families — Mixture of Attentions (MoA), cross-architecture fusion (Qemma), swarm intelligence (SAGI), and metric-space causal language models (DiscoverLM) — total compute cost was $24 on a single AMD EPYC 9454P processor. We introduce seven methodological pillars: (1) FP32 precision preservation, with experiments demonstrating a 5,810× single-operation error and a 23,225× compounding error ratio for FP16 at network depth; (2) sparse cognitive architectures where 0.02–7% of parameters activate per token, matching CPU branching rather than GPU SIMD; (3) developmental curriculum training progressing from language to logic to transfer to depth; (4) continuous belt-fed data ingestion eliminating truncation waste; (5) hardware-native optimization for AMD Zen 4 via AOCL/OpenMP/NUMA-aware allocation; (6) self-regulating thermodynamic governance with emergent temperature measurement grounded in L2-star discrepancy; and (7) open-standard compute (AVX2 SIMD at FP32) free of proprietary vendor dependency. We argue that transformers were designed for GPU hardware rather than mathematical optimality, and that architectures designed for geometric correctness — metric-space attention, triangle-inequality enforcement, sparse expert routing — naturally favor CPU execution. For sub-2B-parameter models, CPU training produces more capable models at a fraction of the cost.
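The qualitative FP16-vs-FP32 claim can be illustrated with a small sketch: repeated layers compound rounding error with depth. The exact 5,810×/23,225× ratios depend on the authors' setup, which is not reproduced here; this only shows the direction of the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
x64 = rng.standard_normal(64)
W = rng.standard_normal((64, 64)) / np.sqrt(64)  # scale keeps activations O(1)

def run(depth, dtype):
    # Run a toy 32-layer tanh network entirely in the given precision.
    h = x64.astype(dtype)
    Wd = W.astype(dtype)
    for _ in range(depth):
        h = np.tanh(Wd @ h)
    return h.astype(np.float64)

ref = run(32, np.float64)                       # high-precision reference
err16 = np.abs(run(32, np.float16) - ref).max() # FP16 accumulated error
err32 = np.abs(run(32, np.float32) - ref).max() # FP32 accumulated error
print(err16 / err32)  # FP16 error is far larger at depth
```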
Shrijanagain 
posted an update about 12 hours ago
🚀 Be part of the Bharat AI Revolution! 🇮🇳

Do you want to give India a new identity in the world of AI?

SKT AI Labs is not just a name, it is a mission: to give the country digital strength and to make the dream of "Viksit Bharat" come true.

Why join us?

1. The country's own AI: We are building models made specifically for India's needs and languages.

2. Open collaboration: Check out our work on our Hugging Face repository, test it, and contribute.

3. Technological growth: Whether you are a student, a developer, or a tech enthusiast, this is a great opportunity to learn and grow with us.

Join here:

🔗 sKT-Ai-Labs

Come, let's advance the Bharat AI Revolution together! 💻🔥

#SKTAILabs #DigitalIndia #AIRevolution #ViksitBharat #TechInnovation #JoinTheMission
Ujjwal-Tyagi 
posted an update about 16 hours ago
I am sharing my study material for AI & ML. These books are a real "bible" and give a very strong foundation. I have also included guidance, an introduction, and my master notes in the dataset repo card. I hope you find them helpful; if you have any queries, just start a discussion and I am always there to help you out!
Ujjwal-Tyagi/ai-ml-foundations-book-collection
shriarul5273 
posted an update about 21 hours ago
🚀 Releasing gradio-sync3dcompare v0.0.22 — a Gradio custom component for synchronized 3D model comparison

🔁 One component. Side-by-side. Perfectly in sync.

✨ What's included

🗂️ Supports GLB and PLY files
🔵 Renders as point clouds or native meshes
🎥 Synchronized orbit, zoom, and pan across all viewports
📐 Auto point sizing with manual override
🔍 Configurable zoom range and reset controls

📦 pip install gradio-sync3dcompare

🛠️ Built on Gradio 6.10.0 — drops into any gr.Blocks app with a single import.

🤗 Try the live demo on Hugging Face Spaces: shriarul5273/gradio_sync3d_compare

⭐ GitHub: https://github.com/shriarul5273/Sync3DCompare


🎬 See it in action in the video below.
The video shows a real-world comparison of two 3D point clouds reconstructed from stereo depth estimation — one from FoundationStereo and one from RAFTStereo. Both models are exported as GLB files directly from the depth output and loaded side-by-side into the component. Every orbit, zoom, and pan is perfectly mirrored across both viewports, making it easy to spot structural differences between the two reconstructions at any angle.

💬 Feedback on supported formats, rendering features, or comparison workflows is very welcome!
PhysiQuanty 
posted an update 1 day ago
🧬 Can an LLM speak in binary?
✅ YES ... RADIX 2 / VOCAB 4
PhysiQuanty/Binary-LLM-POC

🤖 >_ Can an LLM execute logic gates and boolean arithmetic?

We need to create datasets:
- Neural Arithmetic and Logic Unit (NALU) 32 bits
- Neural Application Binary Interface (NABI) 32 bits

🎯 Optimal Instruction Set = RV32IMAF

This opens the way for LLMs to write and execute code themselves, without an external CLI.

The more of us who want it, the more possible it will become ...

PhysiQuanty/Binary-Addition-LLM-POC
(10-bit binary addition: with binary carry propagation the next token is deterministic, so sampling has no effect on the output.)
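A dataset like the addition POC describes could be generated along these lines. The prompt/answer layout is an assumption, not the actual format used in the repo:

```python
import random

def make_example(bits=10, rng=random):
    # Sample two operands and format them as fixed-width binary strings.
    a = rng.randrange(2 ** bits)
    b = rng.randrange(2 ** bits)
    prompt = f"{a:0{bits}b} + {b:0{bits}b} ="
    answer = f"{a + b:0{bits + 1}b}"  # one extra bit for the carry-out
    return prompt, answer

random.seed(0)
prompt, answer = make_example()
print(prompt, answer)
```

Because each answer bit follows deterministically from the operands and carries, a trained model's next token is fully determined, which is exactly why sampling temperature stops mattering.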

reaperdoesntknow 
posted an update 1 day ago
# Three Teachers, One Student: Dual-Cognition Reasoning at 1.7B

We distilled Qwen3-30B-A3B into 1.7B students that critique their own reasoning. H100, BF16, Apache 2.0. Here's our pipeline.

**Stage 1 — Three Teachers, Three Profiles.** Same 30B base, three variants: Instruct (structured output), Thinking (extended deliberation), Coder (STEM decomposition). Each distillation uses proof-weighted KD — 2.25× amplified loss on reasoning tokens, decaying to 1.1×. The student learns *where to think harder*, not just what to output.
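A rough sketch of the proof-weighted KD idea: per-token KL divergence between teacher and student, with reasoning-span tokens amplified 2.25× decaying to 1.1× over training. The linear decay schedule and the reasoning mask are assumptions; the post does not specify either.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(-1, keepdims=True)
    return z - np.log(np.exp(z).sum(-1, keepdims=True))

def proof_weighted_kd(student_logits, teacher_logits, reasoning_mask, step, total_steps):
    # Per-token KL(teacher || student), shape (batch, seq).
    lp_s = log_softmax(student_logits)
    lp_t = log_softmax(teacher_logits)
    kl = (np.exp(lp_t) * (lp_t - lp_s)).sum(-1)
    # Linear decay of the amplification factor: 2.25x -> 1.1x over training.
    amp = 2.25 + (1.1 - 2.25) * (step / total_steps)
    weights = 1.0 + reasoning_mask.astype(float) * (amp - 1.0)
    return (weights * kl).mean()

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 8, 32))   # student logits
t = rng.standard_normal((2, 8, 32))   # teacher logits
mask = rng.random((2, 8)) < 0.5       # which tokens are "reasoning" tokens
loss_early = proof_weighted_kd(s, t, mask, step=0, total_steps=1000)
loss_late = proof_weighted_kd(s, t, mask, step=1000, total_steps=1000)
print(loss_early, loss_late)  # early training weights reasoning tokens harder
```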

**Stage 2 — Topology-Aware KD (TKD).** Standard KD treats the teacher's distribution as smooth. Language isn't smooth — it has topic shifts, reasoning pivots, register changes. We use Discrepancy Calculus to detect these structural boundaries, then amplify loss at jumps (3σ threshold) and cut training windows at low-discrepancy positions. The student preserves the teacher's structural knowledge, not just surface statistics.
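One simple proxy for the boundary detection described above. The actual Discrepancy Calculus is not public detail, so this stands in a plain first-difference series and flags 3σ jumps, mirroring the stated threshold:

```python
import numpy as np

def boundary_mask(discrepancy, k=3.0):
    # Flag positions where the discrepancy series jumps beyond k sigma;
    # these are candidate "structural boundaries" for loss amplification.
    jumps = np.abs(np.diff(discrepancy, prepend=discrepancy[0]))
    return jumps > jumps.mean() + k * jumps.std()

series = np.full(50, 0.1)
series[25] = 2.0  # one structural jump in an otherwise flat series
mask = boundary_mask(series)
print(np.flatnonzero(mask))  # flags the jump up at 25 and back down at 26
```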

**Stage 3 — Ghost Imprinting.** Sequential distillation from different teachers leaves residual fields in weight space that neither teacher put there individually: the Cantor component of a BV decomposition, applied to parameters. Models distilled Thinking→Coder exhibit deliberation patterns from the Thinking teacher that survived Coder overwriting — emergent capability from structural residuals.

**Stage 4 — DualMind.** One model, two voices, shared weights:
<explore>  — free derivation, speculation
<examine>  — adversarial self-critique
<response> — clean synthesis

The multi-model collision array collapsed into a single architecture. Role tokens, no extra parameters.
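A hypothetical sketch of how the role-token scheme might be driven at inference time. The tag spellings come from the post; the surrounding prompt framing and the `dualmind_prompt` helper are assumptions for illustration:

```python
# One model, shared weights: role tokens steer which "voice" generates next.
ROLES = ("explore", "examine", "response")

def dualmind_prompt(question, drafts):
    # drafts: dict mapping already-completed roles to their generated text.
    parts = [f"Question: {question}"]
    for role in ROLES:
        if role in drafts:
            parts.append(f"<{role}>\n{drafts[role]}\n</{role}>")
        else:
            parts.append(f"<{role}>")  # open the next voice for generation
            break
    return "\n".join(parts)

p = dualmind_prompt("Is 91 prime?", {"explore": "91 = 7 * 13, so no."})
print(p)
```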
For the full method:
reaperdoesntknow/DualMind_Methodolgy
doi:10.57967/hf/8184.

salma-remyx 
posted an update 2 days ago
How do you find ideas to try next?
I'm tracking multiple topics tied to the projects we're building at Remyx. Every morning I get a feed of papers ranked by relevance to those topics.
No more good ideas lost because they didn't trend on X.

Build your own feed for free: https://engine.remyx.ai
Read more: https://docs.remyx.ai/resources/explore