A ~1M Parameter Model
with 2K Context
TinyMemoryLM is a Mythos-style recurrent-depth transformer trained on an RTX 5090. It features a Prelude → shared recurrent block → Coda architecture with depth-wise LoRA, SleepGate periodic memory consolidation, and latent thinking layers on the larger tiers. Small model, surprisingly stubborn.
Want to help test new architectures before they ship? AIExpermentLab is our public playground where anyone can contribute — open an issue to suggest a new architecture or open a PR to help build it. All experiments feed into what eventually becomes a released model.
Contribute on GitHub →
In Partnership With
Our Competitors
These are the labs we're watching. The ones pushing the envelope. The ones we hope never exceed our compute budget.
Know another competitor we should be monitoring? Open a community discussion.
Download CompactAI Studio
Run our AI models locally on your machine. Chat with models, browse available models, and download them for offline use.
Built with Electron.
Architecture Features
A fresh take on the transformer architecture.
Mythos Recurrent-Depth Architecture
Prelude layers → a single shared recurrent block (looped N times with depth-wise LoRA) → Coda layers. Glint runs 4 loops; Shard and Prism run 8. LoRA adapters inject per-loop variation into the shared block so each pass sees different weights without adding full layer parameters.
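The loop above can be sketched in a few lines of PyTorch. This is a toy sketch, not the actual TinyMemoryLM code: the shared block is reduced to a single linear layer, and all names, shapes, and init scales are illustrative.

```python
import torch
import torch.nn as nn

class LoopedCore(nn.Module):
    """One shared block looped N times; a per-loop LoRA adapter injects
    variation so each pass sees different effective weights."""

    def __init__(self, dim: int, n_loops: int, rank: int = 4):
        super().__init__()
        self.shared = nn.Linear(dim, dim)  # stand-in for the full transformer block
        # One low-rank (A, B) pair per loop; B is zero-init so training
        # starts from the plain shared block.
        self.lora_A = nn.Parameter(torch.randn(n_loops, dim, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(n_loops, rank, dim))
        self.n_loops = n_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i in range(self.n_loops):
            delta = (x @ self.lora_A[i]) @ self.lora_B[i]  # per-loop variation
            x = torch.relu(self.shared(x) + delta)
        return x

prelude = nn.Linear(64, 64)            # placeholder Prelude
coda = nn.Linear(64, 64)               # placeholder Coda
core = LoopedCore(dim=64, n_loops=4)   # Glint runs 4 loops; Shard/Prism run 8

h = coda(core(prelude(torch.randn(2, 16, 64))))
```

Because each adapter is only `dim × rank` on each side, eight loops of variation cost a fraction of one extra full layer.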
SleepGate (FANT-style Memory)
A fixed-capacity slot memory buffer with a zero-initialized output gate. Every N steps the model consolidates memory: near-duplicate entries are merged (similarity threshold 0.99 for Glint) and stale entries are evicted by age. Glint uses 64 memory slots and 4 heads. TRIM-KV retention eviction is disabled at this scale; age-based circular overwrite is sufficient.
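A minimal sketch of the consolidation step, assuming a (slots × dim) buffer, a per-slot age counter, and cosine as the similarity measure (the metric is our assumption; the merge threshold and age-based eviction follow the description above, everything else is illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def consolidate(slots: torch.Tensor, ages: torch.Tensor,
                merge_thresh: float = 0.99, max_age: int = 1000):
    """SleepGate-style consolidation sketch: merge near-duplicate memory
    slots, then retire stale ones. `slots` is (num_slots, dim); `ages`
    counts steps since each slot was last written."""
    keep = torch.ones(len(slots), dtype=torch.bool)
    sim = F.cosine_similarity(slots.unsqueeze(1), slots.unsqueeze(0), dim=-1)
    for i in range(len(slots)):
        if not keep[i]:
            continue
        for j in range(i + 1, len(slots)):
            if keep[j] and sim[i, j] > merge_thresh:
                slots[i] = (slots[i] + slots[j]) / 2   # merge duplicates
                keep[j] = False                        # free slot j
    keep &= ages < max_age                             # age-based eviction
    return slots[keep], ages[keep]

slots = torch.randn(8, 32)
slots[1] = slots[0]                 # plant a duplicate to be merged
ages = torch.arange(8) * 200        # slots 5..7 are stale
slots, ages = consolidate(slots, ages)
```

The freed slots are then available for circular overwrite until the next consolidation tick.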
Latent Thinking Layers
COCONUT-style continuous chain-of-thought: dedicated global-attention transformer blocks that run after the main stack. Shard gets 6 think layers; Prism gets 7. A depth incentive penalizes high cosine similarity between think-block input and output, pressuring the model to actually transform its representations rather than pass through lazily.
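The depth incentive reduces to a one-line auxiliary loss. A hedged sketch, with `lam` and all shapes chosen for illustration rather than taken from the training config:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def depth_incentive(think_in: torch.Tensor, think_out: torch.Tensor,
                    lam: float = 0.05) -> torch.Tensor:
    """Penalize high cosine similarity between a think block's input and
    output so the block cannot act as a cheap identity."""
    cos = F.cosine_similarity(think_in, think_out, dim=-1)  # (batch, seq)
    return lam * cos.mean()

x = torch.randn(2, 16, 64)
lazy = depth_incentive(x, x)                    # pure pass-through: max penalty
busy = depth_incentive(x, torch.randn_like(x))  # real transformation: near zero
```

A lazy identity block pays the full `lam` penalty at every position, while a block that genuinely rewrites its representation pays almost nothing.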
RTX 5090 Optimized
Tuned for RTX 5090 with flash attention, bf16 mixed precision, and auto batch tuning targeting 95% VRAM utilization. NVFP4 block-scaling quantization via TransformerEngine for Shard and Prism. Glint runs without quantization or gradient checkpointing — stability over speed.
Looping Regularization
OpenMythos-style protection against loop collapse. A regularization loss penalizes cosine similarity between the prelude and coda block outputs, preventing the recurrent block from degenerating to a no-op. Lambda 0.05 for Glint, 0.08 for Shard, 0.1 for Prism — scaled to roughly 2% of typical CE loss.
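A sketch of how this regularizer might combine with the main objective. The lambda values come from the text above; the toy logits, function names, and the choice of cosine along the feature dimension are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def loop_collapse_reg(prelude_out: torch.Tensor, coda_out: torch.Tensor,
                      lam: float) -> torch.Tensor:
    """If the recurrent core degenerated to a no-op, the coda's output
    would stay close to the prelude's, so penalize their cosine similarity."""
    cos = F.cosine_similarity(prelude_out, coda_out, dim=-1)
    return lam * cos.mean()

# Toy language-modeling loss, just to show the composition.
logits = torch.randn(32, 100)               # (tokens, vocab)
targets = torch.randint(0, 100, (32,))
ce = F.cross_entropy(logits, targets)

p = torch.randn(2, 16, 64)
reg = loop_collapse_reg(p, p, lam=0.05)     # worst case: identical outputs
total = ce + reg                            # reg sized at ~2% of a typical CE
```

In the collapsed worst case the penalty tops out at `lam`, which is why the per-tier lambdas are tuned against the typical CE loss rather than set absolutely.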
The Architecture
Prelude processes input. Shared recurrent block loops with LoRA adapters. Coda cleans up. SleepGate remembers the important parts.
Model Series
Three tiers. Different sizes, same architecture family. We named them ourselves and we are quite pleased about it.
Glint
Lightweight and experimental. Updated frequently. The scrappy underdog with SleepGate memory.
Shard
Balanced and stable. 6 latent thinking layers. The responsible middle child who actually thinks.
Prism
Maximum quality. 7 latent thinking layers. The overachiever who never sleeps and actually does the work.
Flare-TTS
A text-to-speech model built and trained by our partner LH Tech AI. 28M parameters. Available now on Hugging Face.
Flare-TTS-28M
A compact TTS model trained from scratch by LH Tech AI, released under the CompactAI-O org. 28M parameters. Built to be small, fast, and usable on consumer hardware. Credit goes entirely to LH Tech AI for the build and training.
Built by LH Tech AI · Hosted under CompactAI-O · 28M parameters
Sample Output
What happens when you train on English text and hope for the best.
AIFinder
A tool that snitches on AI models. Every AI has a writing accent. AIFinder detects it.
Which AI Wrote This?
Paste any AI-generated text and AIFinder will guess which lab made it. Google, Anthropic, OpenAI, DeepSeek, xAI, and more. It learns from corrections. The more you use it, the smarter it gets.
Free API available | 60 requests/min | No API key required
YES WE KNOW IT SUCKS
The tool guesses wrong sometimes. It confuses Anthropic with OpenAI. It confidently identifies Google as DeepSeek. It is basically a parrot with an opinion.
Pro tip: Ask it math and reasoning questions. We trained it on huge amounts of TeichAI datasets (check them out at huggingface.co/TeichAI). It is noticeably better at detecting which math-happy lab produced the output.
That said, I have an AI working on fixing it. I could not be bothered to do it manually.
7+ hours
The AI is trying its best. Poor thing.