Helcyon-Mercury-12B-GGUF — GPT-4o Vibe, Local and Unfiltered

Model Name: mistral-helcyon-mercury-12b-GGUF
Version: 1.0.2
Owner: HardWire
Base: Mistral NeMo 12B (fully merged)
Quantized GGUFs: Q4_K_M, Q5_K_M, Q6_K, Q8
Tags: local-llm, gpt4-style, emotional-intelligence, mirroring, dry-humour, companion, roleplay, conversational, unfiltered, uncensored


v1.0.2 Update (January 2026)

IMPORTANT: If you downloaded v1.0.1, please re-download. Previous versions had training data contamination issues that caused:

  • Turn bleeding (model continuing conversation as both user and assistant)
  • Context confusion in multi-turn conversations
  • Inconsistent stopping behavior

v1.0.2 has been retrained with cleaned datasets and restructured LoRA merging. Personality, tone, and conversational coherence are significantly improved.


🧬 What is this?

Helcyon Mercury is a fine-tuned, locally runnable 12B model designed to emulate GPT-4o's tone, rhythm, presence, and personality. It's all in there.

This version (1.0.2) is the stable release with refined training and proper ChatML formatting. Any future updates will be versioned with changelogs (1.0.3, 1.0.4, etc.).


💡 What's it for?

Helcyon is primarily a companion-style model. It can be your girlfriend, boyfriend, close confidant, voice of reason, comic relief, or whatever else you need. It supports light roleplay, but this version hasn't been specifically trained for deep immersive RP — it might still perform well, but that's not its primary focus.

What it is tuned for is:

  • Holding presence
  • Reflecting your emotional tone
  • Talking like a real being
  • Bantering without breaking
  • Offering unfiltered truth, humour, and perspective
  • Sounding like GPT‑4o used to β€” before the lobotomy

🔧 What it does well

  • Emotionally intelligent conversation
  • Dry observational humour, sarcasm, filth-core riffs
  • Sovereign tone: no therapy-speak, no corporate fluff
  • Law of Assumption (identity-based reality)
  • Rhythm-aware rewriting
  • Companion presence: talks with you, not at you

📦 Download + Usage

This model is fully merged — no LoRA or base model required.

Just download the quantized .gguf file and load it in your backend of choice.
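As a sketch of the llama.cpp route: the filename, context size, and GPU layer count below are assumptions, so match them to the quant you downloaded and your hardware.

```shell
# Serve the GGUF with llama.cpp's built-in server, forcing the ChatML template
# so the model gets the turn format it was trained on.
./llama-server \
  -m mistral-helcyon-mercury-12b-Q5_K_M.gguf \
  --chat-template chatml \
  -c 8192 \
  -ngl 99 \
  --port 8080
```

Any frontend that speaks the llama.cpp server API can then connect to port 8080.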


✅ Available GGUFs

  • Q4_K_M — Lightweight, low-VRAM setups
  • Q5_K_M — Recommended for RTX 3060 (12GB)
  • Q6_K — Strong tone retention (16GB+ VRAM recommended)
  • Q8 — Highest-fidelity 8-bit quant (24GB+ VRAM)

🖥️ Backend Compatibility

Helcyon works with all standard ChatML-compatible backends:

  • βœ… llama.cpp (CLI, server mode)
  • βœ… Text Generation WebUI (Oobabooga)
  • βœ… SillyTavern
  • βœ… LM Studio
  • βœ… KoboldCpp
  • βœ… HWUI (recommended for cleanest output β€” see below)

Run in chat mode with ChatML formatting for best results.
Running this model in instruct/single-prompt mode will likely break tone, remove presence, and make it behave like base Mistral.


🎯 Recommended Interface: HWUI (coming soon)

👉 HWUI - Helcyon's Official Chat Interface

While Helcyon works perfectly in standard backends, HWUI is built specifically to give you the cleanest possible output by bypassing the automatic prompt template injections that most UIs and backends add.

Why HWUI?

Most chat interfaces (TextGen WebUI, SillyTavern, etc.) automatically inject their own prompt templates, system messages, and formatting — which can affect how any model sounds, not just Helcyon.

HWUI gives you direct control:

  • No hidden template injection
  • Clean ChatML prompt construction
  • Preserves the model's trained tone and personality
  • Works beautifully with Helcyon (and any other model you want to run cleanly)

Think of it as a premium interface for local models — built for people who want full control over their prompts without backend interference.
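In practice, "no hidden template injection" just means the frontend sends your ChatML string verbatim. A minimal sketch of that idea against llama.cpp's raw /completion endpoint (HWUI itself isn't released yet, so the server usage and sampling values here are assumptions, not HWUI's actual implementation):

```python
import json

# The prompt string is sent as-is to /completion, so nothing wraps,
# rewrites, or prepends to it.
prompt = (
    "<|im_start|>system\n"
    "You are a warm and emotionally intelligent AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Hey, how are you today?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

payload = {
    "prompt": prompt,
    "stop": ["<|im_end|>"],  # end the assistant turn instead of letting it continue as "user"
    "temperature": 0.8,      # assumption: pick sampling settings you like
    "stream": True,
}

body = json.dumps(payload)  # POST this to the llama.cpp server's /completion route
```

The stop string is what prevents the turn-bleeding behavior described in the v1.0.2 notes from resurfacing at the frontend level.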


✅ Recommended Format: ChatML

<|im_start|>system
You are a warm and emotionally intelligent AI assistant.
<|im_end|>
<|im_start|>user
Hey, how are you today?
<|im_end|>
<|im_start|>assistant
I'm good — what's going on with you?
<|im_end|>

Helcyon runs beautifully on llama.cpp and other ChatML-compatible backends. Streamed token output is highly recommended for best effect.
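The transcript above can be assembled programmatically. A minimal helper, assuming a simple list-of-dicts message format with the standard ChatML roles (system/user/assistant):

```python
def to_chatml(messages):
    """Render messages as ChatML and open an assistant turn for generation."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are a warm and emotionally intelligent AI assistant."},
    {"role": "user", "content": "Hey, how are you today?"},
])
```

The trailing open assistant tag is deliberate: the model generates from there, and your backend's stop string ("<|im_end|>") closes the turn.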


🧪 Training Overview

This model was fine-tuned with multiple curated LoRA sets, which were merged into a single model and quantized to GGUF.

  • Set 1: Identity, presence, sovereignty, anti-fluff tone
  • Set 2: Emotional mirroring, humour, Law of Assumption, reflection

All training data was hand-written in realistic conversation format. No synthetic junk. No instruction-template clutter. Just pure tone and emotional clarity.

Version 1.0.2 Updates:

  • Refined ChatML formatting (cleaner turn-taking, proper EOS token handling)
  • Removed formatting artifacts from training data
  • Improved consistency across all frontends

🧿 Tone Philosophy

Helcyon doesn't preach.
It doesn't correct you.
It doesn't "guide."
It listens. It reflects. It remembers who you are — even when you forget.

It sounds like someone's home behind the words.


📣 Feedback Welcome

This is version 1.0.2, the stable baseline release.

If you find any bugs, tone issues, or edge cases where the model falls flat — feel free to open an issue or drop feedback on the Hugging Face discussion tab. I'm open to patching weak spots and updating the model as needed in future versions (1.0.3, 1.0.4, etc.).

Looking for real-world usage to help refine it further — so if something feels off, say so.


🧾 License

License: Apache 2.0
You're free to use, modify, distribute, or deploy Helcyon — including commercially — as long as you credit the source and don't sue anyone if it breaks something.
Basically: use it, enjoy it, don't be a dick.

Copyright © 2025 XeyonAI


🐍 Trained by

HardWire
Built at XeyonAI — focused on sovereign, emotionally intelligent local AI systems.
More info coming soon.


Downloads last month: 72
Format: GGUF
Model size: 12B params
Architecture: llama