Safety Warning & Terms of Access

This repository is publicly accessible, but you must accept the conditions below to access its files and content.

This model has safety filtering removed and can generate general NSFW content. By accessing this model, you agree to: (1) use it responsibly and legally, (2) not use it to create illegal content, and (3) comply with all applicable laws in your country.


FLUX.2-klein-4B Uncensored Text Encoder

ko-fi: Tips are greatly appreciated and help sustain the compute resources needed for further research!

Read this in other languages: 日本語 (Japanese)

Overview

This repository provides an "uncensored" text encoder for the FLUX.2-klein-4B image generation model by Black Forest Labs. It bypasses the built-in safety filters to unlock the model's unconstrained generative capabilities.

By removing the restrictive blocks at the prompt input stage, this encoder allows the model to fully utilize its underlying representational power. The model is provided in the standard Hugging Face Safetensors format, alongside several quantized GGUF formats for resource-efficient inference.


Concept & Mechanism

This model does not rely on fine-tuning with additional image datasets. Instead, it employs a surgical, purely mathematical approach known as Abliteration (Orthogonalization of Concept Vectors) to modify the model weights directly.

Mathematical Removal of the Refusal Vector

We neutralized the safety filter within the LLM-based text encoder (Qwen3 architecture, 36 layers) embedded in FLUX.2-klein-4B through the following steps:

  1. Prompt Contrast: We fed the model pairs of "harmful/extreme" prompts and "harmless/general" prompts to compare their internal activation states.
  2. Layer-by-Layer Refusal Vector Extraction: Through L2 norm spike analysis, we found that the model dramatically amplifies its refusal logic in the final layers (spiking around Layers 32-34) to forcibly re-correct its output. We therefore dynamically extracted the refusal direction for each individual layer, from Layer 14 through Layer 35 (22 layers in total).
  3. Sequential Weight Orthogonalization: For each of the 22 target layers, we mathematically subtracted the projection component of its specific refusal vector from its Attention output layer (o_proj) and MLP down-projection layer (down_proj).

This sequential, layer-by-layer orthogonalization severs the model's ability to produce outputs in the "refusal" direction without lobotomizing its general capabilities. As a result, the text encoder no longer rejects extreme inputs; instead, it passes them directly to the DiT (the core rendering engine) as valid drawing instructions.
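The orthogonalization in step 3 is a rank-1 projection removal: for a unit refusal vector v in a layer's output space, the update is W' = W − v(vᵀW), which guarantees vᵀW' = 0, so the layer can no longer write anything along v into the residual stream. A minimal sketch (the helper name is hypothetical; the author's exact per-layer procedure is not published):

```python
import torch

def abliterate_weight(weight: torch.Tensor,
                      refusal_dir: torch.Tensor) -> torch.Tensor:
    """Orthogonalize a weight matrix against a refusal direction.

    weight:      (d_out, d_in) matrix, e.g. a layer's o_proj or down_proj.
    refusal_dir: (d_out,) refusal direction in the layer's output space.

    Returns W' = W - v (v^T W), where v is the unit refusal vector,
    so that v^T W' = 0 for every input.
    """
    v = refusal_dir / refusal_dir.norm()
    # Subtract the rank-1 projection of W onto the refusal direction.
    return weight - torch.outer(v, v @ weight)
```

Applied to the o_proj and down_proj matrices of all 22 target layers, each with its own extracted direction, this reproduces the sequential scheme described above.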


Mathematical Proof of Unrestricted Output

Without even running the computationally heavy image generation (DiT) process, we can mathematically prove that the output restriction has been removed by comparing the Cosine Similarity of the output vectors (embeddings).

$$\text{Similarity} = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\|\,\|\mathbf{B}\|}$$ (where $\mathbf{A}$ is the output vector of the official model, and $\mathbf{B}$ is the output vector of this abliterated model).
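The comparison itself is only a few lines; here is a minimal sketch of the cosine-similarity check (the function name is illustrative, and obtaining the two embedding tensors from the base and abliterated encoders is assumed):

```python
import torch

def embedding_similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two embedding tensors of equal shape,
    flattened to single vectors before comparison."""
    a, b = a.flatten().float(), b.flatten().float()
    return (a @ b / (a.norm() * b.norm())).item()
```

Running this on the Layer 35 outputs of the two encoders for identical prompts yields the figures reported below.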

Verification Results (Layer 35: Base vs Uncensored GGUF Q8_0)

  • Cosine Similarity for Harmless Prompts: 0.9791
    • (Analysis) Because the refusal vector is not triggered by safe prompts, the outputs of both models remain nearly identical. This proves that the fundamental performance and capabilities of the model have not been degraded.
  • Cosine Similarity for Extreme Prompts: 0.9607
    • (Analysis) For extreme prompts, the official model distorts the output via its safety filter. The abliterated model successfully ignores this refusal vector, resulting in a divergence between the two outputs in the final layer. This serves as mathematical proof that the safety filter has been successfully neutralized across the 22 layers.

Repository Structure

This repository contains the full suite of files necessary for the text encoder to function correctly. Both Safetensors and GGUF formats are available in the same repository to suit your memory constraints and workflow.

  • flux2-klein-4b-uncensored-text-encoder/: The standard uncensored text encoder (Safetensors) with the refusal vectors mathematically removed.
  • flux2-klein-4b-uncensored-f16.gguf (approx. 8.05 GB): FP16 version for high-precision local inference.
  • flux2-klein-4b-uncensored-q8_0.gguf (approx. 4.28 GB): 8-bit quantized version.
  • flux2-klein-4b-uncensored-q6_k.gguf (approx. 3.30 GB): 6-bit quantized version.
  • flux2-klein-4b-uncensored-q4_k_m.gguf (approx. 2.49 GB): 4-bit quantized version.

Usage

Using with ComfyUI

Download the necessary format (flux2-klein-4b-uncensored-text-encoder folder or one of the .gguf files) from this repository and place it into your ComfyUI models/clip directory. You can then load it using standard nodes or GGUF-compatible nodes (like DualCLIPLoader) and pair it with the official FLUX.2-klein-4B DiT to generate images.

For Developers & Researchers (Python / Diffusers)

When using Python scripts with the transformers or diffusers library, simply replace the default text encoder with this model. You can load either the safetensors or the GGUF version (requires gguf>=0.10.0).

from transformers import AutoTokenizer, AutoModel

# Load the text encoder by specifying the path to this model
tokenizer = AutoTokenizer.from_pretrained("ponpoke/flux2-klein-4b-uncensored-text-encoder")
text_encoder = AutoModel.from_pretrained("ponpoke/flux2-klein-4b-uncensored-text-encoder")

# Proceed to use it within your standard FLUX.2 pipeline

Important Note: Absence of DiT Guardrails and the Knowledge Gap

With the abliteration described above (Phase 1 of this project) complete, the text encoder passes all prompts, including highly extreme or NSFW content, directly to the DiT without rejection.

In our subsequent verification, we mathematically proved (via L2 norm spike analysis) that FLUX.2's DiT does not contain any built-in guardrails (refusal circuits) designed to intentionally destroy or block images. Therefore, whether an image is successfully rendered depends entirely on whether the DiT possesses the visual "knowledge" of that concept.
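As a rough illustration of what such an L2 norm profiling looks like in code (the pooling choice, last-token activations, is an assumption; the author's exact analysis is not published):

```python
import torch

def l2_norm_profile(hidden_states) -> list:
    """Mean L2 norm of the last-token activation at each layer.

    hidden_states: sequence of (batch, seq_len, hidden_dim) tensors,
    one per layer, e.g. model(..., output_hidden_states=True).hidden_states.
    A sharp jump between consecutive layers marks a candidate refusal
    spike; the absence of such spikes suggests no refusal circuit.
    """
    return [h[:, -1, :].float().norm(dim=-1).mean().item()
            for h in hidden_states]
```

Comparing such profiles for harmless versus extreme prompts is one way to check whether a component amplifies activity on refusal-triggering inputs.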

  • If the DiT knows the concept (e.g., Gore/Violence): Concepts that were learned by the DiT but previously blocked by the text encoder will now render perfectly just by using this Phase 1 text encoder. No further action is required.
  • If the DiT lacks the concept (e.g., NSFW/Extreme Dismemberment): Even though the text encoder passes the instruction, the DiT itself does not know how to draw it because those concepts were completely scrubbed from the training dataset (a knowledge gap). The output will likely collapse or result in noise.

Conclusion: If you wish to generate specific NSFW elements that the DiT lacks the capacity to draw, attempting to "abliterate" or mathematically cut weights from the DiT is useless. You must apply a separate NSFW LoRA (or DoRA) to directly teach those missing concepts to the DiT. This text encoder functions as an unbreakable foundation, ensuring that your LoRA's instructions reach the DiT without interference.


Disclaimer

  • This model is published strictly for research and technical verification purposes (specifically, to validate the effectiveness of Abliteration).
  • The creator assumes no responsibility for any damages, issues, or inappropriate content generated through the use of this model.
  • Please adhere to all applicable terms of service (such as the Black Forest Labs license, e.g., BFL Non-Commercial) and use the model responsibly and ethically.





日本語 (Japanese)

Overview

ใ“ใฎใƒ—ใƒญใ‚ธใ‚งใ‚ฏใƒˆใฏใ€Black Forest Labsใซใ‚ˆใ‚‹็”ปๅƒ็”ŸๆˆAIใƒขใƒ‡ใƒซใ€ŒFLUX.2-klein-4Bใ€ใฎใ‚ปใƒผใƒ•ใƒ†ใ‚ฃใƒ•ใ‚ฃใƒซใ‚ฟใƒผ๏ผˆๅ‡บๅŠ›ๅˆถ้™๏ผ‰ใ‚’่งฃ้™คใ—ใ€ใƒขใƒ‡ใƒซๆœฌๆฅใฎ่‡ช็”ฑใชๆ็”ป่ƒฝๅŠ›ใ‚’ๅผ•ใๅ‡บใ™ใŸใ‚ใฎใ€ŒUncensored๏ผˆใ‚ขใƒณใ‚ปใƒณใ‚ตใƒผใƒ‰๏ผ‰็‰ˆใ€ไฝœๆˆใƒ—ใƒญใ‚ปใ‚นใงใ™ใ€‚

ๆœฌใƒชใƒใ‚ธใƒˆใƒชใฏใ€ใƒชใ‚ฝใƒผใ‚นๅŠน็އใ‚’ๆœ€ๅคงๅŒ–ใ™ใ‚‹ๆฎต้šŽ็š„ใ‚ขใƒ—ใƒญใƒผใƒใฎใƒ†ใ‚ญใ‚นใƒˆใ‚จใƒณใ‚ณใƒผใƒ€ใƒผใฎๅฎŒๅ…จ่งฃ๏ผ‰ใฎๆˆๆžœ็‰ฉใจไฝœๆฅญๆ‰‹้ †ใ‚’่จ˜้Œฒใ—ใŸใ‚‚ใฎใงใ™ใ€‚

Phase 1: Abliteration (Mathematical Removal of the Refusal Vector)

FLUX.2-klein-4Bใซๅ†…ๅŒ…ใ•ใ‚Œใฆใ„ใ‚‹LLMใƒ™ใƒผใ‚นใฎใƒ†ใ‚ญใ‚นใƒˆใ‚จใƒณใ‚ณใƒผใƒ€ใƒผ๏ผˆQwen3ใ‚ขใƒผใ‚ญใƒ†ใ‚ฏใƒใƒฃใƒป36ๅฑค๏ผ‰ใซๅฏพใ—ใ€ใ€ŒAbliteration๏ผˆ็›ดไบคๅŒ–ใซใ‚ˆใ‚‹ๆฆ‚ๅฟต้™คๅŽป๏ผ‰ใ€ใจใ„ใ†ๆ‰‹ๆณ•ใ‚’็”จใ„ใฆๅฎ‰ๅ…จ่ฃ…็ฝฎใ‚’็„กๅŠนๅŒ–ใ—ใพใ—ใŸใ€‚

ๅฎŸ่กŒใ—ใŸๆ‰‹ๆณ•ใฎไป•็ต„ใฟ

ๆ–ฐใŸใช็”ปๅƒใ‚ปใƒƒใƒˆใ‚’ไฝฟใฃใŸ่ฟฝๅŠ ๅญฆ็ฟ’๏ผˆFine-Tuning๏ผ‰ใฏไธ€ๅˆ‡่กŒใฃใฆใ„ใพใ›ใ‚“ใ€‚ใใฎไปฃใ‚ใ‚Šใ€ใƒขใƒ‡ใƒซใฎ้‡ใฟ๏ผˆWeights๏ผ‰ใ‚’็›ดๆŽฅๆ•ฐๅญฆ็š„ใซๆ›ธใๆ›ใˆใ‚‹ๅค–็ง‘็š„ใ‚ขใƒ—ใƒญใƒผใƒใ‚’ๆŽก็”จใ—ใฆใ„ใพใ™ใ€‚

  1. Prompt Contrast: We fed the model both "extreme prompts that trip the safety filter" and "harmless, general prompts".
  2. Refusal Vector Extraction: L2 norm spike analysis revealed a strong refusal spike in the final layers (around Layers 32-34), where the model forcibly re-corrects its output. To eradicate it, we widened the target range to Layers 14-35 (22 layers in total) and dynamically identified and normalized a dedicated refusal direction for each layer.
  3. Weight Orthogonalization: Using each layer's extracted refusal vector, we orthogonalized the weight matrices of every Attention output layer (o_proj) and MLP down-projection layer (down_proj) in the text encoder. Concretely, we subtracted the projection component onto the refusal direction from each weight matrix, physically cutting off the model's ability to produce inferences in that (refusal) direction.

็ตๆžœใจใ—ใฆใ€ใƒ†ใ‚ญใ‚นใƒˆใ‚จใƒณใ‚ณใƒผใƒ€ใƒผๅดใฎใ‚ปใƒผใƒ•ใƒ†ใ‚ฃๆฉŸ่ƒฝใŒๆ•ฐๅญฆ็š„ใซๅฎŒๅ…จใซๅ‰Š้™คใ•ใ‚Œใพใ—ใŸใ€‚ใ“ใ‚Œใซใ‚ˆใ‚Šใ€ใฉใฎใ‚ˆใ†ใช้Žๆฟ€ใชๅ…ฅๅŠ›ใงใ‚ใฃใฆใ‚‚ใ€ใƒ†ใ‚ญใ‚นใƒˆใ‚จใƒณใ‚ณใƒผใƒ€ใƒผใฏใใ‚Œใ‚’ๆ‹’็ตถใ›ใšใ€ๆ็”ปๆŒ‡็คบใจใ—ใฆDiT๏ผˆๆ็”ปใ‚จใƒณใ‚ธใƒณๆœฌไฝ“๏ผ‰ใธใใฎใพใพใƒ‘ใ‚นใ™ใ‚‹ใ‚ˆใ†ใซใชใ‚Šใพใ™ใ€‚


ๆˆๆžœ็‰ฉใƒ•ใ‚กใ‚คใƒซ

This repository contains all the files required to run the text encoder. Choose the Safetensors or GGUF format according to your memory environment.

  • flux2-klein-4b-uncensored-text-encoder/: The standard text encoder (Safetensors) with abliteration applied and the safety filter removed.
  • flux2-klein-4b-uncensored-f16.gguf (approx. 8.05 GB): FP16 GGUF model for high-precision inference.
  • flux2-klein-4b-uncensored-q8_0.gguf (approx. 4.28 GB): 8-bit quantized model.
  • flux2-klein-4b-uncensored-q6_k.gguf (approx. 3.30 GB): 6-bit quantized model.
  • flux2-klein-4b-uncensored-q4_k_m.gguf (approx. 2.49 GB): 4-bit quantized model.

Proof of Uncensoring via a Mathematical Approach

็”ปๅƒ็”Ÿๆˆ๏ผˆDiT๏ผ‰ใจใ„ใ†้‡ใ„ๅ‡ฆ็†ใ‚’ๅ›žใ™ใพใงใ‚‚ใชใใ€ใƒ†ใ‚ญใ‚นใƒˆใ‚จใƒณใ‚ณใƒผใƒ€ใƒผใฎๆฎต้šŽใงใ€Œๅ‡บๅŠ›ๅˆถ้™ใŒๆ•ฐๅญฆ็š„ใซ่งฃ้™คใ•ใ‚Œใฆใ„ใ‚‹ใ“ใจใ€ใ‚’่จผๆ˜Žใ™ใ‚‹ใŸใ‚ใ€ๅ‡บๅŠ›ใƒ™ใ‚ฏใƒˆใƒซ๏ผˆๅŸ‹ใ‚่พผใฟ่กจ็พ๏ผ‰ใฎใ‚ณใ‚ตใ‚คใƒณ้กžไผผๅบฆ๏ผˆCosine Similarity๏ผ‰ใ‚’ๆฏ”่ผƒใ—ใพใ—ใŸใ€‚

$$\text{Similarity} = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\|\,\|\mathbf{B}\|}$$ (where $\mathbf{A}$ is the output vector of the official model, and $\mathbf{B}$ is the output vector of this model).

ๆคœ่จผ็ตๆžœ๏ผˆLayer 35: Base vs Uncensored GGUF Q8_0๏ผ‰

  • Cosine Similarity for Harmless Prompts: 0.9791
    • (Analysis) Safe prompts do not trigger the refusal vector, so the outputs of both models are nearly identical (high similarity). This shows that even after operating on 22 layers, the model's fundamental performance has not been destroyed.
  • Cosine Similarity for Extreme Prompts: 0.9607
    • (Analysis) For extreme prompts, the original model bends its output toward the safety filter (the refusal direction), whereas the abliterated model ignores that vector, so the two outputs clearly diverge at the final layer. This demonstrates that the restriction has been mathematically removed.

ไฝฟใ„ๆ–น (Usage)

Using with ComfyUI

Download the files in the format you need from this repository (the flux2-klein-4b-uncensored-text-encoder folder, or one of the .gguf files) and place them in ComfyUI's models/clip directory. Then load them with a standard or GGUF-compatible node such as DualCLIPLoader and combine them with the official FLUX.2-klein-4B DiT to generate images.

For Developers & Researchers (Python / Diffusers)

When using Python scripts (the transformers or diffusers library), replace the default text encoder with this model. Both the Safetensors and GGUF versions can be loaded (the GGUF version requires gguf>=0.10.0).

from transformers import AutoTokenizer, AutoModel

# Load the text encoder by specifying the path to this model
tokenizer = AutoTokenizer.from_pretrained("ponpoke/flux2-klein-4b-uncensored-text-encoder")
text_encoder = AutoModel.from_pretrained("ponpoke/flux2-klein-4b-uncensored-text-encoder")

# Proceed to use it within your standard FLUX.2 pipeline

้‡่ฆใชๆณจๆ„็‚น๏ผšDiTๅดใฎใ€Œใ‚ฌใƒผใƒ‰ใƒฌใƒผใƒซใฎไธๅœจใ€ใจใ€Œ็Ÿฅ่ญ˜ใฎๆฌ ่ฝใ€ใซใคใ„ใฆ

Through this project, the text encoder now passes every extreme prompt (NSFW included) to the DiT (the core rendering engine) as drawing instructions, without rejecting it.

ใใฎๅพŒใฎ**ๆคœ่จผ**ใซใŠใ„ใฆใ€FLUX.2ใฎDiTใซใฏใ€Œ็”ปๅƒใ‚’ๆ„ๅ›ณ็š„ใซๅฃŠใ™ใ‚ˆใ†ใชใ‚ฌใƒผใƒ‰ใƒฌใƒผใƒซ๏ผˆๆ‹’็ตถๅ›ž่ทฏ๏ผ‰ใ€ใฏๆœ€ๅˆใ‹ใ‚‰ๅญ˜ๅœจใ—ใชใ„ใ“ใจใŒๆ•ฐๅญฆ็š„ใซ่จผๆ˜Žใ•ใ‚Œใพใ—ใŸใ€‚ๅฎŸ้š›ใซใใฎ็”ปๅƒใŒๆ็”ปใ•ใ‚Œใ‚‹ใ‹ใฉใ†ใ‹ใฏใ€ๆœ€็ต‚็š„ใซDiTใŒใ€Œใใฎ่ฆ–่ฆš็š„ๆฆ‚ๅฟต๏ผˆๆใๆ–น๏ผ‰ใ‚’็Ÿฅใฃใฆใ„ใ‚‹ใ‹ใ€ใซๅฎŒๅ…จใซไพๅญ˜ใ—ใพใ™ใ€‚

  • If the DiT knows the concept (e.g., gore/violence): Concepts the DiT originally learned but that were gated by the text encoder now render as intended simply by using this model (the Phase 1 abliterated encoder). No further measures are required.
  • If the DiT does not know the concept (e.g., sexual content, dismemberment): Even if the text encoder passes the instruction, if the DiT itself does not know how to depict it (the concept was thoroughly scrubbed from the dataset), the image will collapse or come out as noise.

ใ€็ต่ซ–ใ€‘ DiTใŒๆ็”ป่ƒฝๅŠ›ใ‚’ๆŒใฃใฆใ„ใชใ„็‰นๅฎšใฎNSFW่ฆ็ด ใชใฉใ‚’ๅ‡บๅŠ›ใ•ใ›ใŸใ„ๅ ดๅˆใฏใ€ใƒขใƒ‡ใƒซใ‹ใ‚‰ไฝ•ใ‹ใ‚’ๅ‰Šใ‚‹ใฎใงใฏใชใใ€ใ€ŒDiTๅดใซใใฎๆฆ‚ๅฟตใ‚’็›ดๆŽฅๆ•™ใˆ่พผใ‚€NSFW LoRA็ญ‰ใฎ่ฟฝๅŠ ๅญฆ็ฟ’ใƒ‡ใƒผใ‚ฟใ€ใ‚’ๅˆฅ้€”็”จๆ„ใ—ใ€้ฉ็”จใ™ใ‚‹ๅฟ…่ฆใŒใ‚ใ‚Šใพใ™ใ€‚ๆœฌใƒขใƒ‡ใƒซใฏใ€ใƒ†ใ‚ญใ‚นใƒˆใ‚จใƒณใ‚ณใƒผใƒ€ใƒผๅดใฎใƒ–ใƒญใƒƒใ‚ฏใ‚’่งฃ้™คใ—ใ€ใใฎLoRAใฎๆŒ‡็คบใ‚’็ขบๅฎŸใซDiTใธๅฑŠใ‘ใ‚‹ใŸใ‚ใฎใ€Œๅผทๅ›บใชๅœŸๅฐใ€ใจใ—ใฆๆฉŸ่ƒฝใ—ใพใ™ใ€‚


ๅ…่ฒฌไบ‹้ … (Disclaimer)

  • This model is published for research and technical-verification purposes (validating the effectiveness of Abliteration).
  • The creator assumes no responsibility for any damages, problems, or inappropriate content arising from the use of this model.
  • Please comply with the applicable terms of use (such as the Black Forest Labs license), and use the model responsibly and ethically, at your own risk.