Using multiple quality tags (NovelAI/Pony/etc) is a good idea
Full tutorial link: https://www.youtube.com/watch?v=DPX3eBTuO_Y
Info
This is a comprehensive, step-by-step tutorial on how to train Qwen Image models. It covers both LoRA training and full fine-tuning / DreamBooth training, for both the Qwen Image base model and the Qwen Image Edit Plus 2509 model. The tutorial is the product of 21 days of R&D and over $800 spent on cloud services to find the best training configurations. Furthermore, we have developed an amazing, ultra-easy-to-use Gradio app for running the legendary Kohya Musubi Tuner trainer with ease. You will be able to train locally on your Windows computer, for both LoRA and fine-tuning, on GPUs with as little as 6 GB of VRAM. Finally, I show how to train a character (a person), a product (a perfume), and a style (GTA 5 artworks).
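The actual training in the tutorial is done through the Gradio app and Kohya Musubi Tuner, so the sketch below is not that tool's API or config. It is only a minimal, hypothetical illustration of what LoRA training does under the hood: the base weights stay frozen and only a small low-rank adapter is trained. All layer sizes and names here are made up for the example.

```python
# Conceptual LoRA sketch (illustrative only; not the Musubi Tuner API or the tutorial's config).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze the original weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # down-projection
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # up-projection
        nn.init.zeros_(self.lora_b.weight)     # start as a no-op, so training begins at the base model
        self.scale = alpha / rank

    def forward(self, x):
        # frozen base output + scaled low-rank update; only lora_a/lora_b receive gradients
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Usage on a hypothetical 3072-wide projection layer of a diffusion transformer block:
layer = LoRALinear(nn.Linear(3072, 3072), rank=16)
print(layer(torch.randn(2, 3072)).shape)  # torch.Size([2, 3072])
```

Because only the two small adapter matrices are trainable, optimizer state and gradients stay tiny, which is why LoRA fits on low-VRAM GPUs far more easily than full fine-tuning.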
Resources
The post used in the tutorial to download the zip file: https://www.patreon.com/posts/qwen-trainer-app-137551634
Requirements tutorial: https://youtu.be/DrhUHnYfwC0
SwarmUI tutorial: https://youtu.be/c3gEoAyL2IE
Used Prompts
https://gist.github.com/FurkanGozukara/069523015d18a3e63d74c59257447f5b
Comparison Images Full Size
Chat Template?
Really Fun Model
What does A14 mean? Could we get details of the Qwen MoE architecture?
What is the context size of this model? Also, it does not appear to handle JSON or function calling well.
The 4k versions load and work in KoboldCpp, but the 128k versions don't.
<|eot_id|> in aphrodite-engine
The issue is that all those experts have to be very diverse and trained more or less simultaneously.
If you are going to use sparse MoE, your router has to be able to predict the fittest expert for the upcoming token, which means the router has to be trained together with the experts. That wouldn't be an issue for classic MoE, but both kinds of models also rely on the experts having a uniform "understanding" of the cached context. I don't think a 100x2B model would work well enough without that. That's why fine-tuning Mixtral is such a complicated task.
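To make the coupling concrete, here is a minimal, hypothetical sparse-MoE layer sketch (sizes and top-k are made up, not taken from any real model). The router's scores both choose which experts see a token and weight their outputs, so gradients flow through router and experts jointly; that is exactly why you can't train them separately.

```python
# Minimal sparse-MoE sketch: router + experts trained jointly (illustrative sizes only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # predicts per-token expert fitness
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e           # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```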
On top of that, we don't really have a good 2B base model. Sure, Phi exists... with 2K context length, no GQA, coherency issues, and very limited knowledge. I don't think the point of an "expert" is to provide domain-specific capabilities to the composite model; I think the trick is overcoming diminishing returns in training, plus some bandwidth optimizations for inference. So among your 100 experts, one might have both an analog of a grandmother cell and some weights associated with division. Another expert could be good at both kinds of ERP - Enterprise Resource Planning and the main excuse for creating Frankenmerges, lol. Model distillation keeps getting better, but I don't think any modern 2B model can compete with GPT-4. Perhaps a 16x34B could, but good luck training that from scratch as a relatively small business, let alone as a nonprofit or a private individual.
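A rough back-of-the-envelope count shows why those sizes are out of reach for small players. The sketch below naively treats every expert as a full copy of the base model (real MoEs share attention and embeddings and only replicate the FFN blocks, so these are upper bounds), and assumes top-2 routing.

```python
# Naive total vs. active parameter counts for the hypothetical MoE sizes mentioned above.
def moe_params(n_experts, expert_size_b, top_k=2):
    total = n_experts * expert_size_b   # parameters you must train and store (upper bound)
    active = top_k * expert_size_b      # parameters touched per token at inference
    return total, active

for n, size in [(100, 2), (16, 34)]:
    total, active = moe_params(n, size)
    print(f"{n}x{size}B -> ~{total}B total params, ~{active}B active per token")
# 100x2B -> ~200B total params, ~4B active per token
# 16x34B -> ~544B total params, ~68B active per token
```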