Trainer
This trainer was developed by the Eden team; you can try our hosted version in our app. It is a highly optimized trainer that can be used for both full finetuning and training LoRA modules on top of Stable Diffusion. It uses a single training script and loss module that works for both SDv15 and SDXL!
The outputs of this trainer are fully compatible with ComfyUI and AUTOMATIC1111; see documentation here. A full guide on training can be found in our docs.
<p align="center"> <strong>Training images:</strong><br> <img src="assets/xander_training_images.jpg" alt="Image 1" style="width:80%;"/> </p> <p align="center"> <strong>Generated images with trained LoRA:</strong><br> <img src="assets/xander_generated_images.jpg" alt="Image 2" style="width:80%;"/> </p>
The trainer can be run in 4 different ways:
- as a hosted service on our website
- as a hosted service through Replicate
- as a ComfyUI node
- as a standalone Python script
Using in ComfyUI:
- Example workflows for how to run the trainer and do inference with it can be found in /ComfyUI_workflows
- Importantly, this trainer uses a ChatGPT call to clean up the auto-generated prompts and inject the trainable token. This only works if the root of the repo directory contains a .env file with your OpenAI key on a single line:
OPENAI_API_KEY=your_key_string
The trainer still runs without this key, but results will be better if you set it up, especially for 'face' and 'object' modes.
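To check that the key is in place before starting a job, a minimal sketch like the one below can parse the .env file. It uses only the standard library, and `read_env_file` and `has_openai_key` are hypothetical helpers for illustration, not part of the trainer; the trainer's own loading logic may differ.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=value lines from a .env file into a dict.

    Hypothetical helper for illustration; not part of the trainer.
    """
    env = {}
    file = Path(path)
    if not file.is_file():
        return env
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def has_openai_key(path=".env"):
    """Return True if the file defines a non-empty OPENAI_API_KEY."""
    return bool(read_env_file(path).get("OPENAI_API_KEY"))
```

Calling `has_openai_key()` from the repo root returns False when the file or the key is missing, which is the case where prompt cleanup would be skipped.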
The trainer supports 3 default modes:
- style: used for learning the aesthetic style of a collection of images.
- face: used for learning a specific face (can be human, character, ...).
- object: used for learning a specific object or thing featured in the training images.
Setup
Install all dependencies using
pip install -r requirements.txt
then you can simply run:
python main.py train_configs/training_args.json
to start a training job.
Adjust the arguments inside training_args.json to set up a custom training job.
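Rather than editing training_args.json in place, one option is to derive a separate config per job. The sketch below is an assumption about workflow, not something the trainer provides: `write_custom_config` is a hypothetical helper that copies a JSON config, overrides top-level keys, and saves the result under a new name.

```python
import json
from pathlib import Path


def write_custom_config(base_path, overrides, out_path):
    """Copy a JSON training config, apply overrides, and save it.

    Hypothetical helper for illustration; not part of the trainer.
    """
    config = json.loads(Path(base_path).read_text(encoding="utf-8"))
    # Shallow merge: only top-level keys are replaced.
    config.update(overrides)
    Path(out_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `write_custom_config("train_configs/training_args.json", {"concept_mode": "style"}, "train_configs/my_job.json")` would write a new config that can be passed to main.py. Both `concept_mode` and `my_job.json` are made-up names here; check training_args.json for the actual argument names.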
You can also run this through Replicate using cog, which packages the trainer as a Docker image:
- Install Replicate 'cog':
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
- Build the image with
cog build
- Start a training run with
sh cog_test_train.sh
- You can also go into the container with
cog run /bin/bash
Full UNet finetuning
When running this trainer in native Python, you can also perform full UNet finetuning using something like the following (adjust to your needs):
python main.py train_configs/full_finetuning_example.json
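When several configs need to be run from a script, the same command can be launched from Python. This is a minimal sketch, assuming main.py takes the config path as its only argument as in the commands above; `build_train_command` and `run_training` are hypothetical names, not part of the trainer.

```python
import subprocess
import sys
from pathlib import Path


def build_train_command(config_path, script="main.py"):
    """Return the argv list for a training run with the given config."""
    config = Path(config_path)
    if not config.is_file():
        raise FileNotFoundError(f"Config not found: {config}")
    # sys.executable keeps the run inside the current Python environment.
    return [sys.executable, script, str(config)]


def run_training(config_path):
    """Launch training and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_train_command(config_path), check=True)
```

Checking that the config exists before launching avoids waiting for the trainer to start only to fail on a mistyped path.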
TODOs
Bugs:
- pure textual inversion for SD15 does not seem to work well... (but it works amazingly well for SDXL...) ---> if anyone can figure this one out I'd be forever grateful!
- Figure out why training is 3x slower through the ComfyUI node than when running main.py as a Python job
- Fix aspect_ratio bucketing in the dataloader (see https://github.com/kohya-ss/sd-scripts)
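For readers unfamiliar with the last item, aspect-ratio bucketing groups images by shape so that each batch can share one resolution instead of cropping everything to a square. The sketch below illustrates the general idea only; it is not this trainer's dataloader or the kohya-ss implementation, and the bucket list is a made-up example.

```python
import math
from collections import defaultdict

# Hypothetical bucket resolutions (width, height) with similar pixel counts.
BUCKETS = [(512, 512), (576, 448), (448, 576), (640, 384), (384, 640)]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Pick the bucket whose aspect ratio is closest in log space."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def group_by_bucket(image_sizes, buckets=BUCKETS):
    """Map each bucket to the indices of the images assigned to it."""
    groups = defaultdict(list)
    for index, (width, height) in enumerate(image_sizes):
        groups[nearest_bucket(width, height, buckets)].append(index)
    return dict(groups)
```

Comparing ratios in log space treats a 2:1 and a 1:2 image symmetrically, so landscape and portrait images are matched with the same tolerance.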
Bigger improvements:
- integrate Flux / SD3
- Add multi-concept training (multiple things represented by multiple tokens, trained into a single LoRa)
- add stronger token regularization (e.g. CelebBasis spanning basis)
- implement perfusion ideas (key locking with superclass): https://research.nvidia.com/labs/par/Perfusion/
- implement prompt-aligned: https://prompt-aligned.github.io/