FLUX.1 LoRA training (20240824)

Aug 18, 2024
Training a LoRA for flux.1-dev and flux.1-schnell on a 24GB GPU and image generation using ComfyUI
 
Revision: 20240824-0 (init: 20240818)
 
The following uses the https://github.com/ostris/ai-toolkit GitHub repository to train a local LoRA on user-provided images, which we can use to generate pictures using ComfyUI.
Running this training requires an Nvidia GPU with 24GB of Video RAM (VRAM).
We will train on Ubuntu 24.04 with a recent Nvidia driver installed, git, brew (to install useful commands), and Python (python3 with pip3 and the venv package installed, either via apt or brew).
 
 
Recent developments in the Flux model ecosystem include advancements in FP8 and NF4 quantization formats and enhancements in the use of LoRAs. Earlier this week, source code enabling the training of a Flux LoRA on user-provided images was announced: https://github.com/ostris/ai-toolkit
 
We will use it to create a LoRA for Flux.1-Dev and Flux.1-Schnell. We will then exercise the trained LoRAs with ComfyUI (workflows are embedded in the generated images) to generate images.
Requirements:
  • An Nvidia GPU with at least 24GB of Video RAM
  • Using Flux.1-Dev requires acceptance of its terms of use.
  • A HuggingFace account to download the weights.
Some of those topics were covered in FLUX.1dev with ComfyUI and Stability Matrix.
 
In the following, we will train on the tok prompt trigger word and incorporate it into our files and directory names.
We will train our LoRA to 4000 steps (you are not required to go this far; good results can be obtained at lower settings, such as 2000). From a previous test, this takes over 4 hours on an NVIDIA RTX 3090 and about 2 hours and 20 minutes on an RTX 4090.
 
During training, if the Linux system is running a window manager, a web browser, or any GPU-using service (for example, Ollama or other containers managed through Dockge), terminate as many of those as possible. Use nvidia-smi to check what is using the GPU’s memory and reduce VRAM consumption before training. Most of the following steps can be done over ssh in a tmux session (training can take a few hours; being able to re-attach to the session will be helpful) or in a Visual Studio Code Remote connection.
During a test run on an Ubuntu Desktop system accessed remotely, only two processes (Xorg and gnome-shell) were present, and nvidia-smi listed only 141MB used of our GPU’s 24GB of VRAM.
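A quick pre-flight check before training might look as follows (a minimal sketch; the tmux session name is arbitrary):
nvidia-smi                 # check what currently uses the GPU and how much VRAM is free
tmux new -s flux-lora      # start a session that survives ssh disconnects (re-attach later with: tmux attach -t flux-lora)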

Preliminary steps

Code retrieval and virtualenv creation

Obtain the source code, create a virtual environment, and install the packages listed in requirements.txt.
We will use a base directory called Flux_Lora in the home (~) of our user account, where we will place the required components.
mkdir ~/Flux_Lora
cd ~/Flux_Lora
# Obtain the source and the code's submodules
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive
# Create and enable a python virtual environment to place all the needed packages
python3 -m venv venv
source venv/bin/activate
# for future use, re-enabling the venv can be done by running the source command again
# Install the required packages
pip3 install torch
pip3 install -r requirements.txt
pip3 install mediapipe peft
# mediapipe appears to be needed for Dev, while peft will be needed for Schnell

Hugging Face token

Downloading content from HuggingFace.co (HF) requires a read token. Content retrieved from HF will be placed under the path given by the HF_HOME environment variable, which defaults to ~/.cache/huggingface. This environment variable can be altered to match your preferences; see https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhome for details. We will use the default to maximize caching opportunities.
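If you prefer a different cache location, it can be exported before running any of the tools below (the path shown is only a hypothetical example; we keep the default):
export HF_HOME=/data/huggingface    # hypothetical alternate cache location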
We will need an HF token to log in. See https://huggingface.co/docs/hub/security-tokens for details on how to get this “download-only” read token.
We then use the huggingface-cli to set our token, per https://huggingface.co/docs/huggingface_hub/en/guides/cli#huggingface-cli-login
# Install the CLI using brew
brew install huggingface-cli
# Confirm the token is valid and add it to the default HF_HOME at ~/.cache/huggingface
huggingface-cli login
# Answer no to "Add token as git credential" as this is a "download-only" token
# This will store the token in ~/.cache/huggingface/token
# All models retrieved from HF's hub will end up in ~/.cache/huggingface/hub
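To confirm the login succeeded, a quick sanity check with the same CLI:
huggingface-cli whoami    # should print your HF username if the token was stored correctly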
FLUX.1-dev is hosted on HF, with the note “This repository is publicly accessible, but you have to accept the conditions to access its files and content.” Follow the steps detailed on https://huggingface.co/black-forest-labs/FLUX.1-dev to accept the terms if you intend to use this model for the training. If the terms are not accepted, the model will not be accessible.
FLUX.1-schnell uses a different license. See https://huggingface.co/black-forest-labs/FLUX.1-schnell for details.
Having the HF token available will also prove useful for miscellaneous downloads of side files from HF as they occur.

Dataset preparation

Prepare a folder for a set of image files:
mkdir ~/Flux_Lora/training_images-tok
When this folder is used during training, the tool will create a _latent_cache folder within it to store .safetensors files characterizing the images to train on.
Place your images in this location; the recommended image size is 1024x1024. The script requires those to be in .jpg or .png format. During its preliminary steps, the tool will resize them for processing, with a maximum per-side size of 512, 768, and 1024 pixels.
When training on a person, it is recommended to:
  • have at least 12 face photographs
  • vary the conditions between photographs so the training can differentiate the person from the background or their clothing. As such, use shots with varying face angles, lighting conditions, backgrounds, clothes, and accessories (limit the ones hiding your hair, but if you wear glasses, take images with and without them).
  • the script requests that you name each file image[number].[extension], increasing the number as needed: image1.jpg, image2.png … image25.jpg
  • for each image file, there must be a corresponding .txt file with a description of the image. Because the developer allowed for text replacement using the [trigger] keyword, each file can contain the same text; for example, ours contains a photo of a [trigger] man. The content can be extended to be more descriptive, but this is sufficient to train the LoRA.
    • Each image must have a matching .txt file: image1.jpg with image1.txt … image25.jpg and image25.txt (a small helper loop to create those caption files is sketched after this list)
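As a convenience, the caption files can be created with a small shell loop. This is a minimal sketch (not part of ai-toolkit) that assumes the images are already named image<number>.jpg or .png and that every caption is the one above:
cd ~/Flux_Lora/training_images-tok
for f in image*.jpg image*.png; do
  [ -e "$f" ] || continue                                # skip unmatched glob patterns
  echo "a photo of a [trigger] man" > "${f%.*}.txt"      # image1.jpg -> image1.txt
done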

LoRA Training

Flux.1-Dev

The training step requires a configuration file to be adapted. For flux.1-dev, an example configuration file is in ~/Flux_Lora/ai-toolkit/config/examples/train_lora_flux_24gb.yaml
cd ~/Flux_Lora/ai-toolkit
# create an output directory for the trained LoRA(s)
mkdir ../trained_LoRA
# copy the example training yaml and rename it
cp config/examples/train_lora_flux_24gb.yaml config/flux_dev-tok.yaml
Edit ~/Flux_Lora/ai-toolkit/config/flux_dev-tok.yaml with your preferred editor (we use VS Code with a Remote SSH connection into the system to limit additional VRAM usage). The configuration file supports relative paths (such as ../../directory) but not the ~ character representing the user’s home directory, so we will specify folders relative to the ai-toolkit directory in our configuration file.
 
Going through the comments in the YAML file informs us of the expected use of each entry in the configuration file. Here are the limited changes we made:
  • In config: we modified name: "flux_dev-tok" to reflect that we are creating a flux.1-dev LoRA for a tok prompt trigger word.
  • We modify training_folder: "../trained_LoRA" to use the directory we created. The script will create a flux_dev-tok folder within it (using the value of name) to store the training checkpoints and samples.
  • Uncomment and set the trigger_word to a value that you will use in your prompts. p3r5on or tok are commonly used values. This value will be recognized in the image generation text prompts. Use different values for different persons (it is recommended to always use a single non-existing word). Here we use trigger_word: "tok". This trigger word will automatically replace [trigger] in the imageNUM.txt caption files as well as in the sample prompts section.
  • The save section specifies how often the training saves .safetensors checkpoints. Each of those is usable as is, so if you decide a certain checkpoint gives better results (based on the samples generated), you can use that file instead of the final one. It is recommended to keep the sample_every value matching the save_every value.
  • In the datasets section we adapt the folder_path to where our training images are located: folder_path: "../training_images-tok"
  • In the train section, we increase the number of steps to steps: 4000.
    • The tool will generate checkpoints every save_every steps. If the samples seem to deteriorate as training steps increase, we can select an earlier checkpoint file instead of the final one.
  • We are not altering the model section.
    • If the model files are already present in the HF cache, they will not be re-downloaded.
    • You must have agreed to the terms for the FLUX.1-dev model for it to be accessible.
  • In the sample section, we keep sample_every matching the save_every value. We can comment out (#) some prompts while adding [trigger] to others. The idea is to see how well the LoRA learns from the images we provided, not to demonstrate the capabilities of FLUX.1-dev. For example:
    • prompts:
      - "a woman holding a coffee cup, in a beanie, sitting at a cafe"
      - "a [trigger] man holding a sign that says 'AI is fun'"
      - "a [trigger] man as a Jedi warrior, deep blue lightsaber, dark purple background with some multicolor neon light reflecting"
      - "a [trigger] man as a green lantern, cosmic background, creating a dragon from his ring, majestic, rule of third"
      - "a [trigger] man as Conan the barbarian, snow-covered fields, sword in hand, defiant, dynamic pose, golden ratio, asymmetric composition, photorealism, cinematic realism"
      - "a portrait of a [trigger] man as Superman flying in space, yellow sun in the background, majestic, comic style"
      - "a [trigger] man as a cyberpunk warrior, anime style"
      - "professional photograph portrait of a [trigger] man, black and white"
      - "a [trigger] man wearing sunglasses riding a high-speed motorbike, hyper-maximalist, octane render"
      - "a [trigger] man, lucid dream-like 3d model, game asset, blender, unreal engine, rule of thirds, wide angle shot, looking off in distance style, glowing background, vivid neon wonderland, particles, blue, green, orange"
      Here, only our first prompt will be consistent from sample generation to sample generation (the seed being set and not random). All the other generations will show us the improvements of the model at generating a [trigger] man. We have a total of 10 prompts.
For reference, the entire configuration file is available at flux_dev-tok.yaml
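Condensed, the values we changed look roughly like the excerpt below (an illustrative sketch, not the full file: the structure follows ai-toolkit’s example configuration, which may evolve between versions, and anything not shown keeps its default):
config:
  name: "flux_dev-tok"                      # also used as the output sub-folder name
  process:
    - type: "sd_trainer"
      training_folder: "../trained_LoRA"
      trigger_word: "tok"                   # uncommented and set
      datasets:
        - folder_path: "../training_images-tok"
      train:
        steps: 4000
      sample:
        sample_every: 250                   # kept matching save_every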
 
The next step consists of starting the training and waiting for it to complete. In a terminal within our tmux session:
# Make sure you are running in the created virtual env
# if needed: cd ~/Flux_Lora/ai-toolkit; source venv/bin/activate
time python3 ./run.py config/flux_dev-tok.yaml
The tool will validate the yaml file, download the required models (over 30GB), generate latent content from the training images, and start training.
To see CPU and GPU usage, we can use nvitop (usable from pipx, itself installable using brew install pipx). In a new tmux terminal:
pipx run nvitop
During training, the sample images generated every sample_every steps can be inspected. Those will be in ~/Flux_Lora/trained_LoRA/flux_dev-tok/samples with file names such as 1723999194004__000000000_0.jpg, decomposed as “unix timestamp __ steps performed _ prompt number”. Being “step 0”, this image is one of the initial “before training” samples, and being “prompt 0”, it is the one that we expect to be consistent from generation to generation (a woman holding a coffee cup […]).
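As an aside, the leading number is a millisecond Unix timestamp; with GNU date (assuming a Linux shell) it can be decoded as:
date -u -d @$((1723999194004 / 1000))    # -> Sun Aug 18 16:39:54 UTC 2024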
 
Every save_every steps, a new .safetensors file is stored in the output directory. Each is a fully usable LoRA. For example, at step 250, we get flux_dev-tok_000000250.safetensors. Based on the samples generated, we can decide to cancel training early if a given save provides better results than another.
 
On a 4090, this training took about 2 hours and 20 minutes on 25 input images.
The final weight is located at ~/Flux_Lora/trained_LoRA/flux_dev-tok/flux_dev-tok.safetensors and is under 200MB.

Flux.1-Schnell

We will follow similar steps as above to train on the Apache-licensed Schnell model.
cd ~/Flux_Lora/ai-toolkit
# create an output directory for the trained LoRA(s) if it does not already exist
mkdir ../trained_LoRA
cp config/examples/train_lora_flux_schnell_24gb.yaml config/flux_schnell-tok.yaml
Compared to the “dev” training configuration, the only change we make is to adapt name: "flux_schnell-tok" to reflect the “schnell” model.
The other modifications we made stay the same. We keep:
  • training_folder: "../trained_LoRA"
  • trigger_word: "tok"
  • folder_path: "../training_images-tok"
  • steps: 4000
We also keep our altered sample prompts.
The example yaml file already contains changes in the model section to reflect the use of Schnell and the need for an assistant_lora to support this training (see the excerpt below).
For reference, the schnell configuration file we used is available at flux_schnell-tok.yaml
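The model section of the example schnell configuration looks roughly like the following (shown only as an illustration; the keys and values come from ai-toolkit’s example file and may change between versions, so verify against your copy):
model:
  name_or_path: "black-forest-labs/FLUX.1-schnell"
  assistant_lora_path: "ostris/FLUX.1-schnell-training-adapter"   # training adapter needed for schnell
  is_flux: true
  quantize: true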
 
After saving the file, we can then run the training:
# Make sure you are using the created virtualenv
# if needed: cd ~/Flux_Lora/ai-toolkit; source venv/bin/activate
# Run the training
time python3 ./run.py config/flux_schnell-tok.yaml
On a 4090, this training took the same 2 hours and 20 minutes on 25 input images as it did for Dev.
The final weight is located at ~/Flux_Lora/trained_LoRA/flux_schnell-tok/flux_schnell-tok.safetensors and is under 200MB.

Using the trained models

With ComfyUI

To use the trained models, we will use ComfyUI.
We invite you to check another post titled FLUX.1dev with ComfyUI and Stability Matrix for details on what models to download and how to set it up. We also invite you to check Stable Diffusion Art’s “Beginner’s Guide to ComfyUI” as well as OpenArt’s ComfyUI Academy for details on using the tool.
ComfyUI embeds the used workflow within its generated images. Using those embedded workflows is done by dragging and dropping the image into the ComfyUI WebUI.
In the following, we will show the workflows and share the images generated using them as examples of how to use our generated LoRAs.
When using your own prompt, make sure the trained LoRA file is available to ComfyUI (see the sketch below), load the LoRA, and use the trigger_word (ours was tok) so the trained subject is included in the generated image.
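Making the trained files visible to ComfyUI usually means copying them into its models/loras directory. A minimal sketch, assuming ComfyUI lives under ~/ComfyUI (adapt the destination to your installation; Stability Matrix or Docker layouts differ):
# Copy the trained LoRAs to the (assumed) ComfyUI loras folder
cp ~/Flux_Lora/trained_LoRA/flux_dev-tok/flux_dev-tok.safetensors ~/ComfyUI/models/loras/
cp ~/Flux_Lora/trained_LoRA/flux_schnell-tok/flux_schnell-tok.safetensors ~/ComfyUI/models/loras/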

Flux.1-Dev LoRA with ComfyUI

Flux.1-Schnell LoRA with ComfyUI

Extended workflow

The following workflow requires a custom node to be installed.
We recommend that you install the ComfyUI Manager application in your ComfyUI to simplify the task of adding the custom node to your installation. If you use the Dockge install from "FLUX.1dev with ComfyUI and Stability Matrix”, the tool is installed during the first run of the https://github.com/mmartial/ComfyUI-Nvidia-Docker container (as per the instructions, we must let the tool install all its requirements, let the WebUI become functional, then restart the container so that ComfyUI Manager’s security level can be modified).
If you use an alternate ComfyUI installation, please see https://github.com/ltdrdata/ComfyUI-Manager for details on how to install ComfyUI Manager in your setup.
With the ComfyUI Manager node installed, we will have additional functions available in our ComfyUI.
 
The workflow in question looks as follows:
An image generated using this workflow (which is embedded in this image):
 
With the image on your system and ComfyUI running (with ComfyUI Manager installed), drag and drop the image onto the ComfyUI canvas.
After dropping the image, we will be shown a notice about a missing custom node.
Selecting the Manager will show the “ComfyUI Manager Menu”.
Selecting “Install Missing Custom Nodes” will allow us to “Install” the missing node.
After installation and a “Reload” (from the same menu where the “Manager” was; a browser page reload might be required as well), we should now be able to “Queue Prompt” and see the results of this extended workflow.

Revision History

  • 20240824-0: Added more complex workflow, integrating the use of ComfyUI Manager.
  • 20240818-0: Initial release