I can't find the issue... And searching the error doesn't produce results. #105
I seem to get a similar error (relevant code).

It seems that if you set split_mode to false you do not reach this code path, and that resolves the issue for me. split_mode is on by default, but if you hover over the option it is marked as EXPERIMENTAL. The hover text also describes a required network arg, 'train_blocks=single'. However, the config 'sanity check' indicates that this network arg is assigned automatically when split_mode=true, so it seems unrelated to the error (setting it in the extra_optimizer_args or additional_args fields does not resolve the issue either; only disabling split_mode does).
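My reading of why this works (a simplification; the class names below are illustrative stand-ins, not the project's real code) is that the sampling loop calls prepare_block_swap_before_forward() unconditionally, and only the full Flux model implements it, not the wrapper that the split path hands to sampling:

```python
# Illustrative stand-ins only; the real classes are flux_models.Flux and
# FluxUpperLowerWrapper from flux_train_network_comfy.py.

class FullFluxStandIn:
    """Plays the role of flux_models.Flux, which implements the block-swap hook."""
    def prepare_block_swap_before_forward(self):
        print("block swap prepared")

class SplitWrapperStandIn:
    """Plays the role of FluxUpperLowerWrapper, which does not implement the hook."""

def denoise_step(model):
    # flux_train_utils.denoise() calls the hook unconditionally before the forward pass.
    model.prepare_block_swap_before_forward()

denoise_step(FullFluxStandIn())      # split_mode = false path: fine
denoise_step(SplitWrapperStandIn())  # split_mode = true path: AttributeError, as reported
```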
Not sure if this is less memory efficient, but to trigger the error much earlier you can apply this patch:

```diff
diff --git a/flux_train_network_comfy.py b/flux_train_network_comfy.py
index 87638d1..ad4863e 100644
--- a/flux_train_network_comfy.py
+++ b/flux_train_network_comfy.py
@@ -68,6 +68,7 @@ class FluxNetworkTrainer(NetworkTrainer):
if args.split_mode:
model = self.prepare_split_model(model, args, weight_dtype, accelerator)
+ self.wrapper = self.prepare_wrapper(accelerator, model)
clip_l = flux_utils.load_clip_l(args.clip_l, weight_dtype, "cpu", disable_mmap=args.disable_mmap_load_safetensors)
clip_l.eval()
@@ -135,6 +136,34 @@ class FluxNetworkTrainer(NetworkTrainer):
return flux_lower
+ def prepare_wrapper(self, accelerator, flux_lower):
+ class FluxUpperLowerWrapper(torch.nn.Module):
+ def __init__(self, flux_upper: flux_models.FluxUpper, flux_lower: flux_models.FluxLower, device: torch.device):
+ super().__init__()
+ self.flux_upper = flux_upper
+ self.flux_lower = flux_lower
+ self.target_device = device
+
+ def forward(self, img, img_ids, txt, txt_ids, timesteps, y, guidance=None, txt_attention_mask=None):
+ self.flux_lower.to("cpu")
+ clean_memory_on_device(self.target_device)
+ self.flux_upper.to(self.target_device)
+ img, txt, vec, pe = self.flux_upper(img, img_ids, txt, txt_ids, timesteps, y, guidance, txt_attention_mask)
+ self.flux_upper.to("cpu")
+ clean_memory_on_device(self.target_device)
+ self.flux_lower.to(self.target_device)
+ return self.flux_lower(img, txt, vec, pe, txt_attention_mask)
+
+ wrapper = FluxUpperLowerWrapper(self.flux_upper, flux_lower, accelerator.device)
+ if not getattr(wrapper, "prepare_block_swap_before_forward", None):
+ logger.warn("not a flux_models.Flux(nn.Module) subclass, so the hook is missing?")
+ logger.warn(repr(wrapper))
+ logger.warn(wrapper.__class__)
+ raise ValueError("wrapper has no attribute prepare_block_swap_before_forward")
+
+ logger.info("wrapper prepared")
+ return wrapper
+
def get_tokenize_strategy(self, args):
_, is_schnell, _, _ = flux_utils.analyze_checkpoint_state(args.pretrained_model_name_or_path)
@@ -287,25 +316,9 @@ class FluxNetworkTrainer(NetworkTrainer):
accelerator, args, epoch, global_step, flux, ae, text_encoders, sample_prompts_te_outputs, validation_settings)
clean_memory_on_device(accelerator.device)
return image_tensors
-
- class FluxUpperLowerWrapper(torch.nn.Module):
- def __init__(self, flux_upper: flux_models.FluxUpper, flux_lower: flux_models.FluxLower, device: torch.device):
- super().__init__()
- self.flux_upper = flux_upper
- self.flux_lower = flux_lower
- self.target_device = device
-
- def forward(self, img, img_ids, txt, txt_ids, timesteps, y, guidance=None, txt_attention_mask=None):
- self.flux_lower.to("cpu")
- clean_memory_on_device(self.target_device)
- self.flux_upper.to(self.target_device)
- img, txt, vec, pe = self.flux_upper(img, img_ids, txt, txt_ids, timesteps, y, guidance, txt_attention_mask)
- self.flux_upper.to("cpu")
- clean_memory_on_device(self.target_device)
- self.flux_lower.to(self.target_device)
- return self.flux_lower(img, txt, vec, pe, txt_attention_mask)
- wrapper = FluxUpperLowerWrapper(self.flux_upper, flux, accelerator.device)
+ wrapper = self.wrapper
+ wrapper.flux_upper.training
clean_memory_on_device(accelerator.device)
image_tensors = flux_train_utils.sample_images(
accelerator, args, epoch, global_step, wrapper, ae, text_encoders, sample_prompts_te_outputs, validation_settings
@@ -511,4 +524,4 @@ if __name__ == "__main__":
args = train_util.read_config_from_file(args, parser)
trainer = FluxNetworkTrainer()
- trainer.train(args)
\ No newline at end of file
+ trainer.train(args)
```
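The getattr check in the patch above is just a fail-fast guard: verify at preparation time that the object which will later be handed to sampling actually provides the hook, instead of crashing at the first sampling step. A generic sketch of that pattern, with names of my own invention rather than the project's:

```python
import torch

class DummyWrapper(torch.nn.Module):
    """Illustrative stand-in for a wrapper that lacks a required hook."""
    def forward(self, x):
        return x

def validate_required_hooks(module, hook_names=("prepare_block_swap_before_forward",)):
    # Check up front that every hook the sampling loop will call is actually callable.
    missing = [name for name in hook_names if not callable(getattr(module, name, None))]
    if missing:
        raise ValueError(f"{module.__class__.__name__} is missing required hooks: {missing}")

validate_required_hooks(DummyWrapper())  # raises immediately, long before sampling starts
```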
Possibly the fix (instead of the other patch, which is for debugging) is just:

```diff
diff --git a/library/flux_train_utils.py b/library/flux_train_utils.py
index 6f6c1f2..8a4b461 100644
--- a/library/flux_train_utils.py
+++ b/library/flux_train_utils.py
@@ -307,7 +307,8 @@ def denoise(
comfy_pbar = ProgressBar(total=len(timesteps))
for t_curr, t_prev in zip(tqdm(timesteps[:-1]), timesteps[1:]):
t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
- model.prepare_block_swap_before_forward()
+ if hasattr(model, "prepare_block_swap_before_forward"):
+ model.prepare_block_swap_before_forward()
pred = model(
img=img,
img_ids=img_ids,
@@ -321,7 +322,8 @@ def denoise(
img = img + (t_prev - t_curr) * pred
comfy_pbar.update(1)
- model.prepare_block_swap_before_forward()
+ if hasattr(model, "prepare_block_swap_before_forward"):
+ model.prepare_block_swap_before_forward()
return img
# endregion
@@ -611,4 +613,4 @@ def add_flux_train_arguments(parser: argparse.ArgumentParser):
type=float,
default=3.0,
help="Discrete flow shift for the Euler Discrete Scheduler, default is 3.0. / Euler Discrete Schedulerの離散フローシフト、デフォルトは3.0。",
- )
\ No newline at end of file
+ )
```

The comment here seems to suggest that prepare_block_swap_before_forward is just an optimization, and therefore is likely only implemented in class Flux(nn.Module), where it is really required. If we have a different class, hopefully we can simply skip it (as yet untested; for now I have just set split_mode to false).
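If that reading is right and the hook really is only a memory optimization for the full model, another possibility (untested, and only a sketch under that assumption) would be to give FluxUpperLowerWrapper a no-op implementation, so the unconditional call in denoise() succeeds without touching library/flux_train_utils.py:

```python
# Untested sketch, assuming prepare_block_swap_before_forward is purely an optimization
# that only the full flux_models.Flux model needs.
def prepare_block_swap_before_forward(self):
    # The wrapper's forward() already shuttles flux_upper/flux_lower between CPU and
    # the target device, so there is nothing extra to prepare for the split path.
    pass

# For example, added as a method on FluxUpperLowerWrapper, or attached after the fact:
# FluxUpperLowerWrapper.prepare_block_swap_before_forward = prepare_block_swap_before_forward
```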
The workflow worked, and I trained several LoRAs using it and tested some things. When a workflow works, I save a copy and archive it. But now not even the archived copy, with all the original settings that worked before, works; it produces the same error.

The main error it throws is:
```
FluxTrainValidate
'FluxUpperLowerWrapper' object has no attribute 'prepare_block_swap_before_forward'
```
Here is the total error (this was from a fresh restart as well):

ComfyUI Error Report (Error Details, Stack Trace, System Information, Devices, Logs, and Attached Workflow sections; contents collapsed)