Diffusion model setting when trained on whole network #13
Comments
"When starting training the diff-network, the accuracy drops to 10%."
Thank you for your response. The AE reconstruction model's accuracy can reach nearly 78% in the first 30,000 training epochs, but in the last 30,000 training epochs the generated models' best accuracy is 13%. So does this mean the diff-model was not trained well?
It looks like the diff-model wasn't trained well.
What I am doing here is reproducing the experiment "Generalization on entire model parameters" from the paper. Do you plan to publish the related details and settings? Thank you.
I tried to reproduce the same experiment. It actually takes something like 300,000 epochs of LDM training for a 40k in_dim to produce a reasonable result.
I am also trying to reproduce this experiment, but I can't even successfully reconstruct the weights using the VAE. Have you changed any settings for VAE training or task training? At what point does the reconstruction accuracy reach ~80%?
VAE training is rather easy: just use the default hyperparameters, maybe tune the lr and noise factor. I noticed that the default latent noise factor in ae_ddpm.yaml is 0.5, which I consider too large. I trained a VAE for the 40k in_dim case (all parameters of a 4-layer CNN); the reconstruction accuracy reaches a level close to that of the input parameters at about epoch 25,000.
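For readers hitting the same issue, here is a minimal sketch of where such a noise factor typically enters AE training: additive Gaussian noise on the latent before decoding. The `ae.encode`/`ae.decode` method names are assumptions for illustration, not the repo's verified API:

```python
import torch

def reconstruct_with_latent_noise(ae, flat_params, noise_factor=1e-3):
    """Encode a batch of flattened parameters, perturb the latent, decode.

    A factor of 0.5 (the default mentioned above) can swamp the latent
    signal; values around 1e-3 are the range reported in this thread.
    """
    z = ae.encode(flat_params)                        # (batch, latent_dim)
    z_noisy = z + noise_factor * torch.randn_like(z)  # latent noise augmentation
    return ae.decode(z_noisy)
```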
Hi, I have tried several lr/noise settings but still cannot reproduce the result. For parameter dataset generation, do you use a different initialization for each of the 200 model checkpoints? If so, may I kindly ask for the specific settings (lr/scheduler/noise) you used in the experiment?
I use autoencoder.Latent_AE_cnn_big, changing its input dim and setting the optimizer learning rate to 1e-2.
Hi, I have another question: when I test the trained model directly, I cannot get the same results as the test that runs right after training completes. The results seem to come from an initial model instead of a trained one. Did you run into the same problem?
Yeah, this is the problem I am dealing with, but I've made no progress for a few days.
Shouldn't it be: train for the first x epochs until the model has almost converged, then train y (200 by default) more epochs to collect the parameter data? I use the default seed, lr=1e-3, the default lr_scheduler, and both noise factors set to 1e-3.
In the "Generalization on entire model parameters" part of the paper, there is the sentence: "Different from the aforementioned training data collection strategy, we individually train these architectures from scratch with 200 different random seeds." I think this means that for the parameter-generation part, the 200 models should each be trained from a different random initialization. I also find that using 200 checkpoints from a single initialization causes the diffusion and VAE models to severely overfit, since all the training samples are very similar.
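To make that data-collection point concrete, here is a sketch of the 200-seeds strategy the paper describes, as opposed to saving 200 checkpoints from one training run. `build_cnn` and `train_to_convergence` are hypothetical stand-ins for the repo's own task-training code:

```python
import torch
from torch.nn.utils import parameters_to_vector

def collect_param_dataset(num_models=200, out_path="param_dataset.pt"):
    """Train num_models task models from scratch, one seed each, and save
    their flattened weights as a single (num_models, in_dim) tensor."""
    samples = []
    for seed in range(num_models):
        torch.manual_seed(seed)        # fresh random initialization per model
        model = build_cnn()            # hypothetical: constructs the task CNN
        train_to_convergence(model)    # hypothetical: ordinary task training
        samples.append(parameters_to_vector(model.parameters()).detach().cpu())
    torch.save(torch.stack(samples), out_path)
```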
Hello @waldun-m @FelixFeiyu @JoycexxZ, could you kindly share some information about how you configured your training datasets? I am particularly interested in the details of the model parameters, specifically the shape of the flattened 1-dimensional tensor. What length does your input dimension have? I'm curious how the input dimension influences the configuration of the autoencoder and diffusion models in terms of layers and channels. Thanks.
For example, I aim to train an NND to generate a complete model with nearly ten million (or potentially even more) parameters. How should I configure the autoencoder and diffusion model to better suit that many parameters during training? Are there any existing scaling laws for parameter-generation tasks, as there are for LLMs?
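On the input-dimension question: the flattened length is just the total parameter count of the task model, which can be checked directly with standard PyTorch utilities. The CNN below is a placeholder for illustration, not the exact architecture from the paper:

```python
import torch.nn as nn
from torch.nn.utils import parameters_to_vector

# Placeholder 3-layer CNN for CIFAR-10 -- swap in the actual task model.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)
in_dim = parameters_to_vector(model.parameters()).numel()
print(in_dim)  # the flattened length; this sets the AE/diffusion input size
```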
The problem seems to be that the trained network used by the diffusion model is not stored properly: when I checked the output .ckpt file, it contained no related parameters. After adding code to store the network after training and to load it before testing, running the test part directly achieves the same result. However, I haven't found out why the network isn't stored automatically in the original code.
Hi~ Can you share how to store and load the network in a proper way? Thanks a lot.
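For anyone else hitting this, a minimal sketch of the workaround described above, assuming a plain PyTorch module; the checkpoint path and the `model` variable are hypothetical names, not the repo's own:

```python
import torch

CKPT = "diff_model_final.ckpt"  # hypothetical path

# After training finishes: persist the trained network explicitly.
torch.save(model.state_dict(), CKPT)

# Before running the test stage: rebuild the module, restore the weights.
model.load_state_dict(torch.load(CKPT, map_location="cpu"))
model.eval()
```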
Hi~ May I ask if the problem you encountered has been resolved? |
Hi,
I tried to use a three-layer CNN on CIFAR-10 to reproduce the experiment the paper describes. I chose autoencoder.Latent_AE_cnn_big as the ae_model in ae_ddpm.yaml. The classification accuracy of the reconstructed CNN reaches a comparable level (79%). However, when training of the diff-network starts, the accuracy drops to 10%.
Are the diff-model and the other settings I used correct?
Thank you
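As background for the accuracies quoted throughout this thread: evaluation typically means writing a generated or reconstructed flat parameter vector back into the task model and measuring test accuracy. A sketch using standard PyTorch utilities, where `flat_params`, `model`, and `test_loader` are assumed to exist:

```python
import torch
from torch.nn.utils import vector_to_parameters

@torch.no_grad()
def accuracy_from_flat(flat_params, model, test_loader, device="cpu"):
    """Write a flattened parameter vector into model, return test accuracy."""
    model.to(device).eval()
    vector_to_parameters(flat_params.to(device), model.parameters())
    correct = total = 0
    for x, y in test_loader:
        pred = model(x.to(device)).argmax(dim=1)
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / total
```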