
How much VRAM, how many GPUs, and how long does training take? #9

Open
huangb23 opened this issue Nov 26, 2024 · 4 comments

Comments

@huangb23

As the title says: these two models together come to about 20B parameters, so presumably training requires A100-80G cards? Could you share the number of training samples, the GPU count, and the training time?

@erwold
Owner

erwold commented Nov 26, 2024

Actually, only the last layer of Qwen2VL-7B needs to be unfrozen for training, plus a connector that aligns Qwen2VL-7B's feature dimension with Flux's. Each Flux stage trains only a subset of layers, and within any given stage the trainable parameters total less than ~2B.
More concretely, the training setup was: 8×A100, 3 million images, batch size = 128, 100k steps.
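The connector described above can be sketched as a simple linear projection from the Qwen2VL-7B feature space into the Flux feature space. The dimensions below (3584 for Qwen2-VL-7B, 3072 for Flux) and the single-linear-layer design are assumptions for illustration, not confirmed by this thread; the actual connector may be a deeper MLP.

```python
import numpy as np

# Hypothetical feature dimensions (assumptions, not stated in the thread):
# Qwen2-VL-7B hidden size taken as 3584, Flux transformer width as 3072.
QWEN_DIM, FLUX_DIM = 3584, 3072

rng = np.random.default_rng(0)
W = rng.standard_normal((QWEN_DIM, FLUX_DIM)) * 0.02  # connector weight
b = np.zeros(FLUX_DIM)                                # connector bias


def connector(features):
    """Project Qwen2-VL token features into the Flux feature space."""
    return features @ W + b


tokens = rng.standard_normal((16, QWEN_DIM))  # 16 conditioning tokens
out = connector(tokens)
print(out.shape)  # (16, 3072)
```

In this setup the connector weights would count toward the "<2B trainable parameters per stage" figure, alongside the unfrozen Qwen2VL last layer and the subset of Flux layers being trained.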

@huangb23
Author

How many days did it take? Were the A100s the 80 GB version?

@saynn

saynn commented Dec 2, 2024

Actually, only the last layer of Qwen2VL-7B needs to be unfrozen for training, plus a connector that aligns Qwen2VL-7B's feature dimension with Flux's. Each Flux stage trains only a subset of layers, and within any given stage the trainable parameters total less than ~2B. More concretely, the training setup was: 8×A100, 3 million images, batch size = 128, 100k steps.
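As a back-of-envelope check on the quoted config, the stated step count and batch size imply the total number of images processed and the effective number of passes over the 3M-image dataset (this says nothing about wall-clock time, which the thread does not give):

```python
# Training-volume arithmetic from the quoted config:
# 3M images, batch_size = 128, 100k steps.
steps = 100_000
batch_size = 128
dataset_size = 3_000_000

samples_seen = steps * batch_size        # 12,800,000 images processed
epochs = samples_seen / dataset_size     # ~4.27 passes over the dataset
print(samples_seen, round(epochs, 2))
```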

May I ask why only a subset of FLUX layers is trained in each stage? What is the reasoning behind that?

@FangGet

FangGet commented Dec 6, 2024

@erwold When you unfreeze some FLUX layers for training, are all parameters in a given layer trainable, or only certain MLPs?
