VGG16 scoring model taking up > 500 GB of RAM #1305
Comments
Hi Shreya, thanks for opening an issue! In Brain-Score, earlier layers of a model are usually more expensive to score in terms of RAM, as they tend to be much bigger than later layers. It could also be that for VGG16 the more granular layers are themselves bigger, or are full convolutional layers rather than, say, a pooling or ReLU layer (I am not entirely sure here, as I would need a refresher on the VGG16 architecture). As for random weights scoring higher, I am looping in @mschrimpf, who may be able to offer more scientific insight into what might be occurring.
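To make the size difference between early and late layers concrete, here is a rough back-of-the-envelope sketch. It assumes the standard VGG16 layout with 224x224 inputs and float32 activations, and it counts raw activations only; whatever subsampling or dimensionality reduction the scoring pipeline applies is not modelled. The layer names (`conv1_1`, `pool5`, `fc6`, ...) and the `n_stimuli` value are illustrative assumptions, not names or numbers taken from this issue.

```python
# Rough estimate of raw activation sizes per VGG16 layer.
# Assumptions: standard VGG16, 224x224 input, float32 values,
# no subsampling or dimensionality reduction. Layer names are illustrative.

# (output channels, number of conv layers) for each of the five conv blocks
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def vgg16_feature_counts(input_size=224):
    """Return {layer_name: number of activation values per image}."""
    counts = {}
    size = input_size
    for block_idx, (channels, n_convs) in enumerate(VGG16_BLOCKS, start=1):
        # 3x3 convs with padding 1 keep the spatial size unchanged
        for conv_idx in range(1, n_convs + 1):
            counts[f"conv{block_idx}_{conv_idx}"] = channels * size * size
        # 2x2 max pooling with stride 2 halves the spatial size
        size //= 2
        counts[f"pool{block_idx}"] = channels * size * size
    counts["fc6"] = 4096
    counts["fc7"] = 4096
    counts["fc8"] = 1000
    return counts


def activation_gib(n_features, n_stimuli, bytes_per_value=4):
    """Memory in GiB needed to hold one layer's activations for all stimuli."""
    return n_features * n_stimuli * bytes_per_value / 2**30


if __name__ == "__main__":
    n_stimuli = 1000  # hypothetical number of benchmark images
    for name, n in vgg16_feature_counts().items():
        print(f"{name:>8}: {n:>9,} values/image, "
              f"{activation_gib(n, n_stimuli):7.2f} GiB for {n_stimuli} images")
```

Under these assumptions the first conv layer holds 64 x 224 x 224 values per image, against 4096 for a fully connected layer, so requesting many early conv layers at once multiplies memory use quickly.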
Dear Mike, Thanks a lot for your answer! It is really helpful. Yes, more layers do indeed lead to higher computational complexity. I just wanted to know how could an untrained network Best regards,
Hi @mike-ferguson Best regards, |
Hi @ShreyaKapoor18, sorry for the late reply; I was out of office for a couple of days. As to your question, I think it depends! For the most part I think people usually do like the
Hi Mike, Thanks for your reply. I did just that, passing only the conv layers, but I am still running out of memory (OOM). Best regards,
@ShreyaKapoor18 Gotcha. Have you tried submitting recently on our website? If you do that, I should be able to see the logs, troubleshoot, and find out what exactly is eating up so much memory. We also have a new procedure for mapping layers that should drastically cut down on RAM usage, but at the moment it is only available when submitting through our site (or a PR); we are still working on deploying that fix to local scoring.
Hi @mike-ferguson Best regards, |
Hi Shreya, a number of PRs were automatically made yesterday with a model identifier called "vgg16_less_variation_iteration=1". Do these happen to be yours? I'll link a few of the latest ones below. If yes, they also seem to be running into the same issue with the use of You can refer to the resnet50_tutorial model.
As to why you were not notified: we only send out emails on the status of scoring, but your model submissions were failing at the previous step (i.e., unit tests), for which we do not send notifications. You point out an important gap in our communication, and we'll try to implement at least some notification (whether a GitHub comment or a full email) to close it. Let me know if this helps or not.
Latest web submissions to the repo
If you want to see the reason for failure, you can scroll through the checks at the bottom of the PR. You will see "Brain-Score Plugins Unit tests (AWS Jenkins, AWS Execution) — Build Failure". You can click on Details > Console Output or Console Output (Parsed). The common errors:
I would personally try to resolve the first error and see if that also handles the open_clip error. If open_clip appears again, we may need to find the version you are using with your local environment and just add it to setup. |
Hi @KartikP Yes, these were my submissions. Best regards, |
@ShreyaKapoor18 I'm just keeping an eye on your web submissions. It seems you made some of the required changes but not all of them (https://github.com/brain-score/vision/pull/1499/files). The console log for this PR can be seen here:
Specifically, the init file needs to import your get_model() and get_layers() from the model.py file.
To explain our workflow: when you make a web submission, it creates a Pull Request into this repo on your behalf. If you go to the Pull Requests tab, you will see your web submission and all the files associated with it, and you can follow its status there. Once unit tests pass, we perform layer mapping, then the PR is merged, and once merged the model is scored. Once scored, you will get an email on success or failure, and your score will appear on the leaderboard after 24 hrs.
I'm also seeing some newer PRs from your web submissions that have a broken init file, which attempts to run tests instead of adding your model to the registry: https://github.com/brain-score/vision/pull/1503/files If you would like, you can message me on Slack and we can confirm the code before submitting.
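As an illustration of the pattern described above (the init file imports `get_model()` and `get_layers()` and adds the model to a registry rather than running tests), here is a minimal self-contained sketch. Everything in it is a stand-in so that it runs without Brain-Score installed: `model_registry` is a plain dict, `get_model` and `get_layers` are stubs that would normally be imported from model.py, and `register` with its duplicate-identifier guard is a hypothetical helper, not part of the library. The resnet50_tutorial model mentioned earlier in the thread remains the authoritative reference for the real file.

```python
# Sketch of the "register, don't run" pattern for a model plugin's init file.
# All names below are stand-ins (assumptions), not the Brain-Score API.

model_registry = {}  # stand-in for the plugin registry the real init file imports


def get_model(identifier):
    """Stub: in a real plugin this is imported from model.py."""
    return f"<activations model for {identifier}>"


def get_layers(identifier):
    """Stub: in a real plugin this is imported from model.py."""
    return ["layer_a", "layer_b"]  # hypothetical layer names


def register(identifier):
    """Add a lazily built model entry to the registry (hypothetical helper)."""
    if identifier in model_registry:
        raise ValueError(f"duplicate model identifier: {identifier!r}")
    # Store a callable so the model is only built when it is requested,
    # not when the init file is imported.
    model_registry[identifier] = lambda: {
        "identifier": identifier,
        "activations_model": get_model(identifier),
        "layers": get_layers(identifier),
    }


if __name__ == "__main__":
    register("vgg16_example")  # hypothetical identifier
    entry = model_registry["vgg16_example"]()  # built only on request
    print(entry["identifier"], entry["layers"])
```

Storing a callable keeps importing the init file cheap, and the guard makes a repeated identifier fail at registration time instead of silently overwriting an earlier entry.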
Thanks @KartikP this is awesome! Best regards, |
Hi everyone, I have submitted multiple models and they seem to work! However, when I use the method to submit multiple models at once using I get a duplicate IDs error. To check that, I ran the following:
Hi,
I was trying to score a VGG16 model on our own cluster, i.e. by running a local instance.
When I use the following layer names
i.e. the higher-level layers, RAM consumption is OK and scoring works fine.
However, when I switch to the more detailed layer names
scoring takes more than 500 GB of RAM and runs out of memory (OOM).
The thing is with the high level layers
my scores are
In V1, the score for the untrained model is higher than for the ImageNet-trained one, which is a weird effect since
the untrained weights are random. I know a random-weight model can sometimes match just because
of a statistical artefact, but this occurs in 2 iterations.
I am using the following public benchmarks for scoring my model
Any help would be greatly appreciated.
Best regards,
Shreya