Peculiar behavior of N2V2 prediction #147
UPDATE: I did some analysis and it turned out that there is a bug in one of the libraries we use to get energy usage values. This bug specifically affects HPC clusters and was a major contributor to the slowdown of the prediction process on the HPC cluster. I ran the benchmark again without the library and the numbers look a lot more promising, but the local server is still up to 16% faster than the HPC cluster for bigger files, even though the HPC cluster has the superior GPU. I am wondering whether the filesystems play a major role for the N2V model? We use a distributed filesystem on the HPC cluster and a SATA HDD on the local server to read and write images.
Hi @somas193! Tough to answer; our HPC has very slow read speeds from its centralized storage... The current implementation of N2V just loads all data into memory, so access to local files should only be a bottleneck at the very beginning of training; once training has started, the limiting factor should be the transfer of data between RAM and the GPU. How does the GPU utilization compare between the HPC cluster and the local server?
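For reference, GPU utilization can be polled while training or prediction runs. The sketch below assumes an NVIDIA driver and the `pynvml` (nvidia-ml-py) bindings; the device index and sampling duration are illustrative choices, not part of N2V.

```python
# Illustrative sketch: sample GPU utilization and memory use once per second
# with NVML (pip install nvidia-ml-py3). Device index 0 and the 60-sample
# duration are assumptions, not part of N2V.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

try:
    for _ in range(60):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU util: {util.gpu:3d}%  memory used: {mem.used / 1e9:.1f} GB")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

Running this in a second terminal (or simply `watch -n 1 nvidia-smi`) on both machines would make the comparison concrete.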
There is no FP16 magic going on.
Hi @jdeschamps, thanks for the reply. A small correction to the information I provided in the post: we use N2V and not N2V2, since our data is a Z-stack. The storage used on the HPC cluster in this case isn't very slow, but it also isn't the fastest available option; it is based on the BeeGFS parallel file system. I do not have measured numbers for the GPU utilization, but based on my observation it is pretty high (hovering around 90% or more) on the local server. However, I have no data for GPU utilization on the HPC cluster. Also, I use the piece of code given below to control memory allocation on the GPU:
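(The snippet itself was not captured in this thread; a common TensorFlow 2.x pattern for controlling GPU memory allocation, shown here only as an assumption of what may have been used, is to enable memory growth:)

```python
# Assumed example only -- the exact snippet is not shown above.
# Enabling memory growth stops TensorFlow from reserving all GPU memory
# at startup and lets the allocation grow as needed.
import tensorflow as tf

for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```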
Do you think this can cause problems from a performance perspective? The TensorFlow version is 2.10.
Would it be correct to say that N2V does not make use of features like tensor cores or mixed precision available on NVIDIA GPUs? Is it using just vanilla FP32 computations in training and inference?
Hi @somas193, sorry I missed your previous question.
I can't say; it is not something that we really use (and we don't really use TensorFlow nowadays, especially with recent hardware and libraries). What happens if you turn it off?
Yes, just vanilla FP32. The library was written a few years ago and does not benefit from the "latest" features that people commonly use nowadays.
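For context, N2V does not do this out of the box, but in TensorFlow/Keras 2.10 mixed precision (and with it tensor cores on recent GPUs) would typically be enabled with a global policy. This is a generic TensorFlow sketch, not something the library does:

```python
# Generic TensorFlow example, not part of N2V: setting the global
# mixed-precision policy so compatible layers compute in FP16 while
# keeping FP32 variables.
import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy("mixed_float16")
print(mixed_precision.global_policy())  # mixed_float16
```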
I am benchmarking the performance of N2V2 prediction using files of different sizes (ranging from 330 MB to 5.13 GB) on a local server with an A6000 (48 GB GDDR6) GPU and an HPC cluster with A100 (40 GB HBM2) GPUs. The performance is similar for smaller files, but for bigger files the prediction is up to 50% faster on the local server, even though we expected the opposite. I am wondering whether N2V2 uses FP32 or FP16 in the backend, whether it can make use of tensor cores, and whether there is frequent data transfer between GPU memory, the CPU cache, and RAM. Would someone be able to provide details on this?
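For readers reproducing this comparison, a minimal timing harness around prediction might look like the sketch below; the model name, base directory, axes string, and tiling are placeholders, not the actual benchmark configuration used in this issue.

```python
# Illustrative benchmark sketch: time N2V prediction on one image.
# Model name/basedir, input file, axes and n_tiles are assumptions,
# not the configuration from this issue.
import time
import tifffile
from n2v.models import N2V

model = N2V(config=None, name="my_n2v_model", basedir="models")
img = tifffile.imread("stack.tif")  # e.g. a Z-stack with axes ZYX

start = time.perf_counter()
pred = model.predict(img, axes="ZYX", n_tiles=(2, 4, 4))
elapsed = time.perf_counter() - start
print(f"prediction took {elapsed:.1f} s for input shape {img.shape}")
```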