simulation environment of multi node #18
Comments
Hi jiaduxie, could you provide a few more details about the problems you are running into? The Readme.md of this repository already has some pointers on how to set up the environment and run the model. The exact steps you have to take depend on the cluster you are using. What kind of job scheduler does it use? Is Python 3 available? Does it have MPI support? Have you already installed NEST? A rough sketch:
If all of this works, you should be ready to run your own experiments. I hope this helps! |
Oh, thanks Jari. I run the model with NEST in a conda environment. I installed the conda version of NEST with MPI; does that not work either? If I install NEST from source, will the MPI that I install manually for NEST conflict with the local MPI of the server? I'll ask you if I have any questions. |
I am only aware of a conda version of NEST which does not have MPI support. But maybe one exists. To check whether your NEST version supports MPI and OpenMP, could you run the following command in your environment and post the output:
My conda-installed NEST reports in the start_updating_ information that neither MPI nor OpenMP is available:
Concerning manual compilation: how did you try to compile NEST? Could you post what steps you have tried so far? |
I haven't started trying to compile manually.
$ python -c "import nest; nest.Simulate(1.)"
Creating default RNGs
This program is provided AS IS and comes with
Problems or suggestions?
Sep 10 22:20:26 NodeManager::prepare_nodes [Info]:
Sep 10 22:20:26 SimulationManager::start_updating_ [Info]:
Sep 10 22:20:26 SimulationManager::run [Info]: |
It seems alright. Have you installed the packages from requirements.txt? Have you tried running a simulation? |
Yes, I have installed the packages from requirements.txt. Can you help me check whether the command to execute a multi-node simulation looks like this: The hostfile is as follows: |
I have no experience with hostfiles, but it looks reasonable to me. Have you adjusted The run_example_downscaled.py is meant to be run on a local machine, for example a laptop. If you would like to experiment on a compute cluster you should exchange
In this case you need to invoke the script serially:
The parallelized part is then specified in the jobscript_template in config.py. |
Hi jarsi. If I run the complete model on a cluster of two servers, roughly how much memory does each machine need? |
The model consumes approximately 1 TB of memory. So with two servers each server would need to provide 500 GB. |
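As a rough illustration of that arithmetic, the sketch below splits the approximately 1 TB footprint evenly across servers. The function name is hypothetical, and the perfectly even split with no per-node overhead is an assumption, not something taken from the repository.

```python
def memory_per_server_gb(total_gb: float, num_servers: int) -> float:
    """Evenly divide the total memory footprint across servers.

    Assumption: the load is perfectly balanced and there is no per-node overhead.
    """
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    return total_gb / num_servers


TOTAL_MODEL_MEMORY_GB = 1000.0  # approx. 1 TB, the figure quoted above

for n in (2, 4, 9):
    print(f"{n} servers: about {memory_per_server_gb(TOTAL_MODEL_MEMORY_GB, n):.0f} GB each")
```

In practice each server would likely need some headroom on top of this figure for the operating system and MPI buffers.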
Okay, thank you. When you run the entire model, how many servers do you use, and how much memory does each have? |
Hi jarsi. On what system are you running multiple nodes in parallel? My system is Ubuntu, and my SLURM configuration is not working well. Do you have any guidance on configuring the environment? |
Hi, we do not set up the systems ourselves. We use, for example, JURECA at the Forschungszentrum Juelich. It has everything we need already installed. What kind of system are you using? |
I am using a Linux server; the distribution is Ubuntu. Besides running on JURECA, have you run the model on an ordinary server of your own? |
Hi jarsi, I am now simulating a small network for testing on two machines, and I run it with the following command. It seems that the two machines each run on their own without interacting.
In addition, have you run the multi-area-model in your own cluster environment? |
This is weird. Have you adjusted the Maybe you could also post what is in your I have run the model on a local cluster. I usually just need to modify the |
multi_test.py: from nest import * |
This is difficult for me to debug. On my machine I can run this without running into errors. It works with the conda installed nest ( On your machine, are you using any resource manager such as SLURM, PBS/Torque, LSF, etc.? Or are you responsible for defining everything correctly using hostfiles? What kind of system are you using? |
The cluster environment I use is composed of nine ordinary server machines. The system is Linux, and the distribution is Debian. You run this model on a supercomputer, right? Have you ever run it in your own environment? Is it necessary to install the SLURM resource scheduling system? I had a lot of problems while installing SLURM, so I won't install it. |
It is not necessary to install SLURM. But I have most experience with it as all clusters I have used so far had SLURM installed. Installing a resource manager is not trivial and should be the job of a system admin, not the user. Do you have a system administrator you could ask for help? How do other people run distributed jobs on this cluster? Could you also try the following commands and report whether something changes:
|
Because my cluster environment is composed of general-purpose servers, no resource scheduling system such as SLURM is installed. It seems that the command you suggested cannot complete the simulation.
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ mpiexec -np 2 -host work0,work1 python multi_test.py
|
Just to make sure, you are using nest installed via conda, right? What do the following commands give you: |
Yes, I installed nest under conda. I seem to have installed it.
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ conda list
(pynest_mpi) work@lyjteam-server: |
Ok thanks, the output of the last command is missing. Using |
Maybe you could also check the output of: |
(pynest_mpi) work@lyjteam-server: |
I think the problem is that once the jobs start to run on a node the mpi library cannot be found. This is because the
|
Hi, have you made progress? I think the problems you are seeing are related to your mpi libraries. As the conda nest is compiled against openMPI, you must also use openMPI and not mpich. This means that
Do any of these approaches work or change the error message? |
I've tried it and it still does not work. Did you use conda to install nest or compile from source code? |
The total model has approximately 4 million neurons. The formula for downscaling is N_scaling * 4 million = 0.243* 4 million = 0.972 million. I also posted a modified version of this script above. It addresses this. |
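The downscaling formula can be checked with a few lines of Python. This is only a sketch: the constant and function names are hypothetical, and the full-scale size uses the rounded figure of 4 million quoted above.

```python
FULL_SCALE_NEURONS = 4_000_000  # approximate full-model size quoted above


def downscaled_neuron_count(n_scaling: float, full_scale: int = FULL_SCALE_NEURONS) -> int:
    """Apply the formula N_scaling * (full-scale neuron count), rounded to whole neurons."""
    return round(n_scaling * full_scale)


print(downscaled_neuron_count(0.243))
```

Rounding is used because the floating-point product of 0.243 and 4 million is not guaranteed to be an exact integer.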
According to this ratio, the number of synapses is 1 billion, right? |
I checked the json file and calculated that the number of neurons is 1 million and the number of synapses is 1 billion |
What problem did your revised version solve? |
You asked how I would start the simulation. This is the way I think you should do it. But it is just a suggestion. It prepares the simulation with one process and one thread. It prepares the simulation such that multiple MPI processes can work easily on the data (e.g., every process has its own configuration file, which avoids concurrent data access problems). When everything is finished the job is submitted and all processes can start their work. |
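The idea of preparing one configuration file per process can be sketched as follows. This is not the repository's implementation: the function name, the file naming scheme, and the JSON layout are all hypothetical, chosen only to show how a single serial preparation step can leave each MPI rank with its own file to read.

```python
import json
import tempfile
from pathlib import Path


def write_rank_configs(base_params: dict, num_processes: int, out_dir: Path) -> list:
    """Write one JSON config per MPI rank so that ranks never share a file.

    Hypothetical layout: config_rank_<rank>.json holding the shared parameters
    plus the rank and the total number of processes.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for rank in range(num_processes):
        params = dict(base_params, rank=rank, num_processes=num_processes)
        path = out_dir / f"config_rank_{rank}.json"
        path.write_text(json.dumps(params, indent=2))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Serial preparation step: runs once, before any parallel job is started.
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_rank_configs({"N_scaling": 0.243}, 2, Path(tmp)):
            print(p.name, json.loads(p.read_text()))
```

Because every rank later opens only its own file, no two processes read or write the same path at the same time.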
Do I also need to use the script to submit the downscaled version now? Just as you said below:
|
Ideally you do not need to worry about this at all. This part: |
Yes, I think you modified the file multiarea_helpers.py a month ago. What does the modification of this file change? |
It is explained in the corresponding pull request. The inhibitory synaptic weights were scaled wrongly. |
So the results I get now differ from the earlier ones? I am repeating an experiment that I ran before, and the number of spikes differs from the previous run. |
Hi jarsi, I want to recompile and install NEST now. What should I pay attention to? Does the multi-area-model require a specific Python version? And a specific MPI version? the guide |
Ideally the following commands are sufficient.
If it compiles you can check if it works via |
I have compiled it once, and I want to delete it and install it again. Is there a good way to do that? |
Hi jarsi. I have not managed to configure a multi-node simulation environment. This work is very important to me. Can you give me the contact information of the supercomputer center, such as an email address? |
Hi jarsi. Can you provide me with an email address for the manager of the Forschungszentrum Juelich supercomputer center? I want them to help me with environment configuration. |
These system administrators are responsible for Juelich machines. They won't have time to take care of machines they are not responsible for. Have you extensively googled your errors? Have you asked your colleagues how they simulate on multiple nodes? Can you talk to the person who installed the cluster? If all of this does not help, you can ask for help with a detailed explanation, for example on stackoverflow.com. Have you tried compiling NEST without conda, only with system libraries and system python? Have you successfully run an MPI test program (not NEST) across nodes, so that you can prove that this works and that your problem is (or is not) NEST related? Are you affiliated with any organization (e.g. university) or project (e.g. human brain project) which might provide compute resources? |
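A minimal MPI test program of the kind mentioned above could look like the sketch below. It assumes mpi4py is installed and built against the same MPI library that provides mpiexec; that is an assumption about the environment, not something confirmed in this thread. If mpi4py is missing, the script falls back to reporting a single process.

```python
import socket

try:
    # Assumption: mpi4py is installed and linked against the MPI used by mpiexec.
    from mpi4py import MPI
except ImportError:
    MPI = None


def describe_rank():
    """Return (rank, size, hostname) for this process.

    Without mpi4py this reports a single process, i.e. (0, 1, hostname).
    """
    if MPI is None:
        return 0, 1, socket.gethostname()
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()


if __name__ == "__main__":
    rank, size, host = describe_rank()
    print(f"rank {rank} of {size} on {host}")
```

Saved under a hypothetical name such as mpi_hello.py and launched in the same form as the command used earlier in this thread (mpiexec -np 2 -host work0,work1 python mpi_hello.py), every process should report a size of 2 and the two hostnames should differ. If each process instead reports rank 0 of 1, the processes are probably not communicating, which would point to an MPI setup problem rather than to NEST.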
Hi jarsi. Is the input value of I_e zero when running the complete model (approx. 4.13 million neurons and 24.2 billion synapses)? Are there only a few spikes after running? |
In multi_area_model.py add_DC_drive is set to 0. If either K_scaling or N_scaling is not equal to 1, add_DC_drive is adjusted to make up for spikes that would be there in the fullscale model. The value from add_DC_drive is then used for I_e. So in a fullscale scenario I would expect I_e to be 0. Is it working now? |
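The rule described above can be written down as a small sketch. It is not the code from multi_area_model.py: the function name is hypothetical, and the compensation value is passed in as a placeholder because how it is computed is not shown in this thread.

```python
def choose_dc_drive(k_scaling: float, n_scaling: float, compensation: float) -> float:
    """Return the DC drive that would be used for I_e.

    compensation is a placeholder for whatever value the scaling code derives
    for a downscaled network; its computation is outside this sketch.
    """
    is_full_scale = (k_scaling == 1.0) and (n_scaling == 1.0)
    # At full scale no compensation is needed, so the drive stays at 0.
    return 0.0 if is_full_scale else compensation


print(choose_dc_drive(1.0, 1.0, compensation=5.0))    # full-scale case
print(choose_dc_drive(1.0, 0.243, compensation=5.0))  # downscaled case
```

The value 5.0 is arbitrary and only makes the two branches visible.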
No, the multi-node simulation is still not running. I want to use my own simulator to implement this macaque brain model. I think your implementation uses a Poisson generator as an external input. I now use add_DC_drive as an external input (DC is added to I_e) instead of a Poisson generator ('poisson_input': False). Is this a correct implementation of the model? So in a fullscale scenario I want to use add_DC_drive as I_e. Is this correct? |
Hi jarsi. I now use add_DC_drive as an external input (DC is added to I_e) instead of a Poisson generator ('poisson_input': False). Is that OK?
|
Hello jarsi. I can now run programs on multiple nodes and run code under NEST. But I can't run your code. I installed SLURM. Can you help me? |
running:
sacct job -> output
|
Normally SLURM jobs produce a stdout and a stderr file. See this and the following line. There you will find more information on why your job failed. Could you please post the output? |
There is no output. The following two labels are printed by me. There is no error output in the .o file, so I can't find the cause of the error.
|
Hi jarsi. Are my file configuration and the way I run it correct?
run_example_fullscale.py is:
|
e1d019b3cdbbaef181f9679c7ecc984b.238.e is :
|
Hello, can you help me solve the problem? Thanks. |
Hi @atiye-nejad the conda package is not built with MPI support. To get MPI support, I recommend installing NEST from source. Please take a look at the documentation here: |
But because of some problems, especially with NESTML, NEST experts recommended that I install NEST through conda. |
@atiye-nejad It is generally not a good idea to raise new issues at the end of an existing (and different) issue. That makes it difficult to follow up properly, and experts might not even notice your post. I saw that you asked a very closely related question on the NEST User mailing list and I suggest we continue the discussion there. |
Do you have a recommended configuration tutorial for the multi node nest simulation environment? It can also be the brief steps of environment configuration and the required installation package.