diff --git a/README.md b/README.md index 731ce3061..ec27bc97a 100644 --- a/README.md +++ b/README.md @@ -32,13 +32,10 @@ The full documentation is online at [https://mabarnes.github.io/moment_kinetics] ``` this significantly decreases the load time but prevents code changes from taking effect when `moment_kinetics.so` is used without repeating the precompilation (to use this option, add an option `-Jmoment_kinetics.so` when starting julia). 4) To run julia with optimization, type - ``` - $ julia -O3 --project run_moment_kinetics.jl - ``` - Default input options are specified in `moment_kinetics_input.jl`. The defaults can be modified for a particular run by setting options in a TOML file, for example `input.toml`, which can be passed as an argument ``` $ julia -O3 --project run_moment_kinetics.jl input.toml ``` + Options are specified in a TOML file, e.g. `input.toml` here. The defaults are specified in `moment_kinetics_input.jl`. * To run in parallel, just put `mpirun -np ` in front of the call you would normally use, with `` the number of processes to use. * It may be more convenient when running `moment_kinetics` more than once to work from the Julia REPL, e.g. ``` @@ -50,27 +47,49 @@ The full documentation is online at [https://mabarnes.github.io/moment_kinetics] ``` julia> run_moment_kinetics("input.toml") ``` -5) To make plots and calculate frequencies/growth rates, run +5) To restart a simulation using `input.toml` from the last time point in the existing run directory, + ``` + $ julia -O3 --project run_moment_kinetics.jl --restart input.toml + ``` + or to restart from a specific output file - either from the same run or (if the settings are compatible) a different one - here `runs/example/example.dfns.h5` + ``` + $ julia -O3 --project run_moment_kinetics.jl input.toml runs/example/example.dfns.h5 + ``` + The output file must include distribution functions. 
When not using parallel I/O there will be multiple output files from different MPI ranks - any one of these can be passed. + * To do the same from the Julia REPL + ``` + $ julia -O3 --project + julia> run_moment_kinetics("input.toml", restart=true) + ``` + or + ``` + julia> run_moment_kinetics("input.toml", restart="runs/example/example.dfns.h5") + ``` + * When calling the `run_moment_kinetics()` function you can also choose a particular time index to restart from, e.g. + ``` + julia> run_moment_kinetics("input.toml", restart="runs/example/example.dfns.h5", restart_time_index=42) + ``` +6) To make plots and calculate frequencies/growth rates, run ``` $ julia --project run_post_processing.jl runs/ ``` passing the directory to process as a command line argument. Input options for post-processing can be specified in post_processing_input.jl. -6) Parameter scans (see [Running parameter scans](#running-parameter-scans)) or performance tests can be performed by running +7) Parameter scans (see [Running parameter scans](#running-parameter-scans)) or performance tests can be performed by running ``` $ julia -O3 --project driver.jl ``` If running a scan, it can be parallelised by passing the number of processors as an argument. Scan options are set in `scan_inputs.jl`. -7) Post processing can be done for several directories at once using +8) Post processing can be done for several directories at once using ``` $ julia --project post_processing_driver.jl runs/ runs/ ... ``` passing the directories to process as command line arguments. Optionally pass a number as the first argument to parallelise post processing of different directories. Input options for post-processing can be specified in `post_processing_input.jl`. -8) In the course of development, it is sometimes helpful to upgrade the Julia veriosn. Upgrading the version of Julia or upgrading packages may require a fresh installation of `moment_kinetics`. 
To make a fresh install with the latest package versions it is necessary to remove (or rename) the `Manifest.jl` file in the main directory, and generate a new `Manifest.jl` with step 1) above. It can sometimes be necessary to remove or rename the `.julia/` folder in your root directory for this step to be successful. +9) In the course of development, it is sometimes helpful to upgrade the Julia version. Upgrading the version of Julia or upgrading packages may require a fresh installation of `moment_kinetics`. To make a fresh install with the latest package versions it is necessary to remove (or rename) the `Manifest.jl` file in the main directory, and generate a new `Manifest.jl` with step 1) above. It can sometimes be necessary to remove or rename the `.julia/` folder in your root directory for this step to be successful. -9) One may have to set an environment variable to avoid error messages from the Qt library. If you execute the command +10) One may have to set an environment variable to avoid error messages from the Qt library. If you execute the command ``` $ julia --project run_post_processing.jl runs/your_run_dir/ diff --git a/machines/README.md b/machines/README.md index ac977fdf3..01031eda2 100644 --- a/machines/README.md +++ b/machines/README.md @@ -67,7 +67,7 @@ $ ./submit-restart.sh .toml ``` will submit a job to run and post-process a restart using input file. The simulation will restart from the last time point of the previous run -(`restart_moment_kinetics.jl` supports more flexibility, but for now you would +(`run_moment_kinetics.jl` supports more flexibility, but for now you would need to write your own submission script to pass the options needed for that). Default parameters for the runs (number of nodes, time limit, etc.) 
were set up diff --git a/machines/archer/jobscript-restart.template b/machines/archer/jobscript-restart.template index c4018292c..52501d169 100644 --- a/machines/archer/jobscript-restart.template +++ b/machines/archer/jobscript-restart.template @@ -22,6 +22,6 @@ export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK echo "running INPUTFILE $(date)" -srun --distribution=block:block --hint=nomultithread --ntasks=$SLURM_NTASKS bin/julia -Jmoment_kinetics.so --project -O3 --check-bounds=no restart_moment_kinetics.jl INPUTFILE RESTARTFROM +srun --distribution=block:block --hint=nomultithread --ntasks=$SLURM_NTASKS bin/julia -Jmoment_kinetics.so --project -O3 --check-bounds=no run_moment_kinetics.jl --restart INPUTFILE RESTARTFROM echo "finished INPUTFILE $(date)" diff --git a/machines/marconi/jobscript-restart.template b/machines/marconi/jobscript-restart.template index 53ba20059..420687c93 100644 --- a/machines/marconi/jobscript-restart.template +++ b/machines/marconi/jobscript-restart.template @@ -18,6 +18,6 @@ source julia.env echo "running INPUTFILE $(date)" -mpirun -np $SLURM_NTASKS bin/julia -Jmoment_kinetics.so --project -O3 --check-bounds=no restart_moment_kinetics.jl INPUTFILE RESTARTFROM +mpirun -np $SLURM_NTASKS bin/julia -Jmoment_kinetics.so --project -O3 --check-bounds=no run_moment_kinetics.jl --restart INPUTFILE RESTARTFROM echo "finished INPUTFILE $(date)" diff --git a/restart_moment_kinetics.jl b/restart_moment_kinetics.jl deleted file mode 100644 index c82916954..000000000 --- a/restart_moment_kinetics.jl +++ /dev/null @@ -1,7 +0,0 @@ -# provide option of running from command line via 'julia run_moment_kinetics.jl' -using Pkg -Pkg.activate(".") - -using moment_kinetics - -restart_moment_kinetics() diff --git a/src/command_line_options.jl b/src/command_line_options.jl index ddee48f9d..430fbb8da 100644 --- a/src/command_line_options.jl +++ b/src/command_line_options.jl @@ -16,7 +16,7 @@ const s = ArgParseSettings() arg_type = String default = nothing 
"restartfile" - help = "Name of NetCDF file to restart from" + help = "Name of output file (HDF5 or NetCDF) to restart from" arg_type = String default = nothing "--debug", "-d" @@ -24,14 +24,18 @@ const s = ArgParseSettings() "integer values activate more checks (and increase run time)" arg_type = Int default = 0 - # Options for tests - "--long" - help = "Include more tests, increasing test run time." + "--restart" + help = "Restart from latest output file in run directory (ignored if " * + "`restartfile` is passed)" action = :store_true "--restart-time-index" help = "Time index in output file to restart from, defaults to final time point" arg_type = Int default = -1 + # Options for tests + "--long" + help = "Include more tests, increasing test run time." + action = :store_true "--verbose", "-v" help = "Print verbose output from tests." action = :store_true diff --git a/src/communication.jl b/src/communication.jl index 8e9033e19..3259a4a06 100644 --- a/src/communication.jl +++ b/src/communication.jl @@ -125,8 +125,12 @@ function setup_distributed_memory_MPI(z_nelement_global,z_nelement_local,r_nelem end # throw an error if user specified information is inconsistent if (nrank_per_zr_block*nblocks < nrank_global) - error("ERROR: You must choose global number of processes to be an integer multiple of the number of \n - nblocks = (r_nelement_global/r_nelement_local)*(z_nelement_global/z_nelement_local)") + error("ERROR: You must choose global number of processes to be an integer " + * "multiple of the number of\n" + * "nblocks($nblocks) = (r_nelement_global($r_nelement_global)/" + * "r_nelement_local($r_nelement_local))*" + * "(z_nelement_global($z_nelement_global)/" + * "z_nelement_local($z_nelement_local))") end # assign information regarding shared-memory blocks diff --git a/src/moment_kinetics.jl b/src/moment_kinetics.jl index d1ddaeaf5..880284a0f 100644 --- a/src/moment_kinetics.jl +++ b/src/moment_kinetics.jl @@ -2,7 +2,7 @@ """ module moment_kinetics -export 
run_moment_kinetics, restart_moment_kinetics +export run_moment_kinetics using MPI @@ -89,11 +89,13 @@ using .type_definitions: mk_int """ main function that contains all of the content of the program """ -function run_moment_kinetics(to::TimerOutput, input_dict=Dict()) +function run_moment_kinetics(to::TimerOutput, input_dict=Dict(); restart=false, + restart_time_index=-1) mk_state = nothing try # set up all the structs, etc. needed for a run - mk_state = setup_moment_kinetics(input_dict) + mk_state = setup_moment_kinetics(input_dict; restart=restart, + restart_time_index=restart_time_index) # solve the 1+1D kinetic equation to advance f in time by nstep time steps if run_type == performance_test @@ -136,27 +138,38 @@ end """ overload which takes a filename and loads input """ -function run_moment_kinetics(to::TimerOutput, input_filename::String) - return run_moment_kinetics(to, read_input_file(input_filename)) +function run_moment_kinetics(to::TimerOutput, input_filename::String; restart=false, + restart_time_index=-1) + return run_moment_kinetics(to, read_input_file(input_filename); restart=restart, + restart_time_index=restart_time_index) end """ overload with no TimerOutput arguments """ -function run_moment_kinetics(input) - return run_moment_kinetics(TimerOutput(), input) +function run_moment_kinetics(input; restart=false, restart_time_index=-1) + return run_moment_kinetics(TimerOutput(), input; restart=restart, + restart_time_index=restart_time_index) end """ overload which gets the input file name from command line arguments """ function run_moment_kinetics() - inputfile = get_options()["inputfile"] - if inputfile == nothing - run_moment_kinetics(Dict()) + options = get_options() + inputfile = options["inputfile"] + restart = options["restart"] + if options["restartfile"] !== nothing + restart = options["restartfile"] + end + restart_time_index = options["restart-time-index"] + if inputfile === nothing + this_input = Dict() else - 
run_moment_kinetics(inputfile) + this_input = inputfile end + run_moment_kinetics(this_input; restart=restart, + restart_time_index=restart_time_index) end """ @@ -223,111 +236,9 @@ function get_backup_filename(filename) end backup_dfns_filename == "" && error("Failed to find a name for backup file.") backup_prefix_iblock = ("$(basename)_$(counter)", iblock) + original_prefix_iblock = (basename, iblock) return dfns_filename, backup_dfns_filename, parallel_io, moments_filename, - backup_moments_filename, backup_prefix_iblock -end - -""" - restart_moment_kinetics(input_filename::String, - restart_filename::Union{String,Nothing}=nothing, - time_index::Int=-1) - -Restart moment kinetics from an existing run. Space/velocity-space resolution in the -input must be the same as for the original run. - -`input_filename` is the input file to use. - -`restart_filename` can be used to pick a particular distribution-functions-output file to -restart from. By default will use the most recent one (the one without the numerical -suffix) in the run directory. - -`time_index` can be passed to select the time index from `restart_filename` to restart -from. By default the latest time point is used. 
-""" -function restart_moment_kinetics(input_filename::String, - restart_filename::Union{String,Nothing}=nothing, - time_index::Int=-1) - restart_moment_kinetics(read_input_file(input_filename), restart_filename, - time_index) - return nothing -end -function restart_moment_kinetics() - options = get_options() - inputfile = options["inputfile"] - if inputfile === nothing - error("Must pass input file as first argument to restart a run.") - end - restartfile = options["restartfile"] - if restartfile === nothing - error("Must pass output file to restart from as second argument.") - end - time_index = options["restart-time-index"] - - restart_moment_kinetics(inputfile, restartfile, time_index) - - return nothing -end -function restart_moment_kinetics(input_dict::Dict, - restart_filename::Union{String,Nothing}=nothing, - time_index::Int=-1) - - if restart_filename === nothing - run_name = input_dict["run_name"] - base_directory = get(input_dict, "base_directory", "runs") - output_dir = joinpath(base_directory, run_name) - io_settings = get(input_dict, "output", Dict{String,Any}()) - binary_format = get(io_settings, "binary_format", hdf5) - if binary_format === hdf5 - ext = "h5" - elseif binary_format === netcdf - ext = "cdf" - else - error("Unrecognized binary_format '$binary_format'") - end - restart_filename = glob(joinpath(output_dir, run_name * ".dfns*." * ext))[1] - end - - try - # Move the output file being restarted from to make sure it doesn't get - # overwritten. 
- dfns_filename, backup_dfns_filename, parallel_io, moments_filename, - backup_moments_filename, backup_prefix_iblock = - get_backup_filename(restart_filename) - # Ensure every process got the filenames and checked files exist before moving - # files - MPI.Barrier(comm_world) - if (parallel_io && global_rank[] == 0) || (!parallel_io && block_rank[] == 0) - mv(dfns_filename, backup_dfns_filename) - mv(moments_filename, backup_moments_filename) - end - # Ensure files have been moved before any process tries to read from them - MPI.Barrier(comm_world) - - # Set up all the structs, etc. needed for a run. - mk_state = setup_moment_kinetics(input_dict, - restart_prefix_iblock=backup_prefix_iblock, - restart_time_index=time_index) - - try - time_advance!(mk_state...) - finally - # clean up i/o and communications - # last 2 elements of mk_state are `io` and `cdf` - cleanup_moment_kinetics!(mk_state[end-2:end]...) - end - catch e - # Stop code from hanging when running on multiple processes if only one of them - # throws an error - if global_size[] > 1 - println("Abort called on rank $(block_rank[]) due to error. Error message " - * "was:\n", e) - MPI.Abort(comm_world, 1) - end - - rethrow(e) - end - - return nothing + backup_moments_filename, backup_prefix_iblock, original_prefix_iblock end """ @@ -339,8 +250,8 @@ reload data from time index given by `restart_time_index` for a restart. `debug_loop_type` and `debug_loop_parallel_dims` are used to force specific set ups for parallel loop ranges, and are only used by the tests in `debug_test/`. 
""" -function setup_moment_kinetics(input_dict::Dict; restart_prefix_iblock=nothing, - restart_time_index=-1, +function setup_moment_kinetics(input_dict::Dict; + restart::Union{Bool,AbstractString}=false, restart_time_index::mk_int=-1, debug_loop_type::Union{Nothing,NTuple{N,Symbol} where N}=nothing, debug_loop_parallel_dims::Union{Nothing,NTuple{N,Symbol} where N}=nothing) @@ -402,7 +313,7 @@ function setup_moment_kinetics(input_dict::Dict; restart_prefix_iblock=nothing, allocate_pdf_and_moments(composition, r, z, vperp, vpa, vzeta, vr, vz, evolve_moments, collisions, num_diss_params) - if restart_prefix_iblock === nothing + if restart === false restarting = false # initialize f(z,vpa) and the lowest three v-space moments (density(z), upar(z) and ppar(z)), # each of which may be evolved separately depending on input choices. @@ -416,10 +327,60 @@ function setup_moment_kinetics(input_dict::Dict; restart_prefix_iblock=nothing, else restarting = true + run_name = input_dict["run_name"] + base_directory = get(input_dict, "base_directory", "runs") + output_dir = joinpath(base_directory, run_name) + if restart === true + run_name = input_dict["run_name"] + io_settings = get(input_dict, "output", Dict{String,Any}()) + binary_format = get(io_settings, "binary_format", hdf5) + if binary_format === hdf5 + ext = "h5" + elseif binary_format === netcdf + ext = "cdf" + else + error("Unrecognized binary_format '$binary_format'") + end + restart_filename_pattern = joinpath(output_dir, run_name * ".dfns*." * ext) + restart_filename_glob = glob(restart_filename_pattern) + if length(restart_filename_glob) == 0 + error("No output file to restart from found matching the pattern " + * "$restart_filename_pattern") + end + restart_filename = restart_filename_glob[1] + else + restart_filename = restart + end + + # Move the output file being restarted from to make sure it doesn't get + # overwritten. 
+ dfns_filename, backup_dfns_filename, parallel_io, moments_filename, + backup_moments_filename, backup_prefix_iblock, original_prefix_iblock = + get_backup_filename(restart_filename) + + # Ensure every process got the filenames and checked files exist before moving + # files + MPI.Barrier(comm_world) + + if abspath(output_dir) == abspath(dirname(dfns_filename)) + # Only move the file if it is in our current run directory. Otherwise we are + # restarting from another run, and will not be overwriting the file. + if (parallel_io && global_rank[] == 0) || (!parallel_io && block_rank[] == 0) + mv(dfns_filename, backup_dfns_filename) + mv(moments_filename, backup_moments_filename) + end + else + # Reload from dfns_filename without moving the file + backup_prefix_iblock = original_prefix_iblock + end + + # Ensure files have been moved before any process tries to read from them + MPI.Barrier(comm_world) + # Reload pdf and moments from an existing output file code_time, previous_runs_info, restart_time_index = reload_evolving_fields!(pdf, moments, boundary_distributions, - restart_prefix_iblock, restart_time_index, + backup_prefix_iblock, restart_time_index, composition, r, z, vpa, vperp, vzeta, vr, vz) _block_synchronize() end diff --git a/src/post_processing.jl b/src/post_processing.jl index 9cee0bf5a..1c38015bc 100644 --- a/src/post_processing.jl +++ b/src/post_processing.jl @@ -373,12 +373,12 @@ are passed, the plots/movies are given names beginning with `compare_` and are c in the `comparison_plots/` subdirectory. By default plots output from all restarts in a directory. To select a single run, pass the -`run_index` argument - the value corresponds to the `_` suffix given to output files by -`restart_moment_kinetics()`. `run_index` can be an integer (which is applied to all -directories in `prefix...`), or a tuple of integers (which should have the same length as -the number of directories passed to `prefix...`). 
Use `run_index=-1` to get the most -recent run (which does not have a `_` suffix). Note that `run_index` is only used when -a directory (rather than the prefix of a specific output file) is passed to `prefix...` +`run_index` argument - the value corresponds to the `_` suffix given to output files +when restarting. `run_index` can be an integer (which is applied to all directories in +`prefix...`), or a tuple of integers (which should have the same length as the number of +directories passed to `prefix...`). Use `run_index=-1` to get the most recent run (which +does not have a `_` suffix). Note that `run_index` is only used when a directory +(rather than the prefix of a specific output file) is passed to `prefix...` """ function analyze_and_plot_data(prefix...; run_index=nothing) if length(prefix) == 0 diff --git a/submit-restart.sh b/submit-restart.sh index 759c247c4..3186173a4 100755 --- a/submit-restart.sh +++ b/submit-restart.sh @@ -94,19 +94,10 @@ RUNNAME=$(util/get-run-name.jl $INPUTFILE) RUNDIR=runs/$RUNNAME/ mkdir -p $RUNDIR -# Get default file to restart from, which is the latest run in $RUNDIR -if [[ -z $RESTARTFROM ]]; then - # "shopt -s extglob" is needed to let us use the ?() syntax within a script - # (it doesn't seem to be needed in an interactive shell!). See - # https://www.linuxjournal.com/content/pattern-matching-bash - shopt -s extglob - RESTARTFROM=$(ls $RUNDIR/$RUNNAME.dfns*.?(h5|cdf) | head -n 1) -fi - if [[ $POSTPROC -eq 0 ]]; then - echo "Submitting $INPUTFILE for restart from $RESTARTFROM and post-processing..." + echo "Submitting $INPUTFILE for restart from '$RESTARTFROM' and post-processing..." else - echo "Submitting $INPUTFILE for restart from $RESTARTFROM..." + echo "Submitting $INPUTFILE for restart from '$RESTARTFROM'..." fi # Create a submission script for the run