Fortran API Reference
FTI Datatypes
FTI Constants
FTI_Init
FTI_InitType
FTI_Protect
FTI_Checkpoint
FTI_Status
FTI_Recover
FTI_Snapshot
FTI_Finalize
FTI Datatypes
FTI datatypes are used in the C-API function `FTI_Protect`. From the `count` parameter and the datatype, FTI determines the size of the allocated memory region at `ptr`.
The FTI Fortran interface defines a version of `FTI_Protect` for every intrinsic data type. The datatype definitions are therefore unnecessary and are not available in the Fortran interface.
FTI Constants
Constant | Value |
---|---|
`FTI_BUFS` | 256 |
`FTI_DONE` | 1 |
`FTI_SCES` | 0 |
`FTI_NSCS` | -1 |
FTI_Init
- Reads the configuration file.
- Creates checkpoint directories.
- Detects the topology of the system.
- Regenerates data upon recovery.
DEFINITION
FTI_Init ( config_file, global_comm, err )
ARGUMENTS
Variable | What for? |
---|---|
[IN] `character(len=*) :: config_file` | Path to the configuration file |
[IN/OUT] `integer :: global_comm` | MPI communicator used for the execution |
[OUT] `integer :: err` | Token for the FTI error code |
ERROR HANDLING
Value | Reason |
---|---|
`FTI_SCES` | Success |
`FTI_NSCS` | No success |
`FTI_NREC` | FTI could not recover the checkpoint files; no recovery is possible |
DESCRIPTION
This function initializes the FTI context. It should be called before other FTI functions, right after MPI initialization.
EXAMPLE
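A minimal sketch of the start-up sequence, assuming the FTI Fortran module is imported with `use FTI` and that `config.fti` is a hypothetical configuration file. Because `global_comm` is `IN/OUT`, it cannot be the `MPI_COMM_WORLD` named constant itself, so the sketch copies it into a variable and then uses that variable for application communication, on the assumption that FTI may replace it (for instance, to exclude dedicated FTI processes).

```fortran
program fti_init_example
  use mpi
  use FTI                     ! assumed name of the FTI Fortran module
  implicit none
  integer :: ierr             ! MPI error code
  integer :: err              ! FTI error code
  integer :: comm             ! communicator passed to (and possibly updated by) FTI_Init
  integer :: rank

  call MPI_Init(ierr)

  ! global_comm is IN/OUT, so pass a variable rather than the MPI_COMM_WORLD constant
  comm = MPI_COMM_WORLD
  call FTI_Init('config.fti', comm, err)   ! 'config.fti' is a hypothetical path
  if (err /= FTI_SCES) then
     print *, 'FTI_Init failed with code ', err
     call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if

  ! Use comm, not MPI_COMM_WORLD, for the application's own communication
  call MPI_Comm_rank(comm, rank, ierr)
  print *, 'rank ', rank, ' has initialized FTI'

  call FTI_Finalize(err)
  call MPI_Finalize(ierr)
end program fti_init_example
```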
FTI_InitType
- Initializes a data type.
DEFINITION
FTI_InitType ( type_F, size_F, err )
INPUT
Variable | What for? |
---|---|
`type(FTI_type), intent(OUT) :: type_F` | The data type to be initialized |
`integer, intent(IN) :: size_F` | The size of the data type to be initialized |
`integer, intent(OUT) :: err` | Token for the FTI error code |
OUTPUT
Value | Reason |
---|---|
`FTI_SCES` | Success |
DESCRIPTION
This function initializes a data type. If a variable's type is not predefined by FTI (see FTI Datatypes), the type must be registered with this function before the variable is added to the protected variables.
EXAMPLE
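A minimal sketch of registering a derived type, assuming the module is imported with `use FTI`, that `config.fti` is a hypothetical configuration file, and that `size_F` is a size in bytes (as in the C API). The `particle` type is hypothetical. Only the registration step is shown; how the resulting handle is then used for protection is not covered on this page, so it is left out.

```fortran
program fti_inittype_example
  use mpi
  use FTI                           ! assumed name of the FTI Fortran module
  implicit none
  ! Hypothetical derived type that FTI does not know about
  type :: particle
     real(8) :: pos(3)
     real(8) :: vel(3)
  end type particle

  type(FTI_type) :: particle_type   ! FTI handle for the new data type
  type(particle) :: sample          ! only used to query the storage size
  integer :: ierr, err, comm

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call FTI_Init('config.fti', comm, err)

  ! storage_size returns bits; assumption: size_F is expected in bytes
  call FTI_InitType(particle_type, storage_size(sample) / 8, err)
  if (err /= FTI_SCES) print *, 'FTI_InitType failed with code ', err

  call FTI_Finalize(err)
  call MPI_Finalize(ierr)
end program fti_inittype_example
```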
FTI_Protect
- Stores metadata concerning the variable to protect.
DEFINITION
FTI_Protect ( id, data, ierr )
INPUT
Variable | What for? |
---|---|
`integer, intent(IN) :: id` | Unique ID of the variable to protect |
`type_fort, pointer :: data` | Pointer to the variable to protect (any intrinsic type) |
`integer, intent(OUT) :: ierr` | Token for the FTI error code |
OUTPUT
Value | Reason |
---|---|
`FTI_SCES` | Success |
exit(1) | Number of protected variables is > `FTI_BUFS` |
DESCRIPTION
This function adds a data structure to the list of protected variables. The protected variables are the data that is stored during a checkpoint and loaded during a recovery. If a dataset with the given id was already registered, it is reset. When the size of a variable changes during execution, the variable must be re-registered with this function before the next checkpoint so that its data is stored correctly.
EXAMPLE
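A minimal sketch of protecting a scalar and an array, assuming the module is imported with `use FTI` and that `config.fti` is a hypothetical configuration file. Because `data` is a pointer dummy argument, the sketch allocates its data through pointers. No datatype or count is passed; the Fortran interface covers each intrinsic type.

```fortran
program fti_protect_example
  use mpi
  use FTI                           ! assumed name of the FTI Fortran module
  implicit none
  integer, parameter :: n = 1000    ! hypothetical array length
  integer, pointer :: iter          ! loop counter, protected with id 0
  real(8), pointer :: field(:)      ! work array, protected with id 1
  integer :: ierr, err, comm

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call FTI_Init('config.fti', comm, err)

  ! FTI_Protect takes pointer arguments, so allocate the data through pointers
  allocate(iter, field(n))
  iter = 0
  field = 0.0d0

  call FTI_Protect(0, iter, err)
  if (err /= FTI_SCES) print *, 'protecting iter failed: ', err
  call FTI_Protect(1, field, err)
  if (err /= FTI_SCES) print *, 'protecting field failed: ', err

  call FTI_Finalize(err)
  call MPI_Finalize(ierr)
  deallocate(iter, field)
end program fti_protect_example
```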
FTI_Checkpoint
- Writes values of protected runtime variables to a checkpoint file of requested level.
DEFINITION
FTI_Checkpoint ( id_F, level, err )
INPUT
Variable | What for? |
---|---|
`integer, intent(IN) :: id_F` | Unique checkpoint ID |
`integer, intent(IN) :: level` | Checkpoint level (1=L1, 2=L2, 3=L3, 4=L4) |
`integer, intent(OUT) :: err` | Token for the FTI error code |
OUTPUT
Value | Reason |
---|---|
`FTI_DONE` | Success |
`FTI_NSCS` | Failure |
DESCRIPTION
This function stores the current values of the protected variables in a checkpoint file. Depending on the checkpoint level, the file is stored in the local, partner-node, or global directory. The checkpoint ID must be different from 0.
EXAMPLE
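A minimal sketch of manual checkpointing, assuming the module is imported with `use FTI` and that `config.fti` is a hypothetical configuration file. It writes an L1 checkpoint every 10 iterations (a hypothetical interval), keeps checkpoint IDs non-zero, and treats `FTI_DONE` as success, as listed in the table above.

```fortran
program fti_checkpoint_example
  use mpi
  use FTI                           ! assumed name of the FTI Fortran module
  implicit none
  integer, pointer :: iter          ! protected loop counter (id 0)
  real(8), pointer :: field(:)      ! protected work array (id 1)
  integer :: ierr, err, comm
  integer :: ckpt_id                ! checkpoint IDs must be different from 0

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call FTI_Init('config.fti', comm, err)

  allocate(iter, field(100))
  iter = 0
  field = 0.0d0
  call FTI_Protect(0, iter, err)
  call FTI_Protect(1, field, err)

  ckpt_id = 0
  do while (iter < 50)
     field = field + 1.0d0          ! stand-in for the real computation
     iter = iter + 1
     if (mod(iter, 10) == 0) then   ! hypothetical checkpoint interval
        ckpt_id = ckpt_id + 1
        call FTI_Checkpoint(ckpt_id, 1, err)   ! level 1 (L1)
        if (err /= FTI_DONE) print *, 'checkpoint ', ckpt_id, ' failed: ', err
     end if
  end do

  call FTI_Finalize(err)
  call MPI_Finalize(ierr)
  deallocate(iter, field)
end program fti_checkpoint_example
```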
FTI_Status
- Returns the current status of the recovery flag.
DEFINITION
FTI_Status ( status )
INPUT
Variable | What for? |
---|---|
`integer, intent(OUT) :: status` | Token for the status flag |
OUTPUT
Value | Reason |
---|---|
0 | No checkpoints taken yet or recovered successfully |
1 | At least one checkpoint has been taken. If the execution fails, the next start will be a restart |
2 | The execution is a restart from checkpoint level L4 and keep_last_checkpoint was enabled during the last execution |
DESCRIPTION
This function returns the current status of the recovery flag.
EXAMPLE
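A minimal sketch of querying the recovery flag right after initialization, assuming the module is imported with `use FTI`, that `config.fti` is a hypothetical configuration file, and that a status of 0 means there is no checkpoint to restart from (as in the C API).

```fortran
program fti_status_example
  use mpi
  use FTI                           ! assumed name of the FTI Fortran module
  implicit none
  integer :: ierr, err, comm
  integer :: status                 ! recovery flag returned by FTI_Status

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call FTI_Init('config.fti', comm, err)

  call FTI_Status(status)
  ! Assumption: 0 means no checkpoint is available to restart from
  if (status == 0) then
     print *, 'no checkpoint to recover from'
  else
     print *, 'restart detected, recovery flag = ', status
  end if

  call FTI_Finalize(err)
  call MPI_Finalize(ierr)
end program fti_status_example
```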
FTI_Recover
- Loads checkpoint data from the checkpoint file and initializes the runtime variables of the execution.
DEFINITION
FTI_Recover ( err )
INPUT
Variable | What for? |
---|---|
`integer, intent(OUT) :: err` | Token for the FTI error code |
OUTPUT
Value | Reason |
---|---|
`FTI_SCES` | Success |
`FTI_NSCS` | Failure |
DESCRIPTION
This function loads the checkpoint data from the checkpoint file and updates some basic checkpoint information. After a failure, it should be called once the protected variables have been initialized. If a variable changes its size during execution, it must have its latest size before `FTI_Recover` is called. The easiest way to achieve this is to add the variable's size to the protected list as a separate variable and call `FTI_Recover` twice: first to recover the size, and then, after updating the protected list, to recover the variable's data.
EXAMPLE
Basic example:
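A minimal sketch of the basic restart pattern, assuming the module is imported with `use FTI`, that `config.fti` is a hypothetical configuration file, and that a non-zero `FTI_Status` value signals a restart. The variables are initialized and protected first; on a restart, `FTI_Recover` overwrites them, and the loop resumes from the recovered counter.

```fortran
program fti_recover_example
  use mpi
  use FTI                           ! assumed name of the FTI Fortran module
  implicit none
  integer, pointer :: iter          ! protected loop counter (id 0)
  real(8), pointer :: field(:)      ! protected work array (id 1)
  integer :: ierr, err, comm, status

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call FTI_Init('config.fti', comm, err)

  ! Initialize and protect the variables first ...
  allocate(iter, field(100))
  iter = 0
  field = 0.0d0
  call FTI_Protect(0, iter, err)
  call FTI_Protect(1, field, err)

  ! ... then, on a restart, overwrite them with the checkpointed values
  call FTI_Status(status)
  if (status /= 0) then
     call FTI_Recover(err)
     if (err /= FTI_SCES) then
        print *, 'FTI_Recover failed with code ', err
        call MPI_Abort(comm, 1, ierr)
     end if
  end if

  ! The loop resumes from the recovered value of iter
  do while (iter < 50)
     field = field + 1.0d0          ! stand-in for the real computation
     iter = iter + 1
     if (mod(iter, 10) == 0) call FTI_Checkpoint(iter / 10, 1, err)  ! non-zero IDs
  end do

  call FTI_Finalize(err)
  call MPI_Finalize(ierr)
  deallocate(iter, field)
end program fti_recover_example
```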
Example if a variable changes its size during execution:
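A minimal sketch of the two-step recovery described above, assuming the module is imported with `use FTI`, that `config.fti` is a hypothetical configuration file, and that a non-zero `FTI_Status` value signals a restart. The array length `n` is protected as its own variable. The first `FTI_Recover` call restores `n`, the array is reallocated and re-protected, and the second call restores the array data. Later growth of the array is handled by re-protecting it before the next checkpoint. For brevity, the contents are simply reset when the array grows.

```fortran
program fti_recover_resize_example
  use mpi
  use FTI                           ! assumed name of the FTI Fortran module
  implicit none
  integer, pointer :: n             ! current array length, protected with id 0
  real(8), pointer :: arr(:)        ! array whose length may change, protected with id 1
  integer :: ierr, err, comm, status

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call FTI_Init('config.fti', comm, err)

  allocate(n)
  n = 10                            ! hypothetical initial length
  allocate(arr(n))
  arr = 0.0d0
  call FTI_Protect(0, n, err)
  call FTI_Protect(1, arr, err)

  call FTI_Status(status)
  if (status /= 0) then
     call FTI_Recover(err)          ! first call: restores n
     deallocate(arr)                ! resize arr to the recovered length ...
     allocate(arr(n))
     call FTI_Protect(1, arr, err)  ! ... and update the protected list
     call FTI_Recover(err)          ! second call: restores the contents of arr
     if (err /= FTI_SCES) call MPI_Abort(comm, 1, ierr)
  end if

  ! The array grows during the run: re-protect it before the next checkpoint
  n = 2 * n
  deallocate(arr)
  allocate(arr(n))
  arr = 1.0d0                       ! contents reset only to keep the sketch short
  call FTI_Protect(1, arr, err)
  call FTI_Checkpoint(1, 1, err)    ! checkpoint ID 1, level 1

  call FTI_Finalize(err)
  call MPI_Finalize(ierr)
  deallocate(arr, n)
end program fti_recover_resize_example
```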
FTI_Snapshot
- Loads checkpoint data and initializes runtime variables upon recovery.
- Writes multilevel checkpoints according to their requested frequencies.
DEFINITION
FTI_Snapshot ( err )
INPUT
Variable | What for? |
---|---|
`integer, intent(OUT) :: err` | Token for the FTI error code |
OUTPUT
Value | Reason |
---|---|
`FTI_SCES` | Successful call (without checkpointing) or successful recovery |
`FTI_NSCS` | Failure of `FTI_Checkpoint` |
`FTI_DONE` | Success of `FTI_Checkpoint` |
exit(1) | Failure on recovery |
DESCRIPTION
This function loads the checkpoint data from the checkpoint file in case of a restart. Otherwise, it checks whether the current iteration requires a checkpoint (see, e.g., `ckpt_L1`) and performs one if needed (through an internal call to `FTI_Checkpoint`). It should be called after the protected variables have been initialized.
EXAMPLE
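A minimal sketch of a main loop driven by `FTI_Snapshot`, assuming the module is imported with `use FTI` and that `config.fti` is a hypothetical configuration file whose checkpoint intervals (such as `ckpt_L1`) decide when checkpoints happen. A single call per iteration covers both cases: on a restart, the protected variables are reloaded; otherwise, a checkpoint is written when one is due.

```fortran
program fti_snapshot_example
  use mpi
  use FTI                           ! assumed name of the FTI Fortran module
  implicit none
  integer, pointer :: iter          ! protected loop counter (id 0)
  real(8), pointer :: field(:)      ! protected work array (id 1)
  integer :: ierr, err, comm

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call FTI_Init('config.fti', comm, err)

  allocate(iter, field(100))
  iter = 0
  field = 0.0d0
  call FTI_Protect(0, iter, err)
  call FTI_Protect(1, field, err)

  do while (iter < 1000)
     ! Restart: reloads iter and field. Otherwise: checkpoints when due.
     call FTI_Snapshot(err)
     if (err == FTI_NSCS) print *, 'checkpoint failed at iteration ', iter
     field = field + 1.0d0          ! stand-in for the real computation
     iter = iter + 1
  end do

  call FTI_Finalize(err)
  call MPI_Finalize(ierr)
  deallocate(iter, field)
end program fti_snapshot_example
```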
FTI_Finalize
- Frees the allocated memory.
- Communicates the end of the execution to the dedicated FTI processes.
- Cleans checkpoints and metadata.
DEFINITION
FTI_Finalize ( err )
INPUT
Variable | What for? |
---|---|
`integer, intent(OUT) :: err` | Token for the FTI error code |
OUTPUT
Value | Reason |
---|---|
`FTI_SCES` | For application processes |
exit(0) | For FTI processes |
DESCRIPTION
This function notifies the FTI processes that the execution is over, frees some data structures, and closes FTI. If this function is not called at the end of the program, the FTI processes never finish (deadlock). It should be called before `MPI_Finalize()`.
EXAMPLE
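A minimal sketch of the shutdown order, assuming the module is imported with `use FTI` and that `config.fti` is a hypothetical configuration file. `FTI_Finalize` comes before `MPI_Finalize`, and application processes check for `FTI_SCES`, as listed in the table above.

```fortran
program fti_finalize_example
  use mpi
  use FTI                           ! assumed name of the FTI Fortran module
  implicit none
  integer :: ierr, err, comm, rank

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call FTI_Init('config.fti', comm, err)
  call MPI_Comm_rank(comm, rank, ierr)

  ! ... protected computation and checkpoints would go here ...

  ! Shut FTI down first so the FTI processes can finish, then MPI
  call FTI_Finalize(err)
  if (err /= FTI_SCES) print *, 'rank ', rank, ': FTI_Finalize returned ', err
  call MPI_Finalize(ierr)
end program fti_finalize_example
```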