xFasterTransformer uses a different model format than Hugging Face, compatible with NVIDIA FasterTransformer's format. The conversion tools dump the Hugging Face model's parameters, layer by layer, into binary files that the xFasterTransformer CPU code can load.
After that, convert the model into the xFasterTransformer format using the corresponding script. Once it finishes, you will see many .bin files in the output directory.
python chatglm_convert.py -i ${HF_DATASET_DIR} -o ${OUTPUT_DIR}
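For example, a typical end-to-end invocation might look like the following sketch; the directory paths are placeholders for illustration, not part of the repository:

export HF_DATASET_DIR=/data/chatglm-6b-hf    # placeholder: local directory with the downloaded Hugging Face checkpoint
export OUTPUT_DIR=/data/chatglm-6b-xft       # placeholder: destination directory for the converted model
python chatglm_convert.py -i ${HF_DATASET_DIR} -o ${OUTPUT_DIR}
ls ${OUTPUT_DIR}                             # after conversion, this should contain the generated .bin files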