Writing time of input files #1557

Alex-Gauvain · 2022-09-23T08:04:41Z

Hello,

I use MODFLOW 6 for large simulations (50 years at a daily time step). I run the flow model first and then the transport model. I have one question:

The simulations take a long time to run. I also noticed that writing the input files (those that contain stress period data) takes a lot of time, especially the .maw file, which I generate with the following code:

# q, mawpackagedata and mawconnectiondata are defined elsewhere in the script
mawspd = {}
for kper in range(50 * 365):
    # one [well number, setting name, value] entry per well
    well_sp = []
    for well in range(2):
        well_sp.append([well, "rate", q])
    mawspd[kper] = well_sp

flopy.mf6.ModflowGwfmaw(self.gwf, print_input=False, print_head=False, print_flows=False, save_flows=True, mover=True, no_well_storage=True,
                        head_filerecord="{}.maw.hds".format(self.modelname),
                        budget_filerecord="{}.maw.bud".format(self.modelname),
                        packagedata=mawpackagedata, connectiondata=mawconnectiondata,
                        perioddata=mawspd, pname="MAW-1")

I compared with MODFLOW-2005, where the writing times are much shorter. Do you have any idea how to limit the writing time?
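Editor's sketch, not part of the original post: if the settings really are identical from one stress period to the next, one way to shrink the period data is to keep only the periods where something changes. This relies on the MODFLOW 6 convention that a PERIOD block stays in effect until the next PERIOD block is read. The helper name `compress_period_data` and the value of `q` are hypothetical, and whether this fits a given model depends on how the rates actually vary.

```python
def compress_period_data(perioddata):
    """Keep only the stress periods whose settings differ from the previous kept one."""
    compressed = {}
    previous = None
    for kper in sorted(perioddata):
        current = perioddata[kper]
        if current != previous:
            compressed[kper] = current
            previous = current
    return compressed


q = -100.0  # hypothetical constant pumping rate

# Same structure as the dictionary built in the post: one entry per daily period.
full = {kper: [[well, "rate", q] for well in range(2)] for kper in range(50 * 365)}
sparse = compress_period_data(full)

print(len(full), len(sparse))  # 18250 1
```

The compressed dictionary could then be passed as `perioddata` in place of the full one, so far fewer period blocks need to be written.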

Thanks

Alex

spaulins-usgs (Collaborator) · 2022-09-23T23:17:35Z

Hi @Alex-Gauvain,

Unfortunately, I do not have any good ideas for limiting the writing time. In case you are curious, I explain below why writing is slower in flopy for MODFLOW-6. I welcome any input or suggestions on how I might improve this.

Flopy for MODFLOW-2005 uses numpy.savetxt to write most of the MNW package's settings, whereas flopy for MODFLOW-6 writes MAW data by looping through the numpy recarray and writing it out line by line. Because numpy.savetxt executes very little interpreted Python code and mostly compiled code, it is very fast. Flopy for MODFLOW-6 executes more interpreted Python code and is therefore slower.
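Editor's sketch of the two writing strategies described above, using a made-up two-field record array rather than flopy's real data structures. The field names, the row count, and both helper functions are assumptions for illustration. The timings are printed but not asserted, because the gap depends on the NumPy version, the data types, and the per-row work done in the loop.

```python
import io
import time

import numpy as np

# Hypothetical stand-in for well data: rows of (well number, rate).
n = 20_000
data = np.zeros(n, dtype=[("wellno", "i4"), ("rate", "f8")])
data["wellno"] = np.arange(n) % 2
data["rate"] = -100.0


def write_with_savetxt(rec):
    """Hand the whole record array to numpy.savetxt in a single call."""
    buf = io.StringIO()
    np.savetxt(buf, rec, fmt="%d %.6e")
    return buf.getvalue()


def write_line_by_line(rec):
    """Loop over the rows in user code, formatting and writing one line at a time."""
    buf = io.StringIO()
    for row in rec:
        buf.write("%d %.6e\n" % (row["wellno"], row["rate"]))
    return buf.getvalue()


for writer in (write_with_savetxt, write_line_by_line):
    start = time.perf_counter()
    text = writer(data)
    elapsed = time.perf_counter() - start
    print(f"{writer.__name__}: {elapsed:.3f} s for {len(text.splitlines())} lines")
```

Both functions produce the same text here; only the place where the per-row work happens differs.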

There are several reasons flopy for MODFLOW-6 does not make use of numpy.savetxt. Here are the reasons I remember off the top of my head.

1. MODFLOW-6 supports multiple discretizations. For various reasons it was decided to store the cell identification information in a single cellid field, stored as a tuple: (layer, row, column), (layer, cpl), or (node,). That way all cellids, regardless of the number of integers in the id, are stored as a single recarray field. The cellid has to be extracted from the numpy recarray to be written to a file properly, and this alone rules out using numpy.savetxt without significant, time-consuming preprocessing.
2. Not all MODFLOW-6 data fits into a recarray with consistent column types, and some recarray rows can have empty columns. For example, the MAW package's period block has mawsetting "keystrings" that support different data types and different amounts of data. There are several ways to represent the different keystrings and their associated data in a single recarray, but all of them lead to formatting issues when using numpy.savetxt.
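Editor's sketch of the two points above, using plain NumPy and Python rather than flopy internals. The field names, the sample values, and the zero-based-in-memory, one-based-on-disk convention are assumptions for illustration. The setting keywords (`rate`, `status`, `rate_scaling`) are used only as examples of rows with different types and lengths.

```python
import numpy as np

# Point 1: one object-typed field holds the whole cellid, whatever its length.
rows = np.empty(3, dtype=[("cellid", object), ("rate", "f8")])
rows["cellid"][0] = (0, 4, 1)  # structured grid: (layer, row, column)
rows["cellid"][1] = (0, 17)    # vertex grid: (layer, cell within the layer)
rows["cellid"][2] = (42,)      # unstructured grid: (node,)
rows["rate"] = [-50.0, -25.0, -10.0]


def format_row(row):
    """Unpack the cellid tuple and write each part as a one-based integer (assumed convention)."""
    cellid_text = " ".join(str(part + 1) for part in row["cellid"])
    return f"{cellid_text} {row['rate']:.6e}"


lines = [format_row(row) for row in rows]

# Point 2: period settings whose values differ in type and count from row to row.
settings = [
    (0, "rate", -100.0),
    (0, "status", "active"),
    (1, "rate_scaling", 5.0, 2.0),
]
setting_lines = [" ".join(str(v) for v in (s[0] + 1, *s[1:])) for s in settings]

print("\n".join(lines + setting_lines))
```

Each row needs its own unpacking and formatting step, which is the per-row Python work that a single numpy.savetxt call with one fixed format cannot express directly.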

Alex-Gauvain Sep 27, 2022
Author

Hi @spaulins-usgs,

Thank you for your comments.

Would it be conceivable to create a separate file-writing scheme for each discretization? For example, the scheme established for MODFLOW-2005 could be used for regular grids. I imagine this would require reorganizing a large part of the code...

For the second point, have you thought about using data structures that handle multiple data types, such as pandas or xarray?

To give you an indication, for my simulation of 50 x 365 days, the writing time is almost equal to the simulation time.

spaulins-usgs (Collaborator) · 2022-09-28T17:29:02Z

@Alex-Gauvain, regarding the discretizations: to get numpy.savetxt to write the file correctly, we would need to store each part of the cellid in a separate recarray field. We currently store cellids as tuples, which numpy.savetxt writes as text in the form "(1, 5, 2)". Storing each cellid part in a separate field is possible, but it would require some reworking of the flopy code and would break some people's existing flopy scripts. Any code that calls the get_data method to retrieve a recarray and then edits that recarray could break because the recarray fields would change. I would need to get some consensus among the flopy developers before making such a fundamental change.
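Editor's sketch of the trade-off described above, built on small hand-made arrays rather than anything returned by flopy's get_data. The field names `layer`, `row`, `column`, and `wellno` are assumptions for illustration.

```python
import io

import numpy as np

# Layout A: the whole cellid lives in one object field, stored as a tuple.
packed = np.empty(2, dtype=[("cellid", object), ("wellno", "i4")])
packed["cellid"][0] = (1, 5, 2)
packed["cellid"][1] = (1, 6, 2)
packed["wellno"] = [1, 2]

buf = io.StringIO()
np.savetxt(buf, packed, fmt="%s")
packed_text = buf.getvalue()  # the tuple is written with its parentheses and commas

# Layout B: each part of the cellid gets its own integer field.
split = np.array(
    [(1, 5, 2, 1), (1, 6, 2, 2)],
    dtype=[("layer", "i4"), ("row", "i4"), ("column", "i4"), ("wellno", "i4")],
)

buf = io.StringIO()
np.savetxt(buf, split, fmt="%d")
split_text = buf.getvalue()  # plain whitespace-separated integers

print(packed_text)
print(split_text)

# Scripts that read or edit a "cellid" field would no longer find it in layout B.
print("cellid" in packed.dtype.names, "cellid" in split.dtype.names)
```

Layout B writes cleanly in one call, but it changes the field names that existing user scripts may rely on, which is the compatibility concern raised in the reply.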

Pandas and xarray were considered when flopy was first developed, but numpy recarrays were chosen at the time for a number of reasons. I believe those reasons included staying consistent with flopy for mf2005, which stores data in recarrays, and the fact that pandas was not part of major Python distributions like Anaconda. That said, numpy recarrays do support "object" type columns, which can hold multiple data types, and flopy currently uses them for some fields. If I recall correctly, though, the formatting of "object" columns is limited: such a column can hold multiple types, but only one format string can be specified for all of them.

To summarize, I agree that flopy's slow write time can be a problem for some projects. I think you are correct that flopy could be recoded to support much faster save calls, such as numpy.savetxt. However, this would require some recoding, could break flopy users' existing code, and might result in some formatting limitations and other negative consequences. I do think this is worth discussing with the other flopy developers, and I will bring it up the next time we meet.
