Next-gen batch runner #2321
Comments
Experimental designs might be one of the most important new things to support. I encountered this library that might be useful:
Nice initiative! One thing to note is that the current […] So I think your vision aligns nicely with the current structure. And I agree that the most important area for improvement is stage 1 and a clear "run configuration" definition.
I like the conceptual design. I would, however, design it to be easy to extend or combine with whatever experimental design generator you want to use, rather than trying to cover all of that ourselves. The same applies to the subsequent stages. The motivation is that large-scale computational experimentation is its own can of worms and not, in my view, the core of the MESA library. It is easy to go overboard trying to build all of this into MESA while making it less and less useful for others. To wit, last week I spoke with various people who use NetLogo and do large-scale uncertainty quantification. None of them use NetLogo's BehaviorSpace; all of them use other packages that interface with NetLogo via Java. So, in my view, it is more important to establish a clean API for running a single experiment on a MESA model than to design a very elaborate batch runner.
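As a rough illustration of a "clean API for running a single experiment", here is a minimal sketch. It assumes the model constructor takes its parameters as keyword arguments plus a seed keyword, and that the model exposes step() and a boolean running flag (as Mesa models conventionally do). The names run_experiment, report, and CoinFlipModel are hypothetical; the toy model stands in for a real Mesa model so the snippet runs on its own.

```python
import random
from typing import Any, Callable, Dict, Optional


class CoinFlipModel:
    """Hypothetical toy model standing in for a Mesa model.

    Like a Mesa model, it exposes step() and a boolean `running` flag.
    """

    def __init__(self, bias: float = 0.5, target: int = 10, seed: Optional[int] = None):
        self.rng = random.Random(seed)  # all randomness flows from the seed
        self.bias = bias
        self.target = target
        self.heads = 0
        self.running = True

    def step(self) -> None:
        if self.rng.random() < self.bias:
            self.heads += 1
        # The model ends itself once it has seen `target` heads.
        if self.heads >= self.target:
            self.running = False


def run_experiment(
    model_cls: Callable[..., Any],
    parameters: Dict[str, Any],
    *,
    seed: int,
    max_steps: int,
    report: Callable[[Any], Dict[str, Any]] = lambda model: {},
) -> Dict[str, Any]:
    """Run one experiment (one parameter dict, one seed) and return a flat record."""
    model = model_cls(**parameters, seed=seed)
    steps = 0
    # Stop at the step budget or as soon as the model signals it is done.
    while model.running and steps < max_steps:
        model.step()
        steps += 1
    # The record carries its inputs (including the seed) so the run can be replicated.
    return {**parameters, "seed": seed, "steps": steps, **report(model)}


record = run_experiment(
    CoinFlipModel,
    {"bias": 0.7, "target": 5},
    seed=42,
    max_steps=100,
    report=lambda model: {"heads": model.heads},
)
print(record)
```

Because everything a batch tool needs is this one function, an external design-of-experiments or uncertainty-quantification package could call it directly without going through a Mesa-specific batch runner.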
Agree with @quaquel, but as I understand it, this is somewhat in line with what @EwoutH was proposing. The […] Although maybe […]
I guess there is a distinction between the inputs for creating experiments and the individual experiments themselves. To start with the latter, an experiment can be as simple as a dict of key-value pairs. Typically this will be passed directly to the […] The other is more subtle, and I lack a good name for it. It is basically the parameter space plus some density function over this space. In the simplest case, this space is bounded, the axes are orthogonal to one another (i.e., the parameters are independent, so there are no correlations), and you assume a uniform distribution over the space (so all points are equally likely). Each of these assumptions can be relaxed, but doing so makes your life increasingly difficult. Moreover, you have to specify how you want to sample points from this space (Monte Carlo, Latin hypercube sampling (LHS), some factorial design, etc.) and how many points you want to sample. All of this interacts in a messy way. For example, with a factorial design you normally specify the number of points on each dimension, whereas with a Monte Carlo sampler you specify how many points you want to sample in total. Given all this, you have either a collection of experiments or an experiment generator. You pass this to the runner, which then executes the experiments (potentially in parallel). Only this last task is properly the batch runner; the rest is the design of experiments. An additional, minor concern is that you typically want to run each experiment for multiple seeds. You can collapse the seed into the experiment, or defer it and let the batch runner handle it. Either way, you need to track the seed of each experiment for replication purposes.
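To make the distinction between a design and its experiments concrete, here is a minimal, standard-library-only sketch of the simplest case described above: a bounded space with independent axes and a uniform density. It shows how the samplers are parameterized differently (points per dimension for a full factorial design, a total count for Monte Carlo and Latin hypercube sampling) and how seeds can be collapsed into each experiment. All function names are hypothetical; in practice this part would more likely be delegated to a dedicated design-of-experiments library.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

# Bounded space with independent axes: parameter name -> (low, high).
# Sampling is uniform within the bounds (the simplest case).
ParameterSpace = Dict[str, Tuple[float, float]]
Experiment = Dict[str, float]


def full_factorial(space: ParameterSpace, points_per_dim: Dict[str, int]) -> List[Experiment]:
    """Factorial design: the number of points is specified per dimension."""
    names = list(space)
    grids = []
    for name in names:
        low, high = space[name]
        n = points_per_dim[name]
        step = (high - low) / (n - 1) if n > 1 else 0.0
        # Evenly spaced values, both endpoints included when n > 1.
        grids.append([low + i * step for i in range(n)])
    return [dict(zip(names, combo)) for combo in itertools.product(*grids)]


def monte_carlo(space: ParameterSpace, n_samples: int, rng: random.Random) -> List[Experiment]:
    """Monte Carlo: the total number of points is specified; each is drawn independently."""
    return [
        {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        for _ in range(n_samples)
    ]


def latin_hypercube(space: ParameterSpace, n_samples: int, rng: random.Random) -> List[Experiment]:
    """Latin hypercube: each axis is cut into n_samples equal strata, each used exactly once."""
    columns = {}
    for name, (low, high) in space.items():
        width = (high - low) / n_samples
        # One uniform draw inside each stratum, then shuffle to pair strata randomly across axes.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)
        columns[name] = column
    return [{name: columns[name][i] for name in space} for i in range(n_samples)]


def with_seeds(experiments: List[Experiment], seeds: Sequence[int]) -> List[dict]:
    """Collapse replication into the experiments: one copy per seed, with the seed recorded."""
    return [{**experiment, "seed": seed} for experiment in experiments for seed in seeds]


space = {"density": (0.0, 1.0), "speed": (1.0, 3.0)}
factorial = full_factorial(space, {"density": 3, "speed": 2})  # 3 x 2 = 6 points
mc = monte_carlo(space, 10, random.Random(1))  # 10 points in total
lhs = latin_hypercube(space, 5, random.Random(1))  # 5 points in total
replicated = with_seeds(factorial, seeds=[11, 12, 13])  # 6 x 3 = 18 runs
```

The alternative mentioned above, leaving seeds to the batch runner, would simply skip with_seeds and have the runner loop over seeds itself while still recording them.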
Objective
The goal of this proposal is to redesign the Mesa batch runner into a modular, flexible system that separates the batch run process into three stages: Preparation, Running, and Processing. The focus will be on the Preparation stage, where different experimental designs can be used to generate run configurations. These configurations will be encapsulated in a dataclass that includes the model class and all relevant parameters, ensuring reusability in the Running stage.
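The three-stage split can be sketched as three plain functions in which each stage consumes only the previous stage's output, so any one of them can be swapped out (for example, replacing the preparation step with an external sampler). This is a minimal sketch under stated assumptions: configurations are plain dicts rather than the proposed dataclass, fake_model_run is a hypothetical stand-in for executing a model, and processing is illustrated as averaging one outcome over seeds.

```python
import itertools
from collections import defaultdict
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple

Config = Dict[str, Any]


def prepare(grid: Dict[str, Sequence[Any]], seeds: Sequence[int]) -> List[Config]:
    """Preparation: expand a grid of parameter values and seeds into run configurations."""
    names = list(grid)
    return [
        {**dict(zip(names, values)), "seed": seed}
        for values in itertools.product(*(grid[name] for name in names))
        for seed in seeds
    ]


def run(configs: List[Config], run_one: Callable[[Config], Dict[str, Any]]) -> List[Config]:
    """Running: execute every configuration independently, keeping inputs next to outputs."""
    return [{**config, **run_one(config)} for config in configs]


def process(results: List[Config], parameters: Sequence[str], outcome: str) -> Dict[Tuple, float]:
    """Processing: average one outcome over seeds for each parameter combination."""
    groups = defaultdict(list)
    for result in results:
        groups[tuple(result[name] for name in parameters)].append(result[outcome])
    return {key: mean(values) for key, values in groups.items()}


def fake_model_run(config: Config) -> Dict[str, Any]:
    """Hypothetical stand-in for a model run: the outcome depends on one parameter and the seed."""
    return {"score": 10 * config["a"] + config["seed"]}


configs = prepare({"a": [1, 2]}, seeds=[0, 2])
results = run(configs, fake_model_run)
summary = process(results, parameters=["a"], outcome="score")
print(summary)
```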
Design Overview
[…] (RunConfiguration) to store the model class, run parameters, and configuration details (e.g., max_steps, data_collection_period).
[…] RunConfiguration objects and execute each run independently.
Key Components
1. RunConfiguration Dataclass
This dataclass stores all the information required to run a single configuration of the experiment.
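A minimal sketch of what such a dataclass might look like, using the fields named in the design overview (model class, run parameters, max_steps, data_collection_period). The seed field, the validation in __post_init__, reading data_collection_period as "collect data every N steps", and the create_model helper are assumptions added for illustration, not part of the proposal. ToyModel is a hypothetical stand-in for a Mesa model class that accepts a seed keyword.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class RunConfiguration:
    """All information needed to execute one run (field set is illustrative)."""

    model_cls: type  # the model class to instantiate
    parameters: Dict[str, Any] = field(default_factory=dict)  # keyword arguments for the model
    max_steps: int = 1000  # hard cap on the number of steps
    data_collection_period: int = 1  # assumed meaning: collect data every N steps
    seed: Optional[int] = None  # assumed extra field, recorded for replication

    def __post_init__(self) -> None:
        # Reject configurations that could never run sensibly.
        if self.max_steps < 0:
            raise ValueError("max_steps must be non-negative")
        if self.data_collection_period < 1:
            raise ValueError("data_collection_period must be at least 1")

    def create_model(self) -> Any:
        """Instantiate the model; assumes the constructor accepts a `seed` keyword."""
        return self.model_cls(**self.parameters, seed=self.seed)


class ToyModel:
    """Hypothetical stand-in for a Mesa model class."""

    def __init__(self, n_agents: int = 10, seed: Optional[int] = None):
        self.n_agents = n_agents
        self.seed = seed


base = RunConfiguration(ToyModel, {"n_agents": 50}, max_steps=200, data_collection_period=10)
# frozen=True blocks field reassignment; variants are derived with dataclasses.replace.
replicas = [replace(base, seed=seed) for seed in range(3)]
model = replicas[1].create_model()
```

Keeping the configuration as plain data means any generator can produce it and the Running stage can consume it unchanged. Note that frozen=True only prevents reassigning fields; the parameters dict itself remains mutable.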
2. Configuration Generators
Provide different strategies for generating configurations:
Each generator will output a list of RunConfiguration objects.
3. Batch Runner Class
The BatchRunner class will manage the execution of all runs using the RunConfiguration objects. It will handle multiprocessing, progress tracking, and result collection.
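A minimal sketch of such a runner, under several assumptions: RunConfiguration here is a trimmed, hypothetical version of the dataclass described in this proposal; models are assumed to accept a seed keyword and expose step() and a running flag; parallelism is delegated to any concurrent.futures.Executor (passing a ProcessPoolExecutor gives multiprocessing, provided the model class and the collect function are picklable, module-level objects); and progress is reported through a simple callback. The names execute, collect, no_outcomes, on_progress, and Countdown are hypothetical, not an existing Mesa API.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from dataclasses import dataclass, field
from functools import partial
from typing import Any, Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class RunConfiguration:
    """Trimmed, hypothetical version of the proposed dataclass."""

    model_cls: type
    parameters: Dict[str, Any] = field(default_factory=dict)
    max_steps: int = 100
    seed: Optional[int] = None


def no_outcomes(model: Any) -> Dict[str, Any]:
    """Default collector: record nothing beyond the inputs and the step count."""
    return {}


def execute(config: RunConfiguration, collect: Callable[[Any], Dict[str, Any]]) -> Dict[str, Any]:
    """Run one configuration to completion and return a flat result record."""
    model = config.model_cls(**config.parameters, seed=config.seed)
    steps = 0
    while model.running and steps < config.max_steps:
        model.step()
        steps += 1
    return {**config.parameters, "seed": config.seed, "steps": steps, **collect(model)}


class BatchRunner:
    def __init__(
        self,
        configs: Iterable[RunConfiguration],
        executor: Optional[Executor] = None,
        collect: Callable[[Any], Dict[str, Any]] = no_outcomes,
        on_progress: Optional[Callable[[int, int], None]] = None,
    ):
        self.configs = list(configs)
        self.executor = executor  # None means: run sequentially in this process
        self.collect = collect
        self.on_progress = on_progress

    def run(self) -> List[Dict[str, Any]]:
        task = partial(execute, collect=self.collect)
        mapper = map if self.executor is None else self.executor.map
        total = len(self.configs)
        results = []
        # Both map() variants yield results in input order.
        for done, result in enumerate(mapper(task, self.configs), start=1):
            results.append(result)
            if self.on_progress is not None:
                self.on_progress(done, total)
        return results


class Countdown:
    """Hypothetical toy model that stops itself after `start` steps."""

    def __init__(self, start: int = 3, seed: Optional[int] = None):
        self.remaining = start
        self.running = True

    def step(self) -> None:
        self.remaining -= 1
        if self.remaining <= 0:
            self.running = False


configs = [RunConfiguration(Countdown, {"start": s}, max_steps=10, seed=i) for i, s in enumerate([1, 3, 20])]
progress = []
results = BatchRunner(
    configs,
    collect=lambda model: {"remaining": model.remaining},
    on_progress=lambda done, total: progress.append((done, total)),
).run()

with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = BatchRunner(configs, executor=pool).run()
```

Injecting the executor keeps the runner small: sequential, threaded, and multi-process execution share one code path, and external tools remain free to call execute on their own.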