diff --git a/docs/contribute/core.md b/docs/contribute/core.md
index 23c4c4c80..d70b2aecc 100644
--- a/docs/contribute/core.md
+++ b/docs/contribute/core.md
@@ -3,9 +3,7 @@
 Core contributions are changes to AutoRA which aren't experimentalists, (synthetic) experiment runners, or theorists.
 The primary purpose of the core is to provide utilities for:

-- describing experiments (in the [`autora-core` package](https://github.com/autoresearch/autora-core))
-- handle workflows for automated experiments
-  (currently in the [`autora-workflow` package](https://github.com/autoresearch/autora-workflow))
+- describing experiments and handling workflows (in the [`autora-core` package](https://github.com/autoresearch/autora-core))
 - run synthetic experiments (currently in the [`autora-synthetic` package](https://github.com/autoresearch/autora-synthetic).

 Synthetic experiment runners may be submitted as pull requests to the [`autora-synthetic`](https://github.com/AutoResearch/autora-synthetic/blob/main/CONTRIBUTING.md) package, providing they require no additional dependencies. However, if your contribution requires additional dependencies, you can submit it as a full package following
diff --git a/docs/contribute/index.md b/docs/contribute/index.md
index 178c95de5..459dde93a 100644
--- a/docs/contribute/index.md
+++ b/docs/contribute/index.md
@@ -18,8 +18,7 @@ as well as external contributors.
 ![image](../img/package_overview.png)

 [`autora`](https://github.com/autoresearch/autora) is the parent package which end users are expected to install. The
-parent depends on core packages, such as [`autora-core`](https://github.com/autoresearch/autora-core),
-[`autora-workflow`](https://github.com/autoresearch/autora-workflow), and
+parent depends on core packages, such as [`autora-core`](https://github.com/autoresearch/autora-core) and
 [`autora-synthetic`](https://github.com/autoresearch/autora-synthetic).
 It also includes vetted modules (child packages) as optional dependencies which users can choose to install.

@@ -64,11 +63,8 @@ The following packages are considered core packages, and are actively maintained
 [Autonomous Empirical Research Group](https://musslick.github.io/AER_website/Team.html):

 - **autora-core** [`https://github.com/autoresearch/autora-core`](https://github.com/autoresearch/autora-core) This package includes fundamental utilities
-and building blocks for all the other packages. This is always installed when a user installs `autora` and can be
-a dependency of other child packages.
-
-
-- **autora-workflow** [`https://github.com/autoresearch/autora-workflow`](https://github.com/autoresearch/autora-workflow): The workflow package includes basic utilities for managing the workflow of closed-loop research processes, e.g., coordinating workflows between the theorists, experimentalists, and experiment runners. Though it currently stands alone, this package will ultimately be merged into autora-core.
+and building blocks for all the other packages. This includes basic utilities for managing the workflow of closed-loop research processes, e.g., coordinating workflows between the theorists, experimentalists, and experiment runners. The `autora-core` package is always installed when a user installs `autora` and can be
+a dependency of other child packages.

 - **autora-synthetic** [`https://github.com/autoresearch/autora-synthetic`](https://github.com/autoresearch/autora-synthetic): This package includes a number of ground-truth models from different scientific disciplines that can be used for benchmarking automated scientific discovery. If you seek to contribute a scientific model, please see the [core contributor guide](core.md) for details.

@@ -76,7 +72,7 @@ a dependency of other child packages.

 We welcome contributions to these packages in the form of pull requests, bug reports, and feature requests.
 For more details, see the
-[core contributor guide](core.md). Feel free to ask any questions or provide any feedback regarding core contributions on the
+[core contributor guide](core.md). Feel free to ask any questions or provide any feedback regarding core contributions in the
 [AutoRA forum](https://github.com/orgs/AutoResearch/discussions/categories/core-contributions).

 For core contributions, including contributions to [`autora-synthetic`](https://github.com/autoresearch/autora-synthetic), it is possible to set up your python environment in many different ways.
diff --git a/docs/contribute/modules/experimentalist.md b/docs/contribute/modules/experimentalist.md
index 7b40bcf42..5f2547611 100644
--- a/docs/contribute/modules/experimentalist.md
+++ b/docs/contribute/modules/experimentalist.md
@@ -23,19 +23,21 @@ Make sure to select the `experimentalist` option when prompted. You can skip all
 ## Implementation

 For an experimentalist, you should implement a function that returns a set of experimental conditions. This set may be
-a numpy array, iterator variable or other data format.
+a `pandas` data frame, `numpy` array, iterator variable or other data format.

 !!! hint
-    We generally **recommend using 2-dimensional numpy arrays as outputs** in which
-    each row represents a set of experimental conditions. The columns of the array correspond to the independent variables.
+    We generally **recommend using pandas data frames as outputs** in which
+    columns correspond to the independent variables of an experiment.

-Once you've created your repository, you can implement your experimentalist by editing the `init.py` file in
+Once you've created your repository, you can implement your experimentalist by editing the
+`__init__.py` file in
 ``src/autora/experimentalist/name_of_your_experimentalist/``. You may also add additional files to this directory if needed.

-It is important that the `init.py` file contains a function called `name_of_your_experimentalist`
+It is important that the `__init__.py` file contains a function called
+`name_of_your_experimentalist`
 which returns a set of experimental conditions (e.g., as a numpy array).

-The following example ``init.py`` illustrates the implementation of a simple experimentalist
+The following example ``__init__.py`` illustrates the implementation of a simple experimentalist
 that uniformly samples without replacement from a pool of candidate conditions.

 ```python
@@ -44,25 +46,34 @@ Example Experimentalist
 """

 import random
-from typing import Iterable, Sequence, Union
+import pandas as pd
+import numpy as np
+from typing import Union


-random_sample(conditions: Union[Iterable, Sequence], n: int = 1):
+def random_sample(conditions: Union[pd.DataFrame, np.ndarray],
+                  num_samples: int = 1) -> Union[pd.DataFrame, np.ndarray]:
     """
     Uniform random sampling without replacement from a pool of conditions.

     Args:
         conditions: Pool of conditions
-        n: number of samples to collect
+        num_samples: number of samples to collect

-    Returns: Sampled pool
+    Returns: Sampled pool of conditions
     """
-    if isinstance(conditions, Iterable):
-        conditions = list(conditions)
-    random.shuffle(conditions)
-    samples = conditions[0:n]
-
-    return samples
+    if isinstance(conditions, pd.DataFrame):
+        # Randomly sample N rows from DataFrame
+        sampled_data = conditions.sample(n=num_samples)
+        return sampled_data
+
+    elif isinstance(conditions, np.ndarray):
+        # Randomly sample N rows from NumPy array
+        if num_samples > conditions.shape[0]:
+            raise ValueError("num_samples cannot be greater than the number of rows in the array.")
+        indices = np.random.choice(conditions.shape[0], size=num_samples, replace=False)
+        sampled_conditions = conditions[indices]
+        return sampled_conditions
 ```

 ## Next Steps: Testing, Documentation, Publishing
diff --git a/docs/contribute/modules/index.md b/docs/contribute/modules/index.md
index ee4569a7a..f7b0eaf82 100644
--- a/docs/contribute/modules/index.md
+++ b/docs/contribute/modules/index.md
@@ -24,12 +24,13 @@ After setting up your repository and linking it to your GitHub account, you can

 ### Implement Your Code

-You may implement your code in the ``init.py`` located in the respective feature folder in ``src/autora``.
+You may implement your code in the ``__init__.py`` located in the respective feature folder in ``src/autora``.

 Please refer to the following guides on implementing
-- [theorists](theorist.md)
-- [experimentalists](experimentalist.md)
-- [experiment runners](experiment-runner.md)
+
+* [theorists](theorist.md)
+* [experimentalists](experimentalist.md)
+* [experiment runners](experiment-runner.md)

 If the feature you seek to implement does not fit in any of these categories, then you can create folders for new categories.
 If you are unsure how to proceed, you are always welcome
@@ -98,13 +99,16 @@ Once you've published your module, you should take some time to celebrate and an
 Once your package is working and published, you can **make a pull request** on [`autora`](https://github.com/autoresearch/autora) to have it vetted and added to the "parent" package. Note, if you are not a member of the AutoResearch organization on GitHub, you will need to create a fork of the repository for the parent package and submit your pull request via that fork. If you are a member, you can create a pull request from a branch created directly from the parent package repository. Steps for creating a new branch to add your module are specified below.

 !!! success
+
     In order for your package to be included in the parent package, it must meet the following criteria:
-    - have basic documentation in ``docs/index.md``
-    - have a basic python notebook exposing how to use the module in ``docs/Basic Usage.ipynb``
-    - have basic tests in ``tests/``
-    - be published via PyPI or Conda
-    - be compatible with the current version of the parent package
-    - follow standard python coding guidelines including PEP8
+
+    * have basic documentation in ``docs/index.md``
+    * have a basic python notebook exposing how to use the module in ``docs/Basic Usage.ipynb``
+    * have basic tests in ``tests/``
+    * be published via PyPI or Conda
+    * be compatible with the current version of the parent package
+    * follow standard python coding guidelines including PEP8
+    * the repository in which your package is hosted must be public

 The following demonstrates how to add a package published under `autora-theorist-example` in PyPI in the GitHub repository `example-contributor/contributor-theorist`.

diff --git a/docs/contribute/modules/theorist.md b/docs/contribute/modules/theorist.md
index 7f39e8484..42d478d14 100644
--- a/docs/contribute/modules/theorist.md
+++ b/docs/contribute/modules/theorist.md
@@ -2,7 +2,8 @@

 AutoRA theorists are meant to return scientific models describing the relationship between experimental conditions
 and observations. Such models may take the form of a simple linear regression, non-linear equations, causal graphs,
-a more complex neural network, or other models which 
+a more complex neural network, or other models which
+
 - can be identified based on data (and prior knowledge)
 - can be used to make novel predictions about observations given experimental conditions.

@@ -26,16 +27,19 @@ Make sure to select the `theorist` option when prompted. You can skip all other

 ## Implementation

-Once you've created your repository, you can implement your theorist by editing the `init.py` file in
+Once you've created your repository, you can implement your theorist by editing the `__init__.py`
+file in
 ``src/autora/theorist/name_of_your_theorist/``. You may also add additional files to this directory if needed.
-It is important that the `init.py` file contains a class called `NameOfYourTheorist` which inherits from
+It is important that the `__init__.py` file contains a class called `NameOfYourTheorist` which
+inherits from
 `sklearn.base.BaseEstimator` and implements the following methods:

 - `fit(self, conditions, observations)`
 - `predict(self, conditions)`

 See the [sklearn documentation](https://scikit-learn.org/stable/developers/develop.html) for more information on
-how to implement the methods. The following example ``init.py`` illustrates the implementation of a simple theorist
+how to implement the methods. The following example ``__init__.py`` illustrates the implementation
+of a simple theorist
 that fits a polynomial function to the data:

 ```python
@@ -45,6 +49,43 @@ Example Theorist
 """

 import numpy as np
+import pandas as pd
+from typing import Union
+from sklearn.base import BaseEstimator
+
+
+class ExampleRegressor(BaseEstimator):
+    """
+    This theorist fits a polynomial function to the data.
+    """
+
+    def __init__(self, degree: int = 2):
+        self.degree = degree
+
+    def fit(self, conditions: Union[pd.DataFrame, np.ndarray],
+            observations: Union[pd.DataFrame, np.ndarray]):
+
+        # fit polynomial function: observations ~ conditions
+        self.coeff = np.polyfit(conditions, observations, deg=self.degree)
+        self.polynomial = np.poly1d(self.coeff)
+        pass
+
+    def predict(self, conditions):
+
+        return self.polynomial(conditions)
+```
+
+Note, however, that it is best practice to make sure the conditions are compatible with `np.polyfit`. In this case, we will make sure to add some checks:
+
+```python
+
+"""
+Example Theorist
+"""
+
+import numpy as np
+import pandas as pd
+from typing import Union
 from sklearn.base import BaseEstimator


@@ -56,21 +97,32 @@ class ExampleRegressor(BaseEstimator):
     def __init__(self, degree: int = 2):
         self.degree = degree

-    def fit(self, conditions, observations):
+    def fit(self, conditions: Union[pd.DataFrame, np.ndarray],
+            observations: Union[pd.DataFrame, np.ndarray]):

-        # polyfit expects a 1D array
-        if conditions.ndim > 1:
+        # polyfit expects a 1D array, convert pandas data frame to 1D vector
+        if isinstance(conditions, pd.DataFrame):
+            conditions = conditions.squeeze()
+
+        # polyfit expects a 1D array, flatten nd array
+        if isinstance(conditions, np.ndarray) and conditions.ndim > 1:
             conditions = conditions.flatten()

-        if observations.ndim > 1:
-            observations = observations.flatten()
-
-        # fit polynomial
-        self.coeff = np.polyfit(conditions, observations, 2)
+        # fit polynomial function: observations ~ conditions
+        self.coeff = np.polyfit(conditions, observations, deg=self.degree)
         self.polynomial = np.poly1d(self.coeff)
         pass

     def predict(self, conditions):
+
+        # polyfit expects a 1D array, convert pandas data frame to 1D vector
+        if isinstance(conditions, pd.DataFrame):
+            conditions = conditions.squeeze()
+
+        # polyfit expects a 1D array, flatten nd array
+        if isinstance(conditions, np.ndarray) and conditions.ndim > 1:
+            conditions = conditions.flatten()
+
         return self.polynomial(conditions)
 ```
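
The theorist interface described in the diff above (`fit` and `predict` on a `sklearn.base.BaseEstimator` subclass) can be tried out with a small, self-contained sketch. It restates a polynomial theorist rather than importing the one from the patch, and several details are assumptions made only for illustration: the helper `_to_1d`, the attribute names `coeff_` and `polynomial_` (the trailing underscore follows the scikit-learn convention for fitted attributes), returning `self` from `fit`, and the toy data. It also flattens `observations` as well as `conditions`, so that a single-column data frame of observations is fit as one 1D series.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator


class ExampleRegressor(BaseEstimator):
    """Polynomial theorist, restated here so the sketch runs on its own."""

    def __init__(self, degree: int = 2):
        self.degree = degree

    @staticmethod
    def _to_1d(values):
        # Hypothetical helper: collapse a single-column DataFrame or an
        # (n, 1) array into a flat float vector for np.polyfit / np.poly1d.
        return np.asarray(values, dtype=float).ravel()

    def fit(self, conditions, observations):
        x = self._to_1d(conditions)
        y = self._to_1d(observations)
        # Use the configured degree instead of a hard-coded value.
        self.coeff_ = np.polyfit(x, y, deg=self.degree)
        self.polynomial_ = np.poly1d(self.coeff_)
        return self

    def predict(self, conditions):
        return self.polynomial_(self._to_1d(conditions))


# Toy data (assumed for illustration): y = 3x^2 - 2x + 1 on 21 points.
conditions = pd.DataFrame({"x": np.linspace(-1.0, 1.0, 21)})
observations = pd.DataFrame(
    {"y": 3.0 * conditions["x"] ** 2 - 2.0 * conditions["x"] + 1.0}
)

theorist = ExampleRegressor(degree=2).fit(conditions, observations)

# predict accepts either a DataFrame or a 2D array with one column.
predictions = theorist.predict(np.array([[0.0], [0.5]]))
```

Because `degree` is stored unchanged in `__init__`, `theorist.get_params()` reports it, which is what lets scikit-learn utilities clone and tune the theorist.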