All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- BREAKING: change `eval_frequency` default value to `None` to avoid confusion with hidden default value (#91)
- BREAKING: rename `Algo` to `Function` (#82)
- BREAKING: clarify `EvaluationStrategy` arguments: change `rounds` to `eval_frequency` and `eval_rounds` (#85)
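
A minimal sketch of the clarified signature (only `eval_frequency` and `eval_rounds` are confirmed by these entries; the `test_data_nodes` argument is an assumption):

```python
from substrafl.evaluation_strategy import EvaluationStrategy

# Evaluate every 2 rounds, and additionally at rounds 1 and 10
evaluation_strategy = EvaluationStrategy(
    test_data_nodes=test_data_nodes,  # assumed argument name
    eval_frequency=2,
    eval_rounds=[1, 10],
)
```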
## 0.34.0 - 2023-02-20
- Possibility to test on an organization where no training has been performed (#74)
- Add contributing, contributors & code of conduct files (#68)
- Test-only field for datasamples (#67)
- Remove `RemoteDataMethod` and change the `RemoteMethod` class to be fully flexible regarding the function name. The substra-tools method is now generic and loads the inputs depending on the inputs dictionary content (#59)
- BREAKING: rename tuple to task (#79)
## 0.33.0 - 2022-12-19
- test: add GitHub Action to run subprocess tests on Windows after each merge (#60)
- test: pass the CI e2e tests on Python 3.10 (#56)
- fix: bug introduced with numpy 1.24 and cloudpickle: `TypeError: __generator_ctor()`. Remove version from requirements. (Issue open)
## 0.32.0 - 2022-11-22
- The metric registration is simplified. The user can now directly write a score function within their script, and directly register it by specifying the right dependencies and permissions. The score function must have `(datasamples, predictions_path)` as signature. (#47)

  Example of the new metric registration:

  ```python
  import numpy as np

  metric_deps = Dependency(pypi_dependencies=["numpy==1.23.1"])

  permissions_metric = Permissions(public=True)

  def mse(datasamples, predictions_path):
      y_true = datasamples["target"]
      y_pred = np.load(predictions_path)
      return np.mean((y_true - y_pred) ** 2)

  metric_key = add_metric(
      client=substra_client,
      permissions=permissions_metric,
      dependencies=metric_deps,
      metric_function=mse,
  )
  ```
- doc on the model loading page (#40)
- The round 0 is now exposed. Possibility to evaluate centralized strategies before any training (FedAvg, NR, Scaffold). The round 0 is skipped for the single-org strategy and cannot be evaluated before training (#46)
- GitHub Actions on Ubuntu 22.04 (#52)
- torch algo: test that `with_batch_norm_parameters` is only about the running mean and variance of the batch norm layers (#30)
- torch algo: `with_batch_norm_parameters` also takes into account the `torch.nn.LazyBatchNorm{x}d` layers (#30)
- chore: use the generic task (#31)
- Apply changes from algo to function in substratools (#34)
- add `tools_functions` method to `RemoteDataMethod` and `RemoteMethod` to return the function(s) to send to `tools.execute`.
- Register functions in substratools using the `@tools.register` decorator, as sketched below (#37)
- Update substratools Docker image (#49)
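
A hedged sketch of such a registration, assuming a generic function with the `inputs`/`outputs`/`task_properties` signature described in the 0.30.0 notes below:

```python
import substratools as tools

@tools.register
def train(inputs, outputs, task_properties):
    # Load what the task needs from the inputs dict, train,
    # and persist results under the paths given in `outputs`.
    datasamples = inputs["datasamples"]
    ...
```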
- Fix python 3.10 compatibility by catching OSError for Notebooks (#51)
- Free disk space in main github action to run the CI (#48)
- local dependencies are installed in one `pip` command to optimize the installation and avoid incompatibility errors (#39)
- Fix error when installing current package as local dependency (#41)
- Fix flake8 repo for pre-commit (#50)
## 0.31.0 - 2022-10-03
- Remove the algo category from the algo, as it is not required by substra anymore
- documentation of the `predict` function of Algos was not up to date (#33)
## 0.30.0 - 2022-09-26
- Return statement of both `predict` and `_local_predict` methods from Torch Algorithms.
- Update the Client: it takes a backend type instead of `debug=True` plus an environment variable to set the spawner (#210)
- Do not use `Model.category` since this field is being removed from the SDK
- Update the tests and benchmark with the change on Metrics from substratools (#24)
- NOTABLE CHANGES due to breaking changes in substra-tools:
  - the opener only exposes `get_data` and `fake_data` methods
  - the results of the above methods are passed under the `datasamples` key within the `inputs` dict arg of all tools methods (train, predict, aggregate, score)
  - all methods (train, predict, aggregate, score) now take a `task_properties` argument (dict) in addition to `inputs` and `outputs`
  - the `rank` of a task, previously passed under the `rank` key within the inputs, is now given in the `task_properties` dict under the `rank` key
This means that all `opener.py` files should be changed from:
```python
import substratools as tools

class TestOpener(tools.Opener):
    def get_X(self, folders):
        ...

    def get_y(self, folders):
        ...

    def fake_X(self, n_samples=None):
        ...

    def fake_y(self, n_samples=None):
        ...
```
to:
```python
import substratools as tools

class TestOpener(tools.Opener):
    def get_data(self, folders):
        ...

    def fake_data(self, n_samples=None):
        ...
```
This also implies that metrics now have access to the results of `get_data` and not only `get_y` as previously. The user should adapt all of their metrics files accordingly, e.g.:
```python
class AUC(tools.Metrics):
    def score(self, inputs, outputs):
        """AUC"""
        y_true = inputs["y"]
        ...

    def get_predictions(self, path):
        return np.load(path)

if __name__ == "__main__":
    tools.metrics.execute(AUC())
```
could be replaced with:
```python
class AUC(tools.Metrics):
    def score(self, inputs, outputs, task_properties):
        """AUC"""
        datasamples = inputs["datasamples"]
        y_true = ...  # getting the target from the whole datasamples

    def get_predictions(self, path):
        return np.load(path)

if __name__ == "__main__":
    tools.metrics.execute(AUC())
```
- BREAKING CHANGE: the `train` and `predict` methods of all substrafl algos now take `datasamples` as argument instead of `X` and `y`. This impacts user code only if those methods were overwritten instead of using the `_local_train` and `_local_predict` methods.
- BREAKING CHANGE: the result of the `get_data` method from the opener is automatically provided to the given `dataset` as an `__init__` arg instead of `x` and `y` within the `train` and `predict` methods of all `Torch*Algo` classes. The user `dataset` should be adapted accordingly, e.g.:
```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, x, y, is_inference=False) -> None:
        ...

class MyAlgo(TorchFedAvgAlgo):
    def __init__(
        self,
    ):
        torch.manual_seed(seed)
        super().__init__(
            model=my_model,
            criterion=criterion,
            optimizer=optimizer,
            index_generator=index_generator,
            dataset=MyDataset,
        )
```
should be replaced with:
```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, datasamples, is_inference=False) -> None:
        ...

class MyAlgo(TorchFedAvgAlgo):
    def __init__(
        self,
    ):
        torch.manual_seed(seed)
        super().__init__(
            model=my_model,
            criterion=criterion,
            optimizer=optimizer,
            index_generator=index_generator,
            dataset=MyDataset,
        )
```
## 0.29.0 - 2022-09-19
- Use the new Substra SDK feature that enables setting the `transient` flag on tasks instead of `clean_models` on compute plans to remove intermediary models.
## 0.28.0 - 2022-09-12
- Throw an error if `pytorch 1.12.0` is used. There is a regression bug in `torch 1.12.0` that impacts optimizers that have been pickled and unpickled. This bug occurs for the Adam optimizer for example (but not for SGD). Here is a link to one issue covering it: pytorch/pytorch#80345
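
A guard of this kind can be reduced to a version check at import time (a sketch; substrafl's actual error type and message may differ):

```python
import torch

if torch.__version__.startswith("1.12.0"):
    raise RuntimeError(
        "torch 1.12.0 has a regression affecting pickled and unpickled "
        "optimizers such as Adam (see pytorch/pytorch#80345). "
        "Please install another torch version."
    )
```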
- Removing `classic-algos` from the benchmark dependencies
- NOTABLE CHANGES due to breaking changes in substra-tools: the user must now pass the name of the method to execute from the tools-defined class within the Dockerfile of both `algo` and `metric`, under the `--method-name` argument:

  ```Dockerfile
  ENTRYPOINT ["python3", "metrics.py"]
  ```

  shall be replaced by:

  ```Dockerfile
  ENTRYPOINT ["python3", "metrics.py", "--method-name", "score"]
  ```
- Use the new Substra SDK features that return the path of the downloaded file. Change the `model_loading.py` implementation and the tests.
- In the PyTorch algorithms, move the data to the device (GPU or CPU) in the training loop and predict function so that the user does not need to do it.
- Rename connect-tools docker images to substra-tools
- Benchmark:
  - use public data hosted on Zenodo for the benchmark
  - Adapt the GPU test to the last breaking changes, and unskip the `use_gpu=False` case
- Update the `NpIndexGenerator` docstrings to add information on how to use it as a full-epoch index generator.
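
For example, a full-epoch setup can be obtained by sizing `num_updates` to cover the dataset once (a sketch; `n_samples` is an assumed local dataset size):

```python
import math

from substrafl.index_generator import NpIndexGenerator

batch_size = 32
n_samples = 1000  # assumed size of the local dataset

# One window of `num_updates` batches covers every sample exactly once
nig = NpIndexGenerator(
    batch_size=batch_size,
    num_updates=math.ceil(n_samples / batch_size),
    drop_last=False,
    shuffle=True,
)
```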
- BREAKING CHANGES:
  - an extra argument `predictions_path` has been added to both `predict` and `_local_predict` methods of all `*TorchAlgo` classes. The user now has to use the `_save_predictions` method to save the predictions in `_local_predict`. The user-defined metrics will load those saved predictions with `np.load(inputs['predictions'])`. The `_save_predictions` method can be overwritten.

  The default `_local_predict` method from substrafl algorithms went from:
  ```python
  def _local_predict(self, predict_dataset: torch.utils.data.Dataset):
      if self._index_generator is not None:
          predict_loader = torch.utils.data.DataLoader(predict_dataset, batch_size=self._index_generator.batch_size)
      else:
          raise BatchSizeNotFoundError(
              "No default batch size has been found to perform local prediction. "
              "Please overwrite the _local_predict function of your algorithm."
          )

      self._model.eval()

      predictions = torch.Tensor([])
      with torch.inference_mode():
          for x in predict_loader:
              predictions = torch.cat((predictions, self._model(x)), 0)

      return predictions
  ```
  to:
  ```python
  def _local_predict(self, predict_dataset: torch.utils.data.Dataset, predictions_path: Path):
      if self._index_generator is not None:
          predict_loader = torch.utils.data.DataLoader(predict_dataset, batch_size=self._index_generator.batch_size)
      else:
          raise BatchSizeNotFoundError(
              "No default batch size has been found to perform local prediction. "
              "Please overwrite the _local_predict function of your algorithm."
          )

      self._model.eval()

      predictions = torch.Tensor([])
      with torch.inference_mode():
          for x in predict_loader:
              predictions = torch.cat((predictions, self._model(x)), 0)

      self._save_predictions(predictions, predictions_path)

      return predictions
  ```
- NOTABLE CHANGES due to breaking changes in connect-tools:
  - both `load_predictions` and `get_predictions` methods have been removed from the opener
  - the user-defined `metrics` now takes `inputs` and `outputs` as arguments.

    `inputs` is a dict containing:
    - `rank`: int
    - `y`: the result of `get_y` applied to the task datasamples
    - `predictions`: a file path where the output predictions of the user-defined algo have been saved. As stated above, those predictions can be loaded with `np.load` if the user didn't overwrite the `_save_predictions` method from the substrafl-defined `*Algo`.

    `outputs` is a dict containing:
    - `performance`: a file path where to save the result of the metrics. It must be done through the `tools.save_performance` function.
Instead of:
```python
import substratools as tools
from sklearn.metrics import roc_auc_score

class AUC(tools.MetricAlgo):
    def score(self, y_true, y_pred):
        """AUC"""
        metric = roc_auc_score(y_true, y_pred) if len(set(y_true)) > 1 else 0
        return float(metric)

if __name__ == "__main__":
    tools.algo.execute(AUC())
```
the metric files should look like:
```python
import numpy as np
import substratools as tools
from sklearn.metrics import roc_auc_score

class AUC(tools.MetricAlgo):
    def score(self, inputs, outputs):
        """AUC"""
        y_pred = np.load(inputs["predictions"])
        y_true = inputs["y"]
        metric = roc_auc_score(y_true, y_pred) if len(set(y_true)) > 1 else 0
        tools.save_performance(float(metric), outputs["performance"])

if __name__ == "__main__":
    tools.algo.execute(AUC())
```
- Documentation for the `_skip` argument of the `_local_predict` and `_local_train` methods of `Torch*Algo`.
- Update the inputs/outputs to make them compatible with the task execution
- GPU execution: move the RNG state to CPU in case the checkpoint has been loaded on the GPU
- fix: rng state for torch algos. Add tests for stability across both organizations and rounds.
- feat: `_local_predict` has been re-added
- feat: add default batching to `predict`
- BREAKING CHANGE: drop Python 3.7 support
- BREAKING CHANGE: the library is now named "substrafl"
- feat: add compute task inputs
- fix: support several items in the `local_dependencies` field of the `Dependency` class
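
For instance (a sketch; the import path is an assumption, and the package paths are made up):

```python
from pathlib import Path

from substrafl.dependency import Dependency  # assumed import path

deps = Dependency(
    pypi_dependencies=["numpy==1.23.1"],
    # several local packages can now be listed
    local_dependencies=[Path("./my_pkg_a"), Path("./my_pkg_b")],
)
```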
- feat: add compute task output
- BREAKING CHANGE: add the torch Dataset as argument of TorchAlgo to preprocess the data. The `__init__` function of the dataset must contain `(self, x, y, is_inference)`. The `__getitem__` function is expected to return `x, y` if `is_inference` is `False`, else `x`. This behavior can be changed by re-writing the `_local_train` or `predict` methods. `_local_train` is no longer mandatory to overwrite; its signature passed from `(x, y)` to `(train_dataset)`. `_local_predict` has been deleted. `_get_len_from_x` has been deleted.
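
A minimal sketch of a dataset honoring this contract (the field names are illustrative):

```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, x, y, is_inference=False):
        self.x = x
        self.y = y
        self.is_inference = is_inference

    def __getitem__(self, index):
        # Return (x, y) pairs for training, x alone for inference
        if self.is_inference:
            return self.x[index]
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)
```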
- feat: the compute plan tasks are uploaded to Connect using the auto-batching feature (it should solve gRPC message errors for large compute plans)
- BREAKING CHANGE: convert (test task) to (predict task + test task)
- Added functions to download the model of a strategy:
  - The function `substrafl.model_loading.download_algo_files` downloads the files needed to load the output model of a strategy according to the given round. These files are downloaded to the given folder.
  - The `substrafl.model_loading.load_algo` function loads the output model of a strategy from the files previously downloaded via the `substrafl.model_loading.download_algo_files` function.

  Those two functions work together:

  ```python
  download_algo_files(client=substra_client, compute_plan_key=key, round_idx=None, dest_folder=session_dir)
  model = load_algo(input_folder=session_dir)
  ```
- compatibility with substra 0.28.0
- feat: Newton Raphson strategy
- added packaging to the install requirements
- Stop using metrics APIs, use algo APIs instead
- BREAKING CHANGE: Strategy rounds start at `1` and the initialization round is now `0`. They used to start at `0` with an initialization round of `-1`. For each composite train tuple, aggregate tuple and test tuple, the metadata `round_idx` has changed according to the rule stated above.
- BREAKING CHANGE: rename node to organization in Connect
- Rename the `OneNode` strategy to `SingleOrganization`
- when using the `TorchScaffoldAlgo`:
  - The number of times the `_scaffold_parameters_update` method must be called within the `_local_train` method is now checked (see the sketch after this list)
  - A warning is thrown if an optimizer other than `SGD` is used
  - If multiple learning rates are set for the optimizer, a warning is thrown and the smallest learning rate is used for the shared state aggregation operation. `0` is not considered as a learning rate for this choice, as it could be used to deactivate the learning process of certain layers of the model.
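
A sketch of a compliant custom `_local_train`, assuming the standard substrafl training-loop attributes (`self._model`, `self._criterion`, `self._optimizer`, `self._index_generator`) and an assumed import path; the loop details are illustrative:

```python
import torch

from substrafl.algorithms.pytorch import TorchScaffoldAlgo  # assumed import path

class MyScaffoldAlgo(TorchScaffoldAlgo):
    def _local_train(self, train_dataset):
        dataloader = torch.utils.data.DataLoader(train_dataset, batch_sampler=self._index_generator)
        for x_batch, y_batch in dataloader:
            self._optimizer.zero_grad()
            loss = self._criterion(self._model(x_batch), y_batch)
            loss.backward()
            self._optimizer.step()
            # Must be called once after each optimizer step:
            # the number of calls is now checked by the algo
            self._scaffold_parameters_update()
```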
- BREAKING CHANGE: add initialization round to centralized strategies:
  - Each centralized strategy starts with an initialization round composed of one composite train tuple on each train data node
  - One round of a centralized strategy is now: `Aggregation` -> `Training on composite`
  - Composite train tuples before test tuples have been removed
- All torch algorithms now have a common `predict` method
- The `algo` argument has been removed from the `predict` method of all strategies
- The `fake_traintuple` attribute of the `RemoteStruct` class has been removed
The full discussion regarding this feature can be found here
- feat: meaningful name for algo. You can use the `_algo_name` parameter to set a custom algo name for the registration. By default, it is set to `method-name_class-name`.

  ```python
  algo.train(
      node.data_sample_keys,
      shared_state=self.avg_shared_state,
      _algo_name=f"Training with {algo.__class__.__name__}",
  )
  ```
- chore: add latest connect-tools docker image selection
- Torch algorithms now support GPUs: there is a `use_gpu` parameter in the `__init__` of the Torch algo classes. If `use_gpu` is True and no GPU is detected, the code runs on CPU, as in the sketch below.
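
For instance (a sketch; `my_model`, `criterion`, `optimizer` and `nig` are assumed to be defined as in the other examples):

```python
class MyAlgo(TorchFedAvgAlgo):
    def __init__(self):
        super().__init__(
            model=my_model,
            criterion=criterion,
            optimizer=optimizer,
            index_generator=nig,
            use_gpu=True,  # runs on CPU if no GPU is detected
        )
```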
- The wheels of the libraries installed with `editable=True` are now in `$HOME/.substrafl` instead of `$LIB_PATH/dist`
- benchmark:
  - `make benchmark` runs the default remote benchmark on the connect platform specified in the config file
  - `make benchmark-local` runs the default local benchmark in subprocess mode
- BREAKING CHANGE: replace "tag" argument with "name" in `execute_experiment`
- `execute_experiment` checks that the algo and strategy are compatible. You can override the list of strategies the algo is compatible with using the `strategies` property:

  ```python
  from substrafl.algorithms.algo import Algo
  from substrafl import StrategyName

  class MyAlgo(Algo):
      @property
      def strategies(self):
          return [StrategyName.FEDERATED_AVERAGING, StrategyName.SCAFFOLD]

      # ...
  ```
- feat: the compute plan key of the experiment is saved in the experiment summary before submitting or executing it
- feat: add the possibility for the user to pass additional metadata to the compute plan metadata
- Force the reinstallation of connect-tools in the Docker image, necessary for the editable mode
- BREAKING CHANGE: the default value of `drop_last` in the `NpIndexGenerator` is now `False`
- BREAKING CHANGE: the index generator is now required when implementing a strategy

  ```python
  from substrafl.index_generator import NpIndexGenerator

  nig = NpIndexGenerator(
      batch_size=batch_size,
      num_updates=num_updates,
      drop_last=False,  # optional, defaults to False
      shuffle=True,  # optional, defaults to True
  )

  class MyAlgo(TorchFedAvgAlgo):
      def __init__(self):
          super().__init__(
              index_generator=nig,
              # other parameters
          )

      # ...
  ```
- The user can now initialize their `TorchAlgo` function with custom parameters (only primitive types are supported):

  ```python
  class MyAlgo(TorchFedAvgAlgo):
      def __init__(self, my_arg):
          super().__init__(
              model=model,
              criterion=criterion,
              optimizer=optimizer,
              index_generator=nig,
              my_arg=my_arg,  # This is necessary
          )

      # ...
  ```
- Fix the format of the asset ids: the right format is `str(uuid.uuid4())` and not `uuid.uuid4().hex`
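
The difference between the two formats:

```python
import uuid

uid = uuid.uuid4()
str(uid)  # e.g. "4f3e4b6e-9f4e-4a9d-b2c6-0f1f9d1a2b3c" (hyphenated, expected format)
uid.hex   # e.g. "4f3e4b6e9f4e4a9db2c60f1f9d1a2b3c" (no hyphens, wrong format)
```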
- feat: rename "compute_plan_tag" to "tag" #131
- feat: Add the optional argument "compute_plan_tag" to give the user the possibility to choose its own tag (timestamp by default) #128
- feat: Scaffold strategy
- feat: add one node strategy
- The Connect tasks have a `round_idx` attribute in their metadata
- doc: add the Python API to the documentation
- API documentation: fix the docstrings and the display of the documentation for some functions
- (BREAKING CHANGE) FedAvg strategy: the train function must return a `FedAvgSharedState`, the average function returns a `FedAvgAveragedState`. No need to change your code if you use `TorchFedAvgAlgo`
- benchmark:
  - Use the same batch sampler between the torch and Substrafl examples
  - Make it work with `num_workers` > 0
  - Explain the effect of the sub-sampling
  - Update the default benchmark parameters in `benchmarks.sh`
  - Add new curves to the plotting: when one parameter changes while the others stay the same
  - Use connect-tools 0.10.0 as a base image for the Dockerfile
- fix: naming changed from FedAVG to FedAvg
- fix: log a warning if an existing wheel is used to build the docker image
- fix: `execute_experiment` has no side effects on its arguments
- fix: `Dependency.local_package` are installed in non-editable mode and additionally accept `pyproject.yaml` as configuration file
- fix: `execute_experiment` accepts `None` as `evaluation_strategy`
- fix: the `abstractmethod` decorator of `substrafl.algorithms.algo.Algo` is now taken into account
- feat: `EvaluationStrategy` can now be reinitialized
- Refactoring of `substrafl.algorithms.pytorch.fed_avg.TorchFedAvgAlgo`:
  - replace the `_preprocess` and `_postprocess` functions by `_local_train` and `_local_predict`
  - the user can override the `_get_len_from_x` function to get the number of samples in the dataset from x
  - `batch_size` is now a required argument, and a warning is issued if it is None
- The `substrafl.index_generator.np_index_generator.NpIndexGenerator` class now works with `torch.utils.data.DataLoader`, with `num_workers` > 0
- The benchmark uses `substrafl.algorithms.pytorch.fed_avg.TorchFedAvgAlgo` instead of its own custom algorithm
- Add the `clean_models` option to the `execute_experiment` function
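
For example (a sketch; the other `execute_experiment` arguments are omitted):

```python
compute_plan = execute_experiment(
    # ... other experiment arguments ...
    clean_models=True,  # remove intermediary models once they are no longer needed
)
```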
- feat: make a base class for the index generator and document it
- The `Algo` now exposes a `model` property to get the model after downloading it from Connect
- (BREAKING CHANGE) experiment summary is saved as a JSON file in `experiment_folder`
- fix: notebook dependency failure. You can now run a substrafl experiment with local dependencies in a Jupyter notebook
- feat: models can now be tested every n rounds, on the same nodes they were trained on. This feature introduces a new parameter `evaluation_strategy` in `execute_experiment`, which takes an `EvaluationStrategy` instance from `substrafl.evaluation_strategy`. If this parameter is not given, performance will not be measured at all (previously, it was measured at the end of the experiment by default).
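
A sketch of the wiring, using the `rounds` argument of that era (later renamed, as noted at the top of this changelog); `test_data_nodes` and the other experiment arguments are assumptions:

```python
from substrafl.evaluation_strategy import EvaluationStrategy

# Test the models every 3 rounds on the nodes they were trained on
evaluation_strategy = EvaluationStrategy(test_data_nodes=test_data_nodes, rounds=3)

compute_plan = execute_experiment(
    # ... other experiment arguments ...
    evaluation_strategy=evaluation_strategy,
)
```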
- feat: install substrafl from PyPI
- fix: Update pydantic version to enable autocompletion
- feat: Add a FL algorithm wrapper in PyTorch for the federated averaging strategy
- test: connect-test integration
- feat: Add a possibility to test an algorithm on selected rounds or every n rounds
- fix: dependency management: the `local_code` dependencies are copied to the same folder structure relative to the algo
- fix: dependency management: it failed when resolving the `local_code` dependencies because the path to the algo was relative
- feat: batch indexer
- feat: more logs + function to set the logging level
- Subprocess mode is now faster as it fully reuses the user environment instead of rebuilding the connect-related parts (substra #119 and #63)
- fix: error message for local dependency
- feat: User custom dependencies
- feat: support substra subprocess mode
- first release