FederatedML
- Hetero SecureBoost: more efficient computation with GOSS, histogram subtraction, cipher compression, 2-4x faster
- Hetero GLM: improved communication efficiency, adjustable floating point precision, 2x faster
- Hetero NN: adjustable floating point precision, support SelectiveBackPropagation and dropOut on interaction layer, 2x faster
- Hetero Feature Binning: improved algorithm with cipher compression, 2x faster
- Intersect: add split calculation option and adjustable random base fraction, 30% faster
- Homo NN: restructured torch backend with enhanced syntax; train and predict with raw image data
- Intersect supports SM3 hashing method
- Hetero SecureBoost: L1 penalty & adjustable min_child_weight to prevent overfitting
- NEW SecureBoost Transformer: feature engineering module that encodes instances with leaf nodes from SecureBoost model
- Hetero Pearson: support local VIF computation
- Hetero Feature Selection: support selection based on VIF and Pearson
- NEW Homo Feature Binning: support virtual/recursive binning strategy
- NEW Sample Weight: set sample weights based on label or from feature column, Hetero GLM & Hetero SecureBoost support weighted training
- NEW Data Transformer: case-insensitive on data schema
- Local Baseline supports prediction task
- Cross Validation: output fold split history
- Evaluation: add multi-result-unfold option which unfolds multi-classification evaluation result to several binary evaluation results in a one-vs-rest manner
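The one-vs-rest unfolding described in the Evaluation item can be pictured with a small sketch. The helper names `unfold_one_vs_rest` and `precision` and the sample labels are hypothetical, not FATE's API; the sketch only shows how one multi-class result becomes one binary result per class.

```python
def unfold_one_vs_rest(y_true, y_pred):
    """Split multi-class labels/predictions into one binary task per class.

    Hypothetical helper for illustration; not part of FATE's API.
    """
    classes = sorted(set(y_true) | set(y_pred))
    unfolded = {}
    for c in classes:
        # Class c is the positive class (1); every other class is "the rest" (0).
        bin_true = [1 if y == c else 0 for y in y_true]
        bin_pred = [1 if y == c else 0 for y in y_pred]
        unfolded[c] = (bin_true, bin_pred)
    return unfolded


def precision(bin_true, bin_pred):
    """Binary precision: true positives / predicted positives."""
    tp = sum(1 for t, p in zip(bin_true, bin_pred) if t == 1 and p == 1)
    predicted_pos = sum(bin_pred)
    return tp / predicted_pos if predicted_pos else 0.0


# Illustrative labels only.
y_true = ["a", "b", "c", "a", "b"]
y_pred = ["a", "b", "a", "a", "c"]
results = {
    c: precision(*pair) for c, pair in unfold_one_vs_rest(y_true, y_pred).items()
}
```

Each entry of `results` is an ordinary binary metric, so any binary evaluation routine can be reused per class.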
System Architecture
- Added a virtual storage engine based on local file system directory paths to support image input data
- Added Pulsar, a message queue, as a cross-site transmission engine; it can be used with the Spark computing engine, and an Exchange role can be added to support the star networking mode
FATE-Test
- Add Benchmark performance for efficiency comparison; add mock data generation tool; support metrics comparison between training and validation sets
- FATE-Flow unittest for REST/CLI/SDK API and training-prediction workflow
FederatedML
- Add Feldman Verifiable Secret Sharing protocol (contributed)
- Add Feldman Verifiable Sum Module (contributed)
- Updated FATE-Client and FATE-Test for new FATE-Flow
- Upgraded early stopping strategy: record best model for each metric
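For readers unfamiliar with Feldman Verifiable Secret Sharing, the following is a minimal textbook-style sketch, not the contributed FATE implementation. The parameters `P`, `Q`, `G` are toy values chosen as an assumption for readability and are far too small for real use; all function names are hypothetical.

```python
import random

# Toy parameters (assumption): P is prime, Q is a prime dividing P - 1,
# and G has order Q modulo P (2**11 % 23 == 1).
P, Q, G = 23, 11, 2


def share_secret(secret, threshold, n_parties, rng):
    """Return (shares, commitments) for a polynomial of degree threshold - 1."""
    coeffs = [secret % Q] + [rng.randrange(Q) for _ in range(threshold - 1)]
    shares = {
        i: sum(a * pow(i, j, Q) for j, a in enumerate(coeffs)) % Q
        for i in range(1, n_parties + 1)
    }
    # Public commitments to each coefficient: c_j = G^(a_j) mod P.
    commitments = [pow(G, a, P) for a in coeffs]
    return shares, commitments


def verify_share(i, share, commitments):
    """Check G^share == prod_j c_j^(i^j) (mod P)."""
    expected = 1
    for j, c in enumerate(commitments):
        expected = expected * pow(c, pow(i, j, Q), P) % P
    return pow(G, share, P) == expected


def reconstruct(points):
    """Lagrange interpolation at x = 0 over the integers modulo Q."""
    secret = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        # pow(den, -1, Q) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, Q)) % Q
    return secret


rng = random.Random(0)
shares, commitments = share_secret(secret=7, threshold=3, n_parties=5, rng=rng)
```

Any party can call `verify_share` on its own share using only the public commitments, and any `threshold` valid shares recover the secret with `reconstruct`.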
Fate-Flow
- Optimize the model center, reconstruct publishing model, support deploy, load, bind, migrate operations, and add new interfaces such as model info
- Improve identity authentication and resource authorization: support party identity verification and authorization of participating roles and components
- Optimize and fix resource manager, add task_cores job parameters to adapt to different computing engines
Eggroll
- In one-way communication mode, add party identity authentication function, which needs to be used with FATE-Cloud
Deploy
- Support upgrading from 1.5.0 to 1.5.1 while retaining data
- Fix predict-cache in SecureBoost validation
- Fix job clean CLI
FederatedML
- Refactored Hetero FTL with optional communication-efficiency mechanism, with 4x time efficiency improvement
- Hetero SecureBoost supports complete secure mode
- Hetero SecureBoost can now reduce time consumption on highly sparse data by using sparse matrix computation for histogram aggregation
- Hetero SecureBoost optimization: the number of communication rounds in prediction is reduced to at most the tree depth; prediction is 32 times faster on a 100-tree model
- Addition of Hetero FastSecureBoost module, whose mixed/layered modeling method makes it twice as efficient as SecureBoost
- Improved Hetero Federated Binning with 30%~50% time efficiency improvement
- Better GLM: >10% improvement in time efficiency
- FATE's first unsupervised learning algorithm: Hetero KMeans
- Upgraded Hetero Feature Selection: add PSI filter and SecureBoost feature importance filter
- Add Data Split module: splitting data into train, validate, and test sets inside FATE modeling workflow
- Add DataStatistic module: compute min/max, mean, median, skewness, kurtosis, coefficient of variation, percentile, etc.
- Add PSI module for computing population stability index
- Add Homo OneHot module for one-hot encoding in homogeneous scenario
- Evaluation module adds metrics for clustering
- Optional FedProx mechanism for Homo LR, useful for training with non-iid data
- Add Oblivious Transfer Protocol and OT-based module Secure Information Retrieval
- Random Iterative Affine protocol, providing additional security
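As a reference for the PSI module listed above, here is a minimal sketch of the usual population stability index formula over pre-binned counts. The function name and the smoothing constant `eps` are assumptions for illustration, not FATE's interface.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI = sum((a_i - e_i) * ln(a_i / e_i)) over bins, on bin proportions.

    eps is an assumed floor that avoids log(0) on empty bins.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both inputs must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi


# Illustrative bin counts only.
baseline = [100, 200, 400, 200, 100]  # reference data
shifted = [50, 150, 400, 250, 150]    # new data, same bins
psi_same = population_stability_index(baseline, baseline)
psi_shift = population_stability_index(baseline, shifted)
```

Identical distributions give a PSI of zero, and every bin whose proportion changes adds a positive term.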
Fate-Flow
- Brand new scheduling framework based on global state and optimistic concurrency control, with support for multiple schedulers
- Upgraded task scheduling: multi-model output for component, executing component in parallel, component rerun
- Add new DSL v2, which significantly improves user experience compared with DSL v1. DSL v2 supports several syntax error detection functions. Both DSL v1 and v2 are supported in the current FATE version
- Enhanced resource scheduling: remove the limit on the number of jobs; schedule based on cores, memory, and working nodes, depending on what each computing engine supports
- Add model registry, supports model query, import/export, model transfer between clusters
- Add Reader component: automatically dump input data to FATE-compatible format and cluster storage engine; data from HDFS is now supported
- Refactor parameter settings in the job submission configuration; different parties can use different job parameters when using DSL v2
System Architecture
- New architectural framework that supports a combination of different computing, storage, and transfer engines
- Support new engine combination: Spark, HDFS, RabbitMQ
- New data table management, standardized API for all different storage engines
- Rearrange FATE code structure, with configuration settings in one place and a streamlined user experience
- Support one-way network communication between parties; only one party needs to open an inbound network policy
FATE-Client
- Pipeline, a tool with a Keras-like user interface that integrates TensorFlow, PyTorch, and Keras in the backend, for fast federated model building with FATE
- Brand new CLI v2 with easy independent installation, user-friendly programming syntax & command-line prompt
- Support FLOW python language SDK
- Support PyPI
FATE-Test
- Testsuite: for FATE functional regression testing
- Benchmark tool and examples for comparing modeling quality; provided examples include common models such as heterogeneous LR, SecureBoost, and NN
- Performance Statistics: Log now includes statistics on timing, API usage, and variable transfer
EggRoll
- RollSite supports communication certificates
FATE-Flow
- Task Executor supports monkey patch
- Add forward API
FederatedML
- Fix bug in Hetero SecureBoost where tree weight info was sent from guest to host.
FederatedML
- Optimize performance of Pearson, more than doubling its efficiency.
- Optimize Min-test module: add secure-boost as an optional test task; set partyid and work_mode as input parameters; use pre-imported data sets as input to improve the test process.
- Support top_k IV filter in feature selection module.
- Support filling missing value for tag:value format data in DataIO.
- Fix bug in HeteroSecureBoost where trees lacked one layer of depth, and support automatic header alignment of input data in the predict process
- Standardize the naming of example data set and add a data pre-import script.
FATE-Flow
- Distinguish between user-stopped and system-stopped jobs
- Optimize some logs
- Optimize ZooKeeper configuration
- Support persistent model storage in MySQL
- Support specifying the storage address (local file and FATEFlowServer interface) when pushing a model to the online service
FederatedML
- Reconstructed Evaluation Module improves efficiency by 60 times
- Add PSI, confusion matrix, f1-score and quantile threshold support for Precision/Recall in Evaluation.
- Add option to retain duplicated keys in Union.
- Support filtering features based on mode
- Manual filter allows manually setting columns to retain
- Automatically recognize whether a data set includes a label column in the predict process
- Bug fixes: missing schema after merge in Union; failure to align multi-class labels in homo_nn with PyTorch backend; floating-point precision error and value error caused by int-type input in Feature Scale
FATE-Flow
- Allow the host to stop the job
- Optimize the task queue
- Automatically align the input table partitions of all participants when the job is running
- Optimize large file upload in the FATE-Flow client
- Fixed some bugs with abnormal status
FederatedML
- Support Homo SecureBoost
- Support AIC/BIC-based Stepwise for Linear Models
- Add Hetero Optimal Feature Binning, support iv/gini/chi-square/ks, and allow asymmetric binning methods
- Interoperate with AI ecosystem: add PyTorch backend for Homo NN
- Refactor the Homo framework to simplify developing homo algorithms
- Early stopping strategy for hetero algorithms.
- Local Baseline supports multi-class classification
- Add consistency check to Predict function
- Optimize validation strategy, 3x speedup of in-training validation
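The AIC/BIC-based stepwise item above relies on two standard criteria. The sketch below shows their formulas and one forward-selection step; `forward_step`, the model tuples, and the log-likelihood values are hypothetical and made up for illustration, not taken from FATE.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


def forward_step(current, candidates, n_samples, criterion="aic"):
    """Return the best-scoring candidate, or current if none improves on it.

    Each model is a (name, log_likelihood, n_params) tuple.
    """
    def score(model):
        _, ll, k = model
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n_samples)

    best = min(candidates, key=score)
    return best if score(best) < score(current) else current


# Made-up log-likelihoods, for illustration only.
current = ("x1", -120.0, 2)
candidates = [("x1+x2", -110.0, 3), ("x1+x3", -119.5, 3)]
chosen_aic = forward_step(current, candidates, n_samples=200, criterion="aic")
chosen_bic = forward_step(current, candidates, n_samples=200, criterion="bic")
```

Lower is better for both criteria; BIC penalizes extra parameters more heavily once the sample size exceeds about eight.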
FATE-Flow
- Refactor model management: native file directory storage, with a more flexible storage structure that holds more information
- Support model import and export, and store and restore with a reliable distributed system (Redis is currently supported)
- Using MySQL instead of Redis to implement Job Queue, reducing system complexity
- Support for uploading client local files
- Automatically detects the existence of the table and provides the destroy option
- Separate system, algorithm, and scheduling command logs; the scheduling command log can be audited independently
Eggroll
Stability Boosts:
- New resource management components introduce a brand new session mechanism. Processors can be cleaned up with a simple method call, even if the session goes wrong.
- Removes storage service. No C++ / native library compilation is needed.
- Federated learning algorithms can still work at a 28% packet loss rate.
Performance Boosts:
- Performance of federated learning algorithms is improved on Eggroll 2. Some algorithms get a 10x performance boost.
- Join interface is 16x faster than pyspark under federated learning scenarios.
User Experiences Boosts:
- Quick deployment. Maven, pip, config and start.
- Light dependencies. Check our requirements.txt / pom.xml and see.
- Easy debugging. Necessary running contexts are provided. Runtime status is kept in log files and databases.
- Few daemon processes. And they are all JVM applications.
Deploy
- Support deployment on macOS
- Support using external db
- Deploy JDK and Python environments on demand
- Improve MySQL and FATE Flow service.sh
- Support more custom deployment configurations in default_configurations.sh, such as ssh_port, mysql_port, and so on.
FederatedREC
- Add federated recommendation submodule
- Add heterogeneous Factorization Machine
- Add homogeneous Factorization Machine
- Add heterogeneous Matrix Factorization
- Add heterogeneous Singular Value Decomposition
- Add heterogeneous SVD++ (Factorization Meets the Neighborhood)
- Add heterogeneous Generalized Matrix Factorization
FederatedML
- Support sparse data training in heterogeneous Generalized Linear Models (Hetero-LR, Hetero-LinR, Hetero-PoissonR)
- Fix 32M limitation of quantile binning to support higher feature dimension
- Fix 32M limitation of histogram statistics for SecureBoost to support higher feature dimension training.
- Add abnormal parameters and input data detection in OneHot Encoder
- Fix validation data not being passed to the fit process, so that validation data can be evaluated during training
FATE-Flow
- Add clean job CLI for cleaning output and intermediate results, including data, metrics and sessions
- Support for obtaining table namespace and name of output data via CLI
- Fix KillJob unsuccessful execution in some special cases
- Improve log system, add more exception and run time status prompts
FederatedML
- Add heterogeneous Deep Neural Network
- Add Secret-Sharing Protocol-SPDZ
- Add heterogeneous feature correlation algorithm with SPDZ and support heterogeneous Pearson Correlation Calculation
- Add heterogeneous SQN optimizer, available for Hetero-LogisticRegression and Hetero-LinearRegression, which can reduce communication costs significantly
- Supports intersection for expanding duplicate IDs
- Support multi-host in heterogeneous feature binning
- Support multi-host in heterogeneous feature selection
- Support IV calculation for categorical features in heterogeneous feature binning
- Support transform raw feature value to WOE in heterogeneous feature binning
- Add manual filters in heterogeneous feature selection
- Support performance comparison with sklearn's logistic regression
- Automatic object/table clean in training iteration procedure in Federation
- Improve transfer performance for large object
- Add automatic scripts for submitting and running tasks
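The IV and WOE items in the feature binning entries above follow standard definitions, sketched here on a single party's bin counts. The sign convention, the smoothing value `eps`, and the function names are assumptions for illustration; FATE's own convention may differ.

```python
import math


def woe_and_iv(event_counts, non_event_counts, eps=0.5):
    """Return (woe_per_bin, iv).

    Convention assumed here: WOE_i = ln(event_rate_i / non_event_rate_i).
    Some references use the reciprocal, which flips the sign of WOE but
    leaves IV unchanged. eps is an assumed stand-in count for empty bins.
    """
    total_event = sum(event_counts)
    total_non_event = sum(non_event_counts)
    woe, iv = [], 0.0
    for ev, nev in zip(event_counts, non_event_counts):
        event_rate = (ev if ev > 0 else eps) / total_event
        non_event_rate = (nev if nev > 0 else eps) / total_non_event
        w = math.log(event_rate / non_event_rate)
        woe.append(w)
        iv += (event_rate - non_event_rate) * w
    return woe, iv


def transform_to_woe(bin_indices, woe):
    """Replace each value's bin index with that bin's WOE."""
    return [woe[b] for b in bin_indices]


# Illustrative counts only: three bins of one feature.
events = [10, 30, 60]
non_events = [60, 30, 10]
woe, iv = woe_and_iv(events, non_events)
encoded = transform_to_woe([0, 2, 1], woe)
```

The same per-bin counts work for categorical features, where each category plays the role of a bin.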
FATE-Flow
- Add data management module for recording uploaded data tables and model outputs produced while a job runs, with CLI support for querying and cleanup
- Support registration center for simplifying communication configuration between FATEFlow and FATEServing
- Restructure model release logic: FATE-Flow pushes models directly to FATE-Serving. FATE-Serving and Eggroll are decoupled, and the offline and online architectures are connected only by FATE-Flow.
- Provide CLI to query data upload record
- Upload and download data support progress statistics by line
- Add some abnormal diagnosis tips
- Support adding note information to job
Native Deploy
- Fix bugs in EggRoll startup script; add MySQL and Redis startup options.
- Disable host name resolution configuration for MySQL service.
- The software packaging script now obtains the version number of each module automatically.
- Add cluster deployment support for the Ubuntu operating system.
- Add Union component, which supports data merging.
- Support indicating partial columns in Onehot Encoder
- Support intermediate data cleanup after the task ends
- Accelerated Intersection
- Optimizing the deployment process
- Fix a bug in SecureBoost early stopping
- Fix a bug in download API
- Fix bugs in the Spark backend
FederatedML
- Provide a general algorithm framework for homogeneous federated learning, which supports Secure Aggregation
- Add homogeneous Deep Neural Network
- Add heterogeneous Linear Regression
- Add heterogeneous Poisson Regression
- Support multi-host in heterogeneous Logistic Regression
- Support multi-host in heterogeneous Linear Regression
- Support multi-host Intersection
- Accelerated Intersection by usage of cache
- Reconstruct heterogeneous Generalized Linear Models Framework
- Support affine homomorphic encryption in heterogeneous SecureBoost
- Support input data with missing value in heterogeneous SecureBoost
- Support evaluation during training on both train and validate data
- Add Spark as a computing engine
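The Secure Aggregation mentioned in the first item of this list is commonly built on pairwise masks that cancel in the sum. The sketch below illustrates that idea only; the ring size `MODULUS`, the function names, and the in-process mask generation are assumptions (a real protocol derives each pairwise mask through key agreement between the two parties).

```python
import random

MODULUS = 2 ** 32  # assumed ring size for integer-encoded updates


def pairwise_masks(n_parties, rng):
    """Return {(i, j): mask} with one random mask per pair i < j."""
    return {
        (i, j): rng.randrange(MODULUS)
        for i in range(n_parties)
        for j in range(i + 1, n_parties)
    }


def mask_update(party, value, n_parties, masks):
    """Add masks shared with higher-indexed parties, subtract the others."""
    masked = value
    for other in range(n_parties):
        if other == party:
            continue
        pair = (min(party, other), max(party, other))
        if party < other:
            masked += masks[pair]
        else:
            masked -= masks[pair]
    return masked % MODULUS


rng = random.Random(42)
updates = [5, 11, 20]  # each party's private, integer-encoded update
masks = pairwise_masks(len(updates), rng)
masked = [mask_update(p, v, len(updates), masks) for p, v in enumerate(updates)]
# The aggregator only sees masked values, yet their sum is the true sum.
aggregate = sum(masked) % MODULUS
```

Each mask is added once and subtracted once, so the aggregator learns the sum without seeing any individual update.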
FATE-Flow
- Upload and Download support CLI for querying job status
- Support for canceling waiting job
- Support for setting job timeout
- Support for storing a job scheduling log in the job log folder
- Add authentication control Beta version, including component, command, role
- Only Python and JDK environments are required to run the standalone version for quick experiments
- Support cluster version docker deployment
- Add deployment guide in Chinese
- Standalone jobs for quick experiments are supported when the cluster version is deployed.
- Python service logs are now retained for 14 days.
- Fix bugs of multi-host support in Cross-Validation
- Fix bugs in displaying evaluation metrics when both train and eval exist
- Add links for each algorithm module in FederatedML home page README
- Fix bugs for evaluation data type
- Fix bugs for feature binning to take abnormal values into consideration
- Fix bugs for train and eval
- Fix bugs in binning merge
- Fix bugs in Samplers
- Fix federated feature selection feature filter bug
- Support upload file in version argument
- Support get serviceRoleName from configuration
This version includes two new FATE products: FATE-Board, a visualization tool for federated modeling, and FATE-Flow, an end-to-end pipeline platform for federated learning. It also contains important improvements to FederatedML that better track the running progress of federated learning algorithms.
FATE-Board
- Federated Learning Job DashBoard
- Federated Learning Job Visualisation
- Federated Learning Job Management
- Real-time Log Panel
FATE-FLOW
- DAG defines Pipeline
- Federated Multi-party asymmetric DSL parser
- Federated Learning lifecycle management
- Federated Task collaborative scheduling
- Tracking for data, metric, model and so on
- Federated Multi-party model management
FederatedML
- Update the running mechanism of all algorithm modules to support the federated modeling pipeline in FATE-Flow
- Intermediate statistic result callback is available and visualizable in FATE-Board for all algorithm modules.
- Support Nesterov Momentum SGD Optimizer
- Add Homomorphic Encryption Scheme Based on Affine Transforms
- Support sparse input-format in federated feature binning
- Update evaluation metrics, such as ks, roc, gain, lift curve and so on
- Update algorithm's parameter-define class
FATE-Serving
- Add online federated modeling pipeline DSL parser for online federated inference
- Adjust the Logic of Online Service Module
- Adjust the log format
- Replace the grpc connection pool of the online service module
- Improving Model Processing Details
- Fix feature scale bugs in v0.3
- Fix federated feature selection bugs in v0.3
FederatedML
- Support OneVsALL for multi-label classification task
- Add trash-recycle in Hetero Logistic Regression
- Add numerically stable implementations of the sigmoid and log_logistic functions.
- Support different calculation mode in Hetero Logistic Regression and Hetero SecureBoost
- Decouple Federated Feature Binning and Federated Feature Selection
- Add feature importance calculation in Hetero SecureBoost
- Add multi-host in Hetero SecureBoost
- Support tag:value sparse format input data
- Support output intersect-id with feature-instance in Intersection
- Support OneHot encoding module.
- Support bucket binning for Federated Feature Binning.
- Support mathematical operators such as add, sub, mul, div, gt, lt, and eq on Fixed-Point data
- Add authority validation for parameter setting
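The numerical stability item for sigmoid and log_logistic above refers to a well-known trick: never exponentiate a large positive number. A minimal sketch of that trick follows; it illustrates the idea and is not FATE's code.

```python
import math


def sigmoid(x):
    """Sigmoid that never calls exp() on a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z is in (0, 1) and cannot overflow
    return z / (1.0 + z)


def log_logistic(x):
    """log(sigmoid(x)) computed without overflow or log(0)."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))
```

The naive form `1 / (1 + math.exp(-x))` raises `OverflowError` for a large negative `x` such as `-1000`, while both functions above stay finite.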
FATE-Serving
- Add multi-level cache for multi-party inference result
- Add startInferceJob and getInferenceResult interfaces to support asynchronous inference
- Normalized inference return code
- Real-time logging of inference summary logs and inferential detail logs
- Improve the loading of the pre and post processing adapter and data access adapter for host
EggRoll
- New computing and storage APIs
- Stability optimizations
- Performance optimizations
- Storage usage improvements
Example
- Add Mini-FederatedML test task example
- Using task manager to submit distributed task for current examples
- Fix overflow bug in OneHot max column detection
- Fix bug where DataIO dense format did not read the host data header
- Fix bugs in calls to statistics functions
- Fix bug in federated feature selection so that at least one feature remains for each party
- Disallow overly small batch sizes in the LR module for security reasons
- Fix naming error in federated feature selection module
- Fix the bug of automated publishing model information in some extreme cases
- Fix some overflow bugs in fixed-point data
- Fix many other bugs
WorkFlow
- Add Model Pipeline
- Add Hetero Federated Feature Binning workflow
- Add Hetero Federated Feature Selection workflow
- Add hetero dnn workflow
- Add intersection operator before train, predict and cross_validation
FederatedML
- Support svm-light sparse format input data
- Support tag sparse format input data
- Add Hetero Federated Feature Binning
- Add Hetero Federated Feature Selection
- Add Feature Scaler: MinMaxScaler & StandardScaler
- Add Feature Imputer for missing value filling
- Add Data Statistic for datainstance
- Support configurable encoding and main calculation role for RAW Intersection
- Add Sampler: RandomSampler & StratifiedSampler
- Support regression in SecureBoost
- Support regression evaluation
- Support Decentralized FTL
- Add feature extracting by DNN
- Change Model Format to ProtoBuf
- Add abnormal parameter detection
- Add abnormal input data detection
FATE-Serving (an online inference service for federated learning models)
- Dynamic Loading Federated Learning Models.
- Real-time Prediction Using Federated Learning Models.
Model Management
- Versioning
- Reproducibility
- Queries, Search
Task Manager
- Add Load File/ Download File
- Add Import ID from Local File
- Add Start workflow
- Add workflow Job Queue
- Add Query Job Status
- Add Get Runtime conf
- Add Delete Task
EggRoll
- Add Node Manager for multiprocessor to improve distributed computing performance
- Add storage service rewritten in C++
- Add eggroll cleanup API
Deploy
- Add auto-deploy
- Improved deployment documentation
Example
- Add Hetero Federated Feature Binning example
- Add Hetero Federated Feature Selection example
- Add Hetero DNN example
- Add toy example
- Add task manager examples
- Add Serving example
- Hetero-LR Minibatch bugfixed
- Gradient Average bugfixed
- One-second latency for proxy bugfixed
- Training flowid bugfixed
- Many bugfixes
- Many performance improvements
- Many documentation fixes
Initial release of FATE.
WorkFlow
- Support Intersection workflow
- Support Train workflow
- Support Predict workflow
- Support Validation workflow
- Support Model Load and Save workflow
FederatedML
- Support Distributed Secure Intersection and Raw Intersection for Sample Alignment
- Support Distributed Homogeneous LR and Heterogeneous LR
- Support Distributed SecureBoost
- Support Distributed Secure Federated Transfer Learning
- Support Binary and Multi-Class Evaluation
- Support Model Cross-Validation
- Support Mini-Batch
- Support L1, L2 Regularizers
- Support Multi-Party Homogeneous FederatedAggregator
- Support Multi-Party Heterogeneous FederatedAggregator
- Support Partially Homomorphic Encryption MPC Protocol
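To make the term "partially homomorphic encryption" in the last item concrete, here is a toy sketch of Paillier, a well-known additively homomorphic scheme. The release note does not say which scheme is meant, so Paillier is an assumption used for illustration; the primes are tiny and insecure, and all function names are hypothetical.

```python
import math
import random


def generate_keypair(p, q):
    """Return (n, (lam, mu)) for distinct primes p and q, using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    # With g = n + 1, L(g^lam mod n^2) equals lam mod n; mu is its inverse.
    mu = pow(lam % n, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, message, rng):
    """c = (1 + n)^m * r^n mod n^2, with (1 + n)^m = 1 + m*n mod n^2."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + message * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, private_key, ciphertext):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return (u - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return c1 * c2 % (n * n)


rng = random.Random(1)
n, private_key = generate_keypair(17, 19)  # toy primes, insecure
c_sum = add_encrypted(n, encrypt(n, 20, rng), encrypt(n, 22, rng))
total = decrypt(n, private_key, c_sum)
```

"Partially" homomorphic means only one operation, here addition of plaintexts, can be carried out on ciphertexts; raising a ciphertext to a plaintext power also gives multiplication by a known scalar.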
Architecture
- Initial release of Computing APIs
- Initial release of Storage APIs
- Initial release of Federation APIs
- Initial release of cross-site network communication (i.e. 'Federation')
- Initial release of Standalone runtime, including computing engine and k-v storage
- Initial release of Distributed runtime, including distributed computing engine, distributed k-v storage, metadata management and intra-site/cross-site network communication
- Support cross-site heterogeneous infrastructure
- Initial support of modeling and inference
Deploy
- Support standalone (docker & manual) deployment
- Support cluster deployment