
Federated Analysis Algorithm Descriptions

More information on the available federated analysis algorithms, beyond what is retrievable through the individual component list, can be found directly at https://github.com/madgik/exareme2/tree/master/documentation/algorithms

This includes documentation on the existing algorithm federation approach and the unit tests, as well as information on creating a new algorithm.

k-Means Clustering

The purpose of the k-means algorithm is to partition the data into k clusters. Here, all attributes should be numerical and we use the Euclidean distance as our metric.
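
As a rough, centralized illustration of the technique (not the platform's federated implementation), a minimal sketch using scikit-learn on made-up data could look like this:

```python
# Minimal, centralized k-means sketch using scikit-learn (illustrative only;
# the platform's federated implementation works differently).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 9.1]])  # numerical attributes only

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment per row
print(km.cluster_centers_)  # centroids minimizing within-cluster squared Euclidean distance
```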

Documentation

ANOVA

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among group means in a sample. The platform offers both one-way and two-way ANOVA tools. Wikipedia
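
For intuition only, a centralized one-way ANOVA on three hypothetical groups can be run with SciPy (this is not the platform's federated routine):

```python
# One-way ANOVA sketch with SciPy: tests whether the group means differ.
from scipy import stats

group_a = [5.1, 4.9, 5.3, 5.0]
group_b = [5.8, 6.1, 5.9, 6.0]
group_c = [5.2, 5.0, 5.1, 4.8]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")  # a low p-value suggests unequal group means
```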

Documentation

Linear Regression

Linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. This algorithm is also available on the platform with cross-validation.
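
As a centralized sketch of the idea (synthetic data, not the federated algorithm), linear regression with 5-fold cross-validation could be expressed with scikit-learn as:

```python
# Linear regression with cross-validation, centralized illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                                          # independent variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)   # dependent variable

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")  # 5-fold CV
print(scores.mean())
```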

Documentation

Logistic Regression

Logistic regression is trained by Maximum Likelihood Estimation (MLE) using, for example, Newton's method. Applying Newton's method leads to the Iteratively Reweighted Least Squares (IRLS) algorithm. Here the dependent variable y has to be binary. This algorithm is also available with cross-validation.
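
For reference, a minimal centralized IRLS iteration (Newton's method for the logistic log-likelihood) can be written in NumPy as follows; this is only a sketch of the update rule, not the federated implementation:

```python
# IRLS sketch for logistic regression: repeated weighted least-squares steps
# beta <- (X' W X)^{-1} X' W z, with W and z recomputed from the current fit.
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """X: (n, p) design matrix (include an intercept column); y: binary labels in {0, 1}."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))   # predicted probabilities
        w = mu * (1.0 - mu)               # diagonal of the weight matrix W
        z = eta + (y - mu) / w            # working response
        xtw = X.T * w                     # X' W
        beta = np.linalg.solve(xtw @ X, xtw @ z)
    return beta

X = np.column_stack([np.ones(6), [0.1, 0.5, 1.0, 1.5, 2.0, 2.5]])
y = np.array([0, 0, 1, 0, 1, 1])
print(irls_logistic(X, y))
```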

Documentation

Naive Bayes Classifier

In machine learning, naïve Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naïve) independence assumptions between the features. The platform offers Gaussian Naive Bayes classification and Categorical Naive Bayes classification, and they both make use of the cross-validation technique. Wikipedia
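
A centralized illustration with scikit-learn (synthetic data; not the federated implementation), combining Gaussian Naive Bayes with cross-validation:

```python
# Gaussian Naive Bayes with 5-fold cross-validation, illustrative only.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(3.0, 1.0, size=(50, 2))])   # two Gaussian-distributed classes
y = np.array([0] * 50 + [1] * 50)

scores = cross_val_score(GaussianNB(), X, y, cv=5)     # cross-validated accuracy
print(scores.mean())
```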

Documentation

Pearson Correlation

This algorithm computes the Pearson correlation coefficient between two vectors x and y.
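
For reference, the standard sample definition of the coefficient (not necessarily the exact form given as eq. (1) in the full documentation) is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```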

Documentation

Principal Components Analysis (PCA)

There are usually two approaches for computing the principal components. The first is to diagonalize the covariance matrix, while the second is to apply a singular value decomposition (SVD) to the data matrix X. In most implementations, the second approach is preferred due to its numerical stability. Here, however, we took the first approach since it better fits our privacy requirements. Additionally, as a first step, the data is centered and standardized.
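
A centralized NumPy sketch of this first approach (diagonalizing the covariance matrix of centered, standardized data) is shown below; it is illustrative only and not the platform's federated implementation:

```python
# PCA via eigendecomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # center and standardize

cov = np.cov(Z, rowvar=False)                      # covariance matrix of the standardized data
eigenvalues, eigenvectors = np.linalg.eigh(cov)    # eigenvalues in ascending order
order = np.argsort(eigenvalues)[::-1]              # sort descending by explained variance
components = eigenvectors[:, order]                # columns are the principal axes
explained_variance = eigenvalues[order]

scores = Z @ components                            # data projected onto the principal components
print(explained_variance)
```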

Documentation

Support Vector Machine (SVM)

In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Wikipedia
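
A minimal centralized classification example with scikit-learn (toy data, not the platform's federated implementation):

```python
# Linear support vector classification on a tiny toy dataset.
from sklearn.svm import SVC

X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]]
y = [0, 0, 1, 1]

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[0.1, 0.0], [0.9, 1.0]]))  # predicted classes for new points
print(clf.support_vectors_)                   # points that define the separating margin
```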

Documentation

Independent T-Test

The Student’s Independent sample t-test (sometimes called a two-sample t-test) is used to test the null hypothesis that two groups have the same mean. A low p-value suggests that the null hypothesis is not true, and therefore the group means are different. In each local dataset, let x and y be the variables of interest. y is the grouping variable with two levels.
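
For intuition, a centralized two-sample t-test with SciPy on hypothetical data (not the federated computation across local datasets):

```python
# Independent-samples t-test: are the two group means equal?
from scipy import stats

group_1 = [4.8, 5.0, 5.2, 4.9, 5.1]   # values of x where the grouping variable y is at level 1
group_2 = [5.6, 5.9, 5.7, 6.0, 5.8]   # values of x where the grouping variable y is at level 2

t_stat, p_value = stats.ttest_ind(group_1, group_2)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # a low p-value suggests different group means
```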

Documentation

One-Sample T-Test

The Student’s One-sample t-test is used to test the null hypothesis that the true mean is equal to a particular value (typically zero). A low p-value suggests that the null hypothesis is not true, and therefore the true mean (μ) must be different from the test value. In each local dataset, let x_j be the variable of interest.
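
As a centralized illustration with SciPy (hypothetical data; not the federated computation):

```python
# One-sample t-test against a hypothesized mean of zero.
from scipy import stats

x = [0.3, -0.1, 0.4, 0.2, 0.5, 0.1]
t_stat, p_value = stats.ttest_1samp(x, popmean=0.0)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # a low p-value suggests the true mean differs from 0
```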

Documentation

Paired T-Test

The Student’s paired samples t-test (sometimes called a dependent-samples t-test) is used to test the null hypothesis that the difference between pairs of measurements is equal to zero. A low p-value suggests that the null hypothesis is not true, and that the difference between the measurement pairs is not zero. In each local dataset, let x_j1 and x_j2 be the variables of interest.
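
As a centralized illustration with SciPy (hypothetical paired measurements; not the federated computation):

```python
# Paired-samples t-test: is the mean difference between paired measurements zero?
from scipy import stats

before = [5.1, 4.8, 5.5, 5.0, 5.2]   # first measurement per subject (x_j1)
after = [5.6, 5.0, 5.9, 5.4, 5.5]    # second measurement per subject (x_j2)

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # a low p-value suggests a nonzero mean difference
```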

Documentation