Install via pip:
pip install delphi.ai
This library holds a collection of algorithms that can be used to debias models affected by truncation or missing data. A few projects using the library can be found in:
We demonstrate how to use the library in a set of walkthroughs and our API reference. Functionality provided by the library includes:
For best results using the package, the data should have mean 0 and variance 1.
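As a minimal sketch of this preprocessing step (the tensor S here is illustrative data, not part of the library), standardizing a dataset with torch looks like:
import torch as ch
# S is a (num_samples, num_features) tensor of raw data (illustrative)
S = 3.0 * ch.randn(1000, 2) + 5.0
# shift and scale each feature to mean 0 and variance 1
S = (S - S.mean(dim=0)) / S.std(dim=0)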
Before running PSGD, the library uses an internal function to check that all of the required arguments for running the procedure are provided. All other hyperparameters can be provided by the user, or their default values will be used. The current default hyperparameters can be seen in the delphi.utils.defaults.py file.
For logging experiment information, we use MadryLab's cox. For more information and tutorials on how to use the logging framework, check out the link.
- distributions: module includes algorithms for learning from censored (known truncation) and truncated (unknown truncation; unsupervised learning) distributions
- stats: module includes models for regression and classification from truncated samples
CensoredNormal
learns censored normal distributions by maximizing the truncated log likelihood. The algorithm that we use for this procedure is described in the following paper: Efficient Statistics in High Dimensions from Truncated Samples.
When evaluating censored normal distributions, the user needs three things: an oracle, a Callable that indicates whether a sample falls within the truncation set; the model's alpha, the survival probability; and the CensoredNormal module. The CensoredNormal module accepts a parameters object that the user can define for running the PSGD procedure.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability of the truncation set
- variance (float): the distribution's variance; if the variance is provided, only the mean is estimated
- epochs (int): maximum number of times to iterate over the dataset
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate; default 1e-1
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
store (cox.store.Store): logging object to keep track of the distribution's train and validation losses
loc_ (torch.Tensor): distribution's estimated mean
variance_ (torch.Tensor): distribution's estimated variance
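The example below assumes a dataset S and survival probability alpha are already defined. A minimal sketch of synthesizing such a censored dataset (the ground-truth parameters here are illustrative assumptions, not library defaults):
import torch as ch
from torch.distributions import Normal
# ground-truth normal distribution (illustrative choice)
M = Normal(ch.ones(1), ch.ones(1))
samples = M.sample([10000,])
# censor: keep only samples in S = {x >= 0}
membership = samples.ge(0.0).flatten()
S = samples[membership]
# empirical survival probability
alpha = membership.float().mean().item()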
In the following code block, we show an example of how to use the censored normal distribution module:
from delphi.distributions.censored_normal import CensoredNormal
from delphi import oracle
from delphi.utils.helpers import Parameters
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate at 0 (i.e. S = {x >= 0})
phi = oracle.Left_Distribution(0.0)
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha})
# define censored normal distribution object
censored = CensoredNormal(train_kwargs, store=store)
# fit to dataset
censored.fit(S)
# close store
store.close()
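After fitting, the estimates can be read off the attributes documented above; continuing the example:
# estimated parameters of the underlying normal distribution
print(censored.loc_)
print(censored.variance_)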
CensoredMultivariateNormal
learns censored multivariate normal distributions by maximizing the truncated log likelihood. The algorithm that we use for this procedure is described in the following paper: Efficient Statistics in High Dimensions from Truncated Samples.
When evaluating censored multivariate normal distributions, the user needs three things: an oracle, a Callable that indicates whether a sample falls within the truncation set; the model's alpha, the survival probability; and the CensoredMultivariateNormal module. The CensoredMultivariateNormal module accepts a parameters object that the user can define for running the PSGD procedure.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability of the truncation set
- covariance_matrix (torch.Tensor): the distribution's covariance matrix; if the covariance matrix is provided, only the mean vector is estimated
- epochs (int): maximum number of times to iterate over the dataset
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate; default 1e-1
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
store (cox.store.Store): logging object to keep track of the distribution's train and validation losses
loc_ (torch.Tensor): distribution's estimated mean
covariance_matrix_ (torch.Tensor): distribution's estimated covariance matrix
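As in the univariate case, the example below assumes S and alpha are already defined. A minimal sketch of synthesizing a censored multivariate dataset (ground-truth parameters are illustrative assumptions):
import torch as ch
from torch.distributions.multivariate_normal import MultivariateNormal
# ground-truth multivariate normal (illustrative choice)
M = MultivariateNormal(ch.ones(2), ch.eye(2))
samples = M.sample([10000,])
# censor: keep only samples in S = {x : x >= 0 coordinate-wise}
membership = samples.ge(0.0).all(dim=-1)
S = samples[membership]
# empirical survival probability
alpha = membership.float().mean().item()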
In the following code block, we show an example of how to use the censored multivariate normal distribution module:
from torch import Tensor
from delphi.distributions.censored_multivariate_normal import CensoredMultivariateNormal
from delphi import oracle
from delphi.utils.helpers import Parameters
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate at 0 coordinate-wise (i.e. S = {x : x >= 0})
phi = oracle.Left_Distribution(Tensor([0.0, 0.0]))
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha})
# define censored multivariate normal distribution object
censored = CensoredMultivariateNormal(train_kwargs, store=store)
# fit to dataset
censored.fit(S)
# close store
store.close()
TruncatedNormal
learns truncated normal distributions with unknown truncation by maximizing the truncated log likelihood. The algorithm that we use for this procedure is described in the following paper: Efficient Truncated Statistics with Unknown Truncation.
When evaluating truncated normal distributions, the user needs to import the TruncatedNormal module. The TruncatedNormal module accepts a parameters object that the user can define for running the PSGD procedure. When debiasing truncated normal distributions, we don't require a membership oracle, as the truncation set is unknown. However, after running our procedure, we are able to provide an approximation of the truncation set. Since the user inputs a membership oracle in the args object when the truncation set is known, we add the learned membership oracle to the args object as well.
NOTE: when learning truncation sets, the user should not pass a Parameters object inline into the TruncatedNormal constructor; define it as a variable first, because otherwise they will not be able to access the Parameters object, and thus the learned oracle, afterwards.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- alpha (float): required argument; survival probability of the truncation set
- covariance_matrix (torch.Tensor): the distribution's covariance matrix; if the covariance matrix is provided, only the mean vector is estimated
- epochs (int): maximum number of times to iterate over the dataset
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate; default 1e-1
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
- d (int): degree of the Hermite polynomial expansion used when learning the truncation set; default 100
store (cox.store.Store): logging object to keep track of the distribution's train and validation losses
loc_ (torch.Tensor): distribution's estimated mean
variance_ (torch.Tensor): distribution's estimated variance
In the following code block, we show an example of how to fit the truncated normal distribution module:
from delphi.distributions.truncated_normal import TruncatedNormal
from delphi import oracle
from delphi.utils.helpers import Parameters
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate at 0 (i.e. S = {x >= 0})
phi = oracle.Left_Distribution(0.0)
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha,
'd': 100})
# define truncated normal distribution object
truncated = TruncatedNormal(train_kwargs, store=store)
# fit to dataset
truncated.fit(S)
# close store
store.close()
After fitting the distribution, we now have a membership oracle learned through a Hermite polynomial. In the following code block, we show an example of how to use the membership oracle:
import torch as ch
from torch.distributions.multivariate_normal import MultivariateNormal
# generate samples from a standard multivariate normal distribution
M = MultivariateNormal(ch.zeros(1,), ch.eye(1))
samples = M.rsample([1000,])
# filter samples with the learned membership oracle
filtered = train_kwargs.phi(samples)
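The oracle output is a tensor of 0/1 membership indicators, so it can be cast to a boolean mask to keep only the samples the learned truncation set accepts (a sketch, assuming the num_samples by 1 output shape described above):
# cast the 0/1 indicators to a boolean mask and keep the members of S
kept = samples[filtered.flatten().bool()]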
TruncatedMultivariateNormal
learns truncated multivariate normal distributions with unknown truncation by maximizing the truncated log likelihood. The algorithm that we use for this procedure is described in the following paper: Efficient Truncated Statistics with Unknown Truncation.
When evaluating truncated multivariate normal distributions, the user needs to import the TruncatedMultivariateNormal module. The TruncatedMultivariateNormal module accepts a parameters object that the user can define for running the PSGD procedure. When debiasing truncated multivariate normal distributions, we don't require a membership oracle, as the truncation set is unknown. However, after running our procedure, we are able to provide an approximation of the truncation set. Since the user inputs a membership oracle in the args object when the truncation set is known, we add the learned membership oracle to the args object as well.
NOTE: when learning truncation sets, the user should not pass a Parameters object inline into the TruncatedMultivariateNormal constructor; define it as a variable first, because otherwise they will not be able to access the Parameters object, and thus the learned oracle, afterwards.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability of the truncation set
- variance (float): the distribution's variance; if the variance is provided, only the mean is estimated
- epochs (int): maximum number of times to iterate over the dataset
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate; default 1e-1
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
- d (int): degree of the Hermite polynomial expansion used when learning the truncation set; default 100
store (cox.store.Store): logging object to keep track of the distribution's train and validation losses
loc_ (torch.Tensor): distribution's estimated mean
covariance_matrix_ (torch.Tensor): distribution's estimated covariance matrix
In the following code block, we show an example of how to use the truncated multivariate normal distribution module:
from torch import Tensor
from delphi.distributions.truncated_multivariate_normal import TruncatedMultivariateNormal
from delphi.utils.helpers import Parameters
from delphi import oracle
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate at 0 coordinate-wise (i.e. S = {x : x >= 0})
phi = oracle.Left_Distribution(Tensor([0.0, 0.0]))
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha,
'd': 100})
# define truncated multivariate normal distribution object
truncated = TruncatedMultivariateNormal(train_kwargs, store=store)
# fit to dataset
truncated.fit(S)
# close store
store.close()
After fitting the distribution, we now have a membership oracle learned through a Hermite polynomial. In the following code block, we show an example of how to use the membership oracle:
import torch as ch
from torch.distributions.multivariate_normal import MultivariateNormal
# generate samples from a standard multivariate normal distribution
M = MultivariateNormal(ch.zeros(2,), ch.eye(2))
samples = M.rsample([1000,])
# filter samples with the learned membership oracle
filtered = train_kwargs.phi(samples)
TruncatedBooleanProduct
learns truncated boolean product distributions by maximizing the truncated log likelihood. The algorithm that we use for this procedure is described in the following paper: Efficient Parameter Estimation of Truncated Boolean Product Distributions.
When evaluating truncated boolean product distributions, the user needs to import the TruncatedBernoulli module. The TruncatedBernoulli module accepts a parameters object that the user can define for running the PSGD procedure.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability of the truncation set
- epochs (int): maximum number of times to iterate over the dataset
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate; default 1e-1
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from the distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
store (cox.store.Store): logging object to keep track of the distribution's train and validation losses
probs_ (torch.Tensor): distribution's d-dimensional probability vector
logits_ (torch.Tensor): distribution's d-dimensional logits vector (log probabilities)
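The example below assumes a dataset S and survival probability alpha are already defined. A minimal sketch of synthesizing a sum-floor-truncated boolean dataset (the dimension and coordinate probabilities here are illustrative assumptions):
import torch as ch
from torch.distributions import Bernoulli
# ground-truth boolean product distribution over 100 coordinates (illustrative)
M = Bernoulli(ch.full((100,), 0.5))
samples = M.sample([10000,])
# truncate: keep only samples in S = {x : x.sum() >= 50}
membership = samples.sum(dim=-1).ge(50)
S = samples[membership]
# empirical survival probability
alpha = membership.float().mean().item()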
In the following code block, we show an example of how to use the truncated boolean product distribution module:
from torch import Tensor
from delphi.distributions.truncated_boolean_product import TruncatedBernoulli
from delphi.utils.helpers import Parameters
from delphi import oracle
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# truncate with a sum floor of 50 (i.e. S = {x : x.sum() >= 50})
phi = oracle.Sum_Floor(50)
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha})
# define truncated bernoulli distribution object
trunc_bool = TruncatedBernoulli(train_kwargs, store=store)
# fit to dataset
trunc_bool.fit(S)
# close store
store.close()
TruncatedLinearRegression
learns truncated linear regression models when the noise variance is known or unknown. In the known-variance setting we use the algorithm described in the following paper: Computationally and Statistically Efficient Truncated Regression. When the variance of the ground-truth linear regression model is unknown, we use the algorithm described in the following paper: Efficient Truncated Linear Regression with Unknown Noise Variance.
When evaluating truncated regression models, the user needs three things: an oracle, a Callable that indicates whether a sample falls within the truncation set; the model's alpha, the survival probability; and the TruncatedLinearRegression module. The TruncatedLinearRegression module accepts a parameters object that the user can define for running the PSGD procedure.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability for truncated regression
- epochs (int): maximum number of times to iterate over the dataset
- noise_var (float): the noise variance for the truncated regression model, if known; otherwise the unknown-variance procedure is run by default
- fit_intercept (bool): whether to fit an intercept; default True
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate for the regression weights; default 1e-1
- var_lr (float): initial learning rate for the variance parameters, when running the unknown-variance procedure
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- normalize (bool): our methods assume that max(||x_i||_2) <= 1, so before running the procedure the input features X = {x_(1), x_(2), ..., x_(n)} must be divided by max(||x_i||_2) * sqrt(k), where k is the number of feature dimensions; by default the procedure normalizes the features for the user
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
store (cox.store.Store): logging object to keep track of the regression's train and validation losses
coef_ (torch.Tensor): regression weight coefficients
intercept_ (torch.Tensor): regression intercept term
variance_ (torch.Tensor): if the noise variance is unknown, this property provides its estimate
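The example below assumes truncated data X, y and survival probability alpha are in scope. A minimal sketch of synthesizing left-truncated regression data (the ground-truth weights and noise are illustrative assumptions):
import torch as ch
# ground-truth linear model (illustrative parameters)
w, b = ch.ones(5), 1.0
X = ch.randn(10000, 5)
y = X @ w + b + ch.randn(10000)  # gaussian noise with variance 1
# truncate: keep only pairs with y >= 0, matching Left_Regression(0.0) below
membership = y.ge(0.0)
X, y = X[membership], y[membership]
# empirical survival probability
alpha = membership.float().mean().item()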
In the following code block, we show an example of how to use the truncated linear regression module with unknown noise variance:
from delphi.stats.truncated_linear_regression import TruncatedLinearRegression
from delphi import oracle
from delphi.utils.helpers import Parameters
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate linear regression at 0 (i.e. S = {(x, y) : y >= 0})
phi = oracle.Left_Regression(0.0)
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha})
# define trunc linear regression object
trunc_reg = TruncatedLinearRegression(train_kwargs, store=store)
# fit to dataset
trunc_reg.fit(X, y)
# close store
store.close()
# make predictions with new regression
print(trunc_reg.predict(X))
predict(X): predict regression outputs for the input feature matrix X (num_samples by features)
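By default the procedure normalizes the features for you (see the normalize hyperparameter above); if you prefer to pre-scale them yourself, a minimal sketch of that scaling (the feature matrix here is illustrative data):
import torch as ch
# X is a (num_samples, k) feature matrix (illustrative)
X = ch.randn(1000, 10)
k = X.size(1)
# divide by max(||x_i||_2) * sqrt(k), as described in the normalize entry
X_scaled = X / (X.norm(dim=1).max() * (k ** 0.5))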
TruncatedLassoRegression
learns truncated LASSO regression models when the noise variance is known. In this setting we use the algorithm described in the following paper: Truncated Linear Regression in High Dimensions.
When evaluating truncated lasso regression models, the user needs three things: an oracle, a Callable that indicates whether a sample falls within the truncation set; the model's alpha, the survival probability; and the TruncatedLassoRegression module. The TruncatedLassoRegression module accepts a parameters object that the user can define for running the PSGD procedure.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability for truncated regression
- l1 (float): l1 regularization coefficient
- epochs (int): maximum number of times to iterate over the dataset
- noise_var (float): the noise variance for the truncated regression model, if known; otherwise the unknown-variance procedure is run by default
- fit_intercept (bool): whether to fit an intercept; default True
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate for the regression weights; default 1e-1
- var_lr (float): initial learning rate for the variance parameters, when running the unknown-variance procedure
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- normalize (bool): our methods assume that max(||x_i||_2) <= 1, so before running the procedure the input features X = {x_(1), x_(2), ..., x_(n)} must be divided by max(||x_i||_2) * sqrt(k), where k is the number of feature dimensions; by default the procedure normalizes the features for the user
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
store (cox.store.Store): logging object to keep track of the lasso regression's train and validation losses
coef_ (torch.Tensor): regression weight coefficients
intercept_ (torch.Tensor): regression intercept term
variance_ (torch.Tensor): if the noise variance is unknown, this property provides its estimate
In the following code block, we show an example of how to use the truncated lasso regression module with known noise variance:
from delphi.stats.truncated_lasso_regression import TruncatedLassoRegression
from delphi import oracle
from delphi.utils.helpers import Parameters
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate lasso regression at 0 (i.e. S = {(x, y) : y >= 0})
phi = oracle.Left_Regression(0.0)
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha,
'noise_var': 1.0})
# define truncated LASSO regression object
trunc_lasso_reg = TruncatedLassoRegression(train_kwargs, store=store)
# fit to dataset
trunc_lasso_reg.fit(X, y)
# close store
store.close()
# make predictions with new regression
print(trunc_lasso_reg.predict(X))
predict(X): predict regression outputs for the input feature matrix X (num_samples by features)
TruncatedRidgeRegression
learns truncated ridge regression models when the noise variance is known or unknown.
When evaluating truncated ridge regression models, the user needs three things: an oracle, a Callable that indicates whether a sample falls within the truncation set; the model's alpha, the survival probability; and the TruncatedRidgeRegression module. The TruncatedRidgeRegression module accepts a parameters object that the user can define for running the PSGD procedure.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability for truncated regression
- weight_decay (float): weight decay regularization coefficient
- epochs (int): maximum number of times to iterate over the dataset
- noise_var (float): the noise variance for the truncated regression model, if known; otherwise the unknown-variance procedure is run by default
- fit_intercept (bool): whether to fit an intercept; default True
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate for the regression weights; default 1e-1
- var_lr (float): initial learning rate for the variance parameters, when running the unknown-variance procedure
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- normalize (bool): our methods assume that max(||x_i||_2) <= 1, so before running the procedure the input features X = {x_(1), x_(2), ..., x_(n)} must be divided by max(||x_i||_2) * sqrt(k), where k is the number of feature dimensions; by default the procedure normalizes the features for the user
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
store (cox.store.Store): logging object to keep track of the ridge regression's train and validation losses
coef_ (torch.Tensor): regression weight coefficients
intercept_ (torch.Tensor): regression intercept term
variance_ (torch.Tensor): if the noise variance is unknown, this property provides its estimate
In the following code block, we show an example of how to use the truncated ridge regression module with known noise variance:
from delphi.stats.truncated_ridge_regression import TruncatedRidgeRegression
from delphi import oracle
from delphi.utils.helpers import Parameters
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate ridge regression at 0 (i.e. S = {(x, y) : y >= 0})
phi = oracle.Left_Regression(0.0)
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha,
'weight_decay': .01,
'noise_var': 1.0})
# define truncated ridge regression object
trunc_ridge_reg = TruncatedRidgeRegression(train_kwargs, store=store)
# fit to dataset
trunc_ridge_reg.fit(X, y)
# close store
store.close()
# make predictions with new regression
print(trunc_ridge_reg.predict(X))
predict(X): predict regression outputs for the input feature matrix X (num_samples by features)
TruncatedElasticNetRegression
learns truncated elastic net regression models when the noise variance is known or unknown.
When evaluating truncated elastic net regression models, the user needs three things: an oracle, a Callable that indicates whether a sample falls within the truncation set; the model's alpha, the survival probability; and the TruncatedElasticNetRegression module. The TruncatedElasticNetRegression module accepts a parameters object that the user can define for running the PSGD procedure.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability for truncated regression
- weight_decay (float): weight decay regularization coefficient
- l1 (float): l1 regularization coefficient
- epochs (int): maximum number of times to iterate over the dataset
- noise_var (float): the noise variance for the truncated regression model, if known; otherwise the unknown-variance procedure is run by default
- fit_intercept (bool): whether to fit an intercept; default True
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate for the regression weights; default 1e-1
- var_lr (float): initial learning rate for the variance parameters, when running the unknown-variance procedure
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- normalize (bool): our methods assume that max(||x_i||_2) <= 1, so before running the procedure the input features X = {x_(1), x_(2), ..., x_(n)} must be divided by max(||x_i||_2) * sqrt(k), where k is the number of feature dimensions; by default the procedure normalizes the features for the user
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
store (cox.store.Store): logging object to keep track of the elastic net regression's train and validation losses
coef_ (torch.Tensor): regression weight coefficients
intercept_ (torch.Tensor): regression intercept term
variance_ (torch.Tensor): if the noise variance is unknown, this property provides its estimate
In the following code block, we show an example of how to use the truncated elastic net regression module with known noise variance:
from delphi.stats.truncated_elastic_net_regression import TruncatedElasticNetRegression
from delphi import oracle
from delphi.utils.helpers import Parameters
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate elastic net regression at 0 (i.e. S = {(x, y) : y >= 0})
phi = oracle.Left_Regression(0.0)
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha,
'weight_decay': .01,
'noise_var': 1.0})
# define truncated elastic net regression object
trunc_elastic_reg = TruncatedElasticNetRegression(train_kwargs, store=store)
# fit to dataset
trunc_elastic_reg.fit(X, y)
# close store
store.close()
# make predictions with new regression
print(trunc_elastic_reg.predict(X))
predict(X): predict regression outputs for the input feature matrix X (num_samples by features)
TruncatedLogisticRegression
learns truncated logistic regression models by maximizing the truncated log likelihood. The algorithm that we use for this procedure is described in the following paper: A Theoretical and Practical Framework for Classification and Regression from Truncated Samples.
When evaluating truncated logistic regression models, the user needs three things: an oracle, a Callable that indicates whether a sample falls within the truncation set; the model's alpha, the survival probability; and the TruncatedLogisticRegression module. The TruncatedLogisticRegression module accepts a parameters object that the user can define for running the PSGD procedure.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability for truncated regression
- epochs (int): maximum number of times to iterate over the dataset
- fit_intercept (bool): whether to fit an intercept; default True
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate for the regression weights; default 1e-1
- var_lr (float): initial learning rate for the variance parameters, when running the unknown-variance procedure
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- normalize (bool): our methods assume that max(||x_i||_2) <= 1, so before running the procedure the input features X = {x_(1), x_(2), ..., x_(n)} must be divided by max(||x_i||_2) * sqrt(k), where k is the number of feature dimensions; by default the procedure normalizes the features for the user
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False (otherwise just a tqdm output is shown)
store (cox.store.Store): logging object to keep track of the logistic regression's train and validation losses and accuracy
coef_ (torch.Tensor): regression weight coefficients
intercept_ (torch.Tensor): regression intercept term
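The example below assumes X, y, and alpha are in scope. One plausible way to synthesize such data, treating the truncation as acting on the latent value z = x^T w + noise as in the comment below (the generative details here are illustrative assumptions, not the library's specification):
import torch as ch
# ground-truth logistic model (illustrative parameters)
w = ch.ones(5)
X = ch.randn(10000, 5)
u = ch.rand(10000)
z = X @ w + ch.log(u / (1 - u))  # latent value with logistic noise
y = z.gt(0.0).float()            # class labels
# truncate: keep only samples whose latent z falls in S = {z >= -0.1}
membership = z.ge(-0.1)
X, y = X[membership], y[membership]
# empirical survival probability
alpha = membership.float().mean().item()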
In the following code block, we show an example of how to use the truncated logistic regression module:
from delphi.stats.truncated_logistic_regression import TruncatedLogisticRegression
from delphi import oracle
from delphi.utils.helpers import Parameters
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate logistic regression at -0.1 (i.e. S = {(x, y) : z >= -0.1})
phi = oracle.Left_Regression(-0.1)
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha})
# define truncated logistic regression object
trunc_log_reg = TruncatedLogisticRegression(train_kwargs, store=store)
# fit to dataset
trunc_log_reg.fit(X, y)
# close store
store.close()
# make predictions with new regression
print(trunc_log_reg.predict(X))
predict(X): predict classifications for the input feature matrix X (num_samples by features)
TruncatedProbitRegression
learns truncated probit regression models by maximizing the truncated log likelihood. The algorithm that we use for this procedure is described in the following paper: A Theoretical and Practical Framework for Classification and Regression from Truncated Samples.
When evaluating truncated probit regression models, the user needs three things: an oracle, a Callable that indicates whether a sample falls within the truncation set; the model's alpha, the survival probability; and the TruncatedProbitRegression module. The TruncatedProbitRegression module accepts a parameters object that the user can define for running the PSGD procedure.
args (delphi.utils.Parameters): parameters object that holds the hyperparameters for the experiment. Possible hyperparameters include:
- phi (Callable): required argument; callable class that receives a num_samples by 1 input torch.Tensor and returns a num_samples by 1 Tensor with values in {0, 1}, representing membership in S or not
- alpha (float): required argument; survival probability for truncated regression
- epochs (int): maximum number of times to iterate over the dataset
- fit_intercept (bool): whether to fit an intercept; default True
- trials (int): maximum number of PSGD trials; after all trials, the model with the smallest loss on the dataset is returned
- val (float): fraction of the dataset to use for the validation set; default .2
- lr (float): initial learning rate for the regression weights; default 1e-1
- step_lr (int): number of gradient steps to take before adjusting the learning rate by step_lr_gamma; default 100
- step_lr_gamma (float): factor by which to adjust the learning rate every step_lr steps: new_lr = curr_lr * step_lr_gamma
- custom_lr_multiplier (str): cosine for cosine annealing learning rate scheduling or cyclic for cyclic learning rate scheduling; default None
- momentum (float): momentum; default 0.0
- adam (bool): use the Adam adaptive learning rate optimizer; default False
- eps (float): epsilon added to denominators in gradient calculations (i.e. to prevent division by zero); default 1e-5
- r (float): initial projection set radius; default 1.0
- rate (float): rate at which the projection set radius is increased at the end of each trial; default 1.5
- normalize (bool): our methods assume that max(||x_i||_2) <= 1, so before running the procedure the input features X = {x_(1), x_(2), ..., x_(n)} must be divided by max(||x_i||_2) * sqrt(k), where k is the number of feature dimensions; by default the procedure normalizes the features for the user
- batch_size (int): number of samples to use for each gradient step; default 50
- tol (float): threshold for early stopping, if enabled; default 1e-3
- workers (int): number of workers to use for the procedure; default 1
- num_samples (int): number of samples to draw from the distribution in the gradient for each sample in the batch (i.e. if batch_size is 10 and num_samples is 100, each gradient step draws 100 * 10 samples from a gaussian distribution); default 50
- early_stopping (bool): whether to check the loss for convergence; compares the best average validation loss at the end of an epoch with the current average epoch loss, and terminates the procedure if best_loss - curr_loss < tol for n_iter_no_change epochs; default False
- n_iter_no_change (int): number of iterations to check for change before declaring convergence; default 5
- verbose (bool): whether to print verbose output with loss logs, etc.; default False
store (cox.store.Store): logging object to keep track of the probit regression's train and validation losses and accuracy
coef_ (torch.Tensor): regression weight coefficients
intercept_ (torch.Tensor): regression intercept term
In the following code block, we show an example of how to use the truncated probit regression module:
from delphi.stats.truncated_probit_regression import TruncatedProbitRegression
from delphi import oracle
from delphi.utils.helpers import Parameters
from cox.store import Store
OUT_DIR = 'PATH_TO_EXPERIMENT_LOGGING_DIRECTORY'
store = Store(OUT_DIR)
# left truncate probit regression at -0.1 (i.e. S = {(x, y) : z >= -0.1})
phi = oracle.Left_Regression(-0.1)
# pass algorithm parameters in through Parameters object
train_kwargs = Parameters({'phi': phi,
'alpha': alpha})
# define truncated probit regression object
trunc_prob_reg = TruncatedProbitRegression(train_kwargs, store=store)
# fit to dataset
trunc_prob_reg.fit(X, y)
# close store
store.close()
# make predictions with new regression
print(trunc_prob_reg.predict(X))
predict(X): predict classifications for the input feature matrix X (num_samples by features)