Bring Your Own Model
In this section we explain how to add your own model to work with CryptoNets. We demonstrate how to use Keras for the task, but other tools can be used too. Creating scripts to automate this process is on the todo list.
Training a model to be used with CryptoNets is no different from training neural networks for other tasks. However, there are several points to consider when training:
- Supported layer types - CryptoNets supports only the following types of layers: dense layers, convolution layers, square activations, and mean pool layers.
- Depth - there will be a big penalty in terms of inference latency and memory requirements for networks with many nonlinear transformations. Therefore, one should prefer to use only a few square activations if possible. Note that adjacent linear layers (dense layers, convolution layers, and mean pools) can be collapsed into a single layer (see the explanation below and the dense-layer sketch after this list). Therefore, there is no penalty for additional linear layers.
- Width - to improve performance, it is beneficial to make sure that each hidden layer is not wider than the width of the ciphertext, which is typically 8192 or 16384.
- Fidelity - to improve performance, it is beneficial to make sure that the inputs and weights do not require high fidelity to ensure correct predictions. This reduces the number of bits required in each message and allows working with smaller parameters. This can be achieved by quantizing the inputs before training. As an example, if the inputs are numbers in the range 0-255, the following commands normalize them to the range 0-1 and quantize them to 8 levels:
x_train = np.round(x_train/32)/8
x_test = np.round(x_test/32)/8
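The collapsing of adjacent linear layers is easiest to see for the dense case. Here is a minimal numpy sketch (illustrative only, not part of CryptoNets) showing that two adjacent dense layers are equivalent to a single dense layer:
import numpy as np
rng = np.random.RandomState(0)
W1, b1 = rng.randn(20, 30), rng.randn(20)   # first dense layer
W2, b2 = rng.randn(10, 20), rng.randn(10)   # second dense layer, no activation in between
W = W2 @ W1                                 # collapsed weights
b = W2 @ b1 + b2                            # collapsed bias
x = rng.randn(30)
print(np.allclose(W2 @ (W1 @ x + b1) + b2, W @ x + b))  # True
The following Keras script is a complete training example that follows these guidelines: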
from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D
import os
from keras import backend as K
import numpy as np
def square(x):
    return x * x
batch_size = 64
num_classes = 10
epochs = 500
num_predictions = 20
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'cifar_model.h5'
# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train = np.round(x_train/32)/8
x_test = np.round(x_test/32)/8
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(128, (3, 3), padding='same',
input_shape=x_train.shape[1:]))
model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Conv2D(83, (3, 3)))
model.add(Dropout(0.25))
model.add(Activation(square))
model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Conv2D(130, (5, 5), padding='same'))
model.add(Activation(square))
model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
opt = keras.optimizers.Adam(amsgrad=True, decay=0.0001, lr = 0.001)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
print('Using real-time data augmentation.')
datagen = ImageDataGenerator(
featurewise_center=False,
samplewise_center=False,
featurewise_std_normalization=False,
samplewise_std_normalization=False,
zca_whitening=False,
zca_epsilon=1e-06,
rotation_range=0,
width_shift_range=0.1,
height_shift_range=0.1,
shear_range=0.,
zoom_range=0.,
channel_shift_range=0.,
fill_mode='nearest',
cval=0.,
horizontal_flip=True,
vertical_flip=False,
rescale=None,
preprocessing_function=None,
data_format=None,
validation_split=0.0)
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
model_path = os.path.join(save_dir, model_name)
checkpoint = keras.callbacks.ModelCheckpoint(model_path, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=10)
callback_list=[checkpoint]
datagen.fit(x_train)
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    callbacks=callback_list,
                    epochs=epochs, validation_data=(x_test, y_test), workers=4)
# Save
model.save(model_path)
print('Saved trained model at %s ' % model_path)
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
CryptoNets models use the square activation, which does not exist in Keras, so you have to define the square function before loading a previously saved model:
from numpy import loadtxt
from keras.models import load_model
import os

save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'cifar_model.h5'
model_path = os.path.join(save_dir, model_name)

def square(x):
    return x * x
# load model
model = load_model(model_path, custom_objects={'square': square})
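It can also help to print the layer indices at this point, since the conversion code below refers to layers of the trained model by index (e.g. model.layers[6]). A short sketch, assuming the model was loaded as above:
for i, layer in enumerate(model.layers):
    print(i, layer.name)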
Once the model is trained, the next step is to convert the weight vectors to a format that CryptoNets recognizes. CryptoNets expects the weights to be in a CSV file where the weights for each layer are on a separate line. One challenge is to collapse adjacent linear layers into a single linear layer. This can be done analytically, but it is sometimes challenging when pooling and convolutions are used. Therefore, we demonstrate here how it can be done numerically.
For each block of adjacent linear layers, we create a network that contains only these layers. We scan the inputs for this block and save the outputs. This is demonstrated in the following code:
from numpy import loadtxt
from keras.models import load_model, Sequential
from keras.layers import Conv2D, AveragePooling2D
from keras.engine import InputLayer
import numpy as np

output_center = 2
input_kernel_range = range(0, 10)
input_shape = [14, 14, 83]
input_maps = input_shape[2]

model_ = Sequential()
model_.add(AveragePooling2D(pool_size=(2, 2), input_shape=[14, 14, 83]))
model_.add(Conv2D(130, (5, 5), padding='same'))
model_.layers[1].set_weights(model.layers[6].get_weights())
test = np.zeros(input_shape)[np.newaxis,...]
bias = model_.predict(test)[0,output_center,output_center,:]
A = 0
for c in range(0, input_maps):
    for y in input_kernel_range:
        for x in input_kernel_range:
            test = np.zeros(input_shape)[np.newaxis, ...]
            test[0, x, y, c] = 1
            prediction = model_.predict(test)
            d = prediction[0, output_center, output_center, :] - bias
            if isinstance(A, int):
                A = d
            else:
                A = np.c_[A, d]

A.tofile("weights.csv", sep=',')
bias.tofile("bias.csv", sep=',')
Explanations: The code assumes that the trained model is stored in a variable named `model`. The code creates a network `model_` that holds the block of adjacent linear layers, that is, a block of layers that do not have any nonlinear activation in them. It also skips any dropout layers, since these are not used at inference time.
There are several parameters that should be adjusted to the structure of the block:
- `output_center` should be set such that, in the output tensor of this block, the data at `[0, output_center, output_center, :]` does not use any input which is padding in the input tensor. For example, if there is a padding of 2 on the input and the block generates output in skips of one, then `output_center` should be set to 2. If the skips are of two, then `output_center` should be set to 1.
- `input_kernel_range` should be set such that the output defined by `output_center` depends only on inputs whose x,y locations are in `input_kernel_range`.
- `input_shape` is the shape of the input tensor.
- You should also declare the structure of `model_` and copy the parameters from `model`.
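Before exporting the files, it can help to sanity-check the extracted weights. The following sketch (assuming the variables from the code above are still in scope) verifies that `A` and `bias` reproduce the block's output at the chosen center for an input supported on the probed positions:
rng = np.random.RandomState(0)
test = np.zeros(input_shape)[np.newaxis, ...]
for c in range(input_maps):
    for y in input_kernel_range:
        for x in input_kernel_range:
            test[0, x, y, c] = rng.rand()
expected = model_.predict(test)[0, output_center, output_center, :]
# flatten the input in the same order in which the columns of A were probed
x_vec = np.array([test[0, x, y, c]
                  for c in range(input_maps)
                  for y in input_kernel_range
                  for x in input_kernel_range])
print(np.allclose(expected, A @ x_vec + bias, rtol=1e-3, atol=1e-3))  # should print True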
The code above generates two files: `weights.csv` and `bias.csv`. You should generate such files for each block of the network. Once done, all the weights files should be combined into a single file and the bias files into another file. Therefore, if you have 3 blocks of layers and you created the files `weights1.csv`, `weights2.csv`, and `weights3.csv`, you can combine them into a single file using the command
gawk '{print $0;}' weights1.csv weights2.csv weights3.csv > all_weights.csv
In the same way, the bias files `bias1.csv`, `bias2.csv`, and `bias3.csv` are joined using
gawk '{print $0;}' bias1.csv bias2.csv bias3.csv > all_bias.csv
The number of lines in the weights and bias files should equal the number of blocks in the network. You can also verify that each line in these files contains the correct number of parameters. For the biases, issue the command
gawk -F"," '{print NF, "parameters in block ", NR;}' all_bias.csv
This command will print the number of bias parameters for each block. The number of such parameters for a block should equal the number of output maps for this block in the neural network. For example, for the network defined above these numbers would be 83 for the first block, 130 for the second block, and 10 for the last block.
To test the weights file, issue the command
gawk -F"," '{print NF, "parameters in block ", NR;}' all_weights.csv
This will print the number of weights in each block. The expected number of weights is the size of the kernel of the block times the number of output maps. For example, if the block operates on kernels of size 5x5x3 and generates 32 output maps, then the number of weights would be 5x5x3x32=2400. For the network defined above, the numbers of weights are 8x8x3x83=15936 for the first block, 10x10x83x130=1079000 for the second block, and 7x7x130x10=63700 for the third block.
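As a quick arithmetic check of these counts (plain Python):
print(8 * 8 * 3 * 83)      # 15936 weights in the first block
print(10 * 10 * 83 * 130)  # 1079000 weights in the second block
print(7 * 7 * 130 * 10)    # 63700 weights in the third block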
Creating an application requires several steps:
- Creating the application and defining the model structure
- Testing the application without encryption
- Selecting encryption parameters
- Testing the application with encryption
- Optimizing parameters
- Reducing latency
In the following sections we describe each of these steps.
The first step would be to create a basic application which is not optimized. The goal is to work on the correctness of the application first and defer optimizations to later steps. Create a project and add as references the SEAL library, the HE Wrapper library and the NeuralNetworks library. The main code should look like:
using System;
using NeuralNetworks;
using HEWrapper;
namespace MyNS
{
public class MyNetwork
{
static void Main(string[] args)
{
string fileName = "test.txt";
int batchSize = 1; // number of examples to use per batch
int numberOfRecords = 10; // number of records in the test set
var Factory = new RawFactory((ulong)batchSize); // we start without using encryption
// this will make things faster
// and easier to debug
WeightsReader wr = new WeightsReader("all_weights.csv", "all_bias.csv");
// load parameters
var ReaderLayer = new BatchReader // reads data from file
{
FileName = fileName,
SparseFormat = false,
MaxSlots = batchSize,
NormalizationFactor = 1.0 / 256.0,
Scale = 8.0
};
var EncryptedLayer = new EncryptLayer() { Source = ReaderLayer, Factory = Factory};
// next we define the structure of the network
var ConvLayer1 = new PoolLayer()
{
Source = EncryptedLayer,
InputShape = new int[] { 3, 32, 32 },
KernelShape = new int[] { 3, 8, 8 },
Upperpadding = new int[] { 0, 1, 1 },
Lowerpadding = new int[] { 0, 1, 1 },
Stride = new int[] { 1000, 2, 2 },
MapCount = new int[] { 83, 1, 1 },
WeightsScale = 128000.0,
Weights = (double[])wr.Weights[0],
Bias = (double[])wr.Biases[0]
};
var ActivationLayer2 = new SquareActivation()
{
Source = ConvLayer1
};
var ConvLayer3 = new PoolLayer
{
Source = ActivationLayer2,
InputShape = new int[] { 83, 14, 14 },
KernelShape = new int[] { 83, 10, 10 },
Upperpadding = new int[] { 0, 4, 4 },
Lowerpadding = new int[] { 0, 4, 4 },
Stride = new int[] { 83, 2, 2 },
MapCount = new int[] { 130, 1, 1 },
WeightsScale = 1280000.0,
Weights = ((double[])wr.Weights[1]),
Bias = ((double[])wr.Biases[1])
};
var ActivationLayer4 = new SquareActivation()
{
Source = ConvLayer3
};
var DenseLayer5 = new PoolLayer()
{
Source = ActivationLayer4,
Weights = (double[])wr.Weights[2],
Bias = (double[])wr.Biases[2],
WeightsScale = 256000.0
};
var network = DenseLayer5;
Console.WriteLine("Preparing");
network.PrepareNetwork();
int count = 0;
int errs = 0;
while (count < numberOfRecords)
{
using (var m = network.GetNext())
Utils.ProcessInEnv(env =>
{
var decrypted = m.Decrypt(env);
int pred = 0;
for (int j = 1; j < decrypted.RowCount; j++)
{
if (decrypted[j, 0] > decrypted[pred, 0]) pred = j;
}
if (pred != ReaderLayer.Labels[0]) errs++;
count++;
if (count % batchSize == 0)
Console.WriteLine("errs {0}/{1} accuracy {2:0.000}% prediction {3} label {4} {5}", errs, count, 100 - (100.0 * errs / (count)), pred, readerLayer.Labels[0], TimingLayer.GetStats());
}, Factory);
}
Console.WriteLine("errs {0}/{1} accuracy {2:0.000}%", errs, count, 100 - (100.0 * errs / (count)));
network.DisposeNetwork();
Console.WriteLine("Max computed value {0} ({1})", RawMatrix.Max, Math.Log(RawMatrix.Max) / Math.Log(2));
}
}
}
The `BatchReader` accepts inputs in two formats. If `SparseFormat` is set to false, the data is assumed to be tab separated, where each line represents a single record. The label is in column `LabelColumn`, which is set by default to column 0. If `SparseFormat` is set to true, each line in the file is expected to have the format `label number-of-features index1:value1 index2:value2 ... indexN:valueN`, where `number-of-features` is the dimension of the input space (including zero features). For each non-zero feature in the current record, the pair `index:value` contains the index of the feature (the first feature is numbered 0) and the value assigned to that feature. For example, a record with label 3 in a 10-dimensional input space whose only non-zero features are feature 0 (value 0.5) and feature 7 (value 0.25) would be encoded as `3 10 0:0.5 7:0.25`.
The `NormalizationFactor` is used to apply normalization to the input data. Each input feature is multiplied by the value specified here. Make sure to use real numbers here: if you declare `NormalizationFactor = 1/256`, the compiler will treat `1/256` as integer division and compute this value to be zero.
The `Scale` parameter specifies the fidelity with which parameters and inputs are used. Formally, any input or weight `x` is processed with the formula `x ← round(x * Scale) / Scale`. Therefore, if `Scale` is set to 10, the value of `x` will be rounded to the closest number with only one digit after the decimal point. In the first stage of converting a network to be used with CryptoNets, we suggest using large values for `Scale` to make sure that no accuracy is lost due to rounding errors. Later in the process, the value of `Scale` should be reduced to improve performance when using encryption.
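As a small Python illustration of this rounding rule (not part of CryptoNets):
scale = 10.0
for x in (0.123456, 0.98, -0.26):
    print(round(x * scale) / scale)  # prints 0.1, 1.0, -0.3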
At the current stage you should have an application that compiles. In this stage the goal is to verify the correctness of the application. When the application is running it should process one record at a time and output the expected label and the computed label. It also keeps track of the accuracy so far. This can be compared to the values computed by Keras. There are several techniques that can be used for debugging in case the computed values do not match the expected ones.
When values do not compute correctly, it is helpful to verify that values in hidden layers are computed correctly. To inspect the output of a certain layer, change the line `var network = DenseLayer5;` in the above application to point to the layer whose output you are interested in. For example, if you'd like to see the output of the first convolution layer, use `var network = ConvLayer1;`. To see the output itself, you can either set a breakpoint after the line `var decrypted = m.Decrypt(env);` and inspect the variable `decrypted`, or add the command `Utils.Show(m, Factory);` before or after this line. `Utils.Show` prints the decrypted values of `m` to the screen.
Note that the order of tensors in CryptoNets is CHW (color, horizontal, vertical) while Keras, by default, uses HWC (channels last). This should be taken into account when comparing results. The difference in the order of parameters is a common source of problems when converting models from Keras (or other training systems) to CryptoNets.
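For example, the following numpy sketch (with illustrative dimensions) converts a tensor from Keras's HWC layout to the CHW layout CryptoNets expects:
import numpy as np
hwc = np.arange(14 * 14 * 83).reshape(14, 14, 83)  # Keras: height, width, channels
chw = np.transpose(hwc, (2, 0, 1))                 # CryptoNets: channels, height, width
print(hwc[3, 5, 7] == chw[7, 3, 5])                # True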
When debugging, it is sometimes useful to handcraft the inputs to a certain layer in order to test its output. This can be achieved in multiple ways. For testing a single layer, it is possible to generate a matrix of the desired size and feed it to the layer. For example, the layer `ConvLayer3` expects an input of size 83x14x14. You can generate a matrix of this size using Math.NET, encrypt it, apply the layer, and inspect the results. The following code demonstrates generating a matrix which is all zeros except the first entry, applying `ConvLayer3` to the matrix, and viewing the results:
var mat = Matrix<Double>.Build.Dense(1, 83 * 14 * 14);
mat[0, 0] = 1;
var enc = Factory.GetEncryptedMatrix(mat, EMatrixFormat.ColumnMajor, 1);
var res = ConvLayer3.Apply(enc);
Utils.Show(res, Factory);
Another option is to create a class that represents a layer which ignores its inputs and outputs preassigned data. Here is an example of such a class; you determine its output by assigning a value to its `Data` property.
public class DummyLayer : BaseLayer
{
public IMatrix Data { get; set; }
public override IMatrix Apply(IMatrix m)
{
return Data;
}
public override int OutputDimension()
{
return (int)(Data.RowCount * Data.ColumnCount);
}
public override double GetOutputScale()
{
return Data.Scale;
}
public override IMatrix GetNext()
{
return Data;
}
}
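To use it, construct a `DummyLayer`, assign an encrypted matrix to its `Data` property (for instance, one created with `Factory.GetEncryptedMatrix` as in the previous example), and set it as the `Source` of the layer you want to test in place of the real upstream layer.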
The encryption scheme requires selecting parameters for the encryption. The main parameters to play with are the plaintext modulus and the dimension. The parameters should have two main properties: (i) the results of the computation should be correct, and (ii) the computation should be efficient. We start by selecting parameters that will provide correct results.
To allow correctness, the parameters should support large enough numbers to be processed. Much like in traditional programming, where a program might fail if numbers are allocated with insufficient space (short integers vs. long integers, or floats vs. doubles), the same thing may happen when using homomorphic encryption. The first step is to determine the amount of space needed. When running without encryption (using the `RawFactory`), CryptoNets keeps track of the size of the numbers processed, and the line
Console.WriteLine("Max computed value {0} ({1})", RawMatrix.Max, Math.Log(RawMatrix.Max) / Math.Log(2));
prints the maximal number used (in absolute value) and the number of bits required to encode it. To determine the number of bits needed, add 1 to this number, since an additional bit is required to hold the sign.
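For example, the following small Python helper (the maximum value here is hypothetical) turns the reported maximum into a bit requirement:
import math
max_value = 5e20                            # hypothetical value printed by the line above
bits = math.ceil(math.log2(max_value)) + 1  # +1 for the sign bit
print(bits)                                 # 70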
To provide the required number of bits, a set of prime numbers is used such that the product of these numbers has at least the required number of bits. For example, if 70 bits are needed, we can use 2 primes with 35 bits each or 4 primes with 18 bits each. Working with more primes increases the running time. However, smaller primes allow more computation to be done before the noise budget is exhausted.
Noise budget is another important parameter of homomorphic encryption. In a nutshell, a freshly encrypted number has a certain amount of noise budget. Every operation on such a number (addition, multiplication, ...) reduces this budget. Once this budget hits zero, the decryption will fail to provide correct results. The amount of noise budget available is determined by several parameters, the most important of which are the dimension used (`N`) and the size of the prime numbers used as the plaintext modulus. The dimension `N` should be a power of two; the larger it is, the greater the noise budget. However, the larger `N` is, the slower the program runs. Typical values for `N` range from 2^12 to 2^15. On the other hand, a greater noise budget is available when the plaintext moduli are smaller. However, as discussed above, working with smaller plaintext moduli requires using more of them to achieve the required number of bits, which slows down the application. Selecting a good set of parameters is currently done manually.
After determining the required number of bits, select a value for `N` and the number of primes to be used. The following script can be used to find proper prime numbers for the encryption. You must supply 3 parameters: `bits` is the minimal number of bits each prime should have, `nDegree` is the number of bits in `N`, and `count` is the number of primes to generate.
from sympy.ntheory import isprime
import math

bits = 39.8    # number of bits requested from each prime
nDegree = 14   # N = 2**nDegree
count = 2      # number of primes to generate

mod = 2**(nDegree + 1)  # candidates are kept congruent to 1 modulo 2N
if bits < nDegree + 1:
    bits = nDegree + 1
candidate = 1 + 2**math.floor(bits)
skip = math.ceil((2**bits - candidate) / mod)
candidate += skip * mod
while count > 0:
    while not isprime(candidate):
        candidate = candidate + mod
    print(candidate)
    candidate = candidate + mod
    count = count - 1
For the parameters specified here, this script will return the numbers
957181001729
957181034497
With these numbers you can test your program by replacing the line
var Factory = new RawFactory((ulong)batchSize); // we start without using encryption
with the line
var Factory = new EncryptedSealBfvFactory(new ulong[] { 957181001729, 957181034497 }, 16384);
where the number 16384 is `N`. Since we asked for 2 primes with 39.8 bits each, these parameters can support 79.6 bits.
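As an optional sanity check, each generated prime should be congruent to 1 modulo 2N (a requirement for BFV batching) and have the requested number of bits. A plain Python sketch:
import math
N = 16384
for p in (957181001729, 957181034497):
    print(p % (2 * N) == 1, round(math.log2(p), 1))  # True 39.8 for both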