Bring Your Own Model


Warning: this section is work in progress

In this section we explain how to add your own model to work with CryptoNets. We demonstrate how to use Keras for the task, but other tools can be used too. Creating scripts to automate this process is on the todo list.


Training

Training a model to be used with CryptoNets is no different from training neural networks for other tasks. However, there are several points to consider when training:

  1. Supported layer types - CryptoNets supports only the following types of layers: dense layers, convolution layers, square activations, and mean-pool layers.
  2. Depth - there is a large penalty, in terms of inference latency and memory requirements, for networks with many nonlinear transformations. Therefore, one should prefer to use only a few square activations if possible. Note that adjacent linear layers (dense layers, convolution layers, and mean pools) can be collapsed into a single layer (see the explanation below and the short numeric sketch after this list). Therefore, there is no penalty for additional linear layers.
  3. Width - to improve performance, it is beneficial to make sure that each hidden layer is not wider than the width of the ciphertext, which is typically 8192 or 16384.
  4. Fidelity - to improve performance, it is beneficial to make sure that the inputs and weights do not require high fidelity to ensure correct predictions. This reduces the number of bits required in each message and allows working with smaller parameters. At training time this can be achieved by quantizing the inputs before training. For example, if the inputs are numbers in the range 0-255, the following commands normalize them to the range 0-1 and quantize them to a small number of levels:
    x_train = np.round(x_train/32)/8
    x_test = np.round(x_test/32)/8
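
The claim in item 2, that adjacent linear layers can be collapsed into one, is plain linear algebra: composing two affine maps yields another affine map. A minimal numeric sketch (layer sizes chosen arbitrarily for illustration):

import numpy as np

# two adjacent dense (affine) layers: y = W2 @ (W1 @ x + b1) + b2
W1, b1 = np.random.randn(5, 4), np.random.randn(5)
W2, b2 = np.random.randn(3, 5), np.random.randn(3)

# the same map as a single collapsed affine layer
W, b = W2 @ W1, W2 @ b1 + b2

x = np.random.randn(4)
assert np.allclose(W2 @ (W1 @ x + b1) + b2, W @ x + b)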

Example of training code

from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D
import os
from keras import backend as K
import numpy as np



def square(x):
    return x * x

batch_size = 64
num_classes = 10
epochs = 500
num_predictions = 20
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'cifar_model.h5'

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train = np.round(x_train/32)/8
x_test = np.round(x_test/32)/8
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(128, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Conv2D(83, (3, 3)))
model.add(Dropout(0.25))
model.add(Activation(square))

model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Conv2D(130, (5, 5), padding='same'))
model.add(Activation(square))
model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

opt = keras.optimizers.Adam(amsgrad=True, decay=0.0001, lr = 0.001)

model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])


print('Using real-time data augmentation.')
datagen = ImageDataGenerator(
    featurewise_center=False,  
    samplewise_center=False,  
    featurewise_std_normalization=False,
    samplewise_std_normalization=False, 
    zca_whitening=False,  
    zca_epsilon=1e-06,  
    rotation_range=0,  
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.,
    zoom_range=0., 
    channel_shift_range=0., 
    fill_mode='nearest',
    cval=0.,
    horizontal_flip=True,
    vertical_flip=False, 
    rescale=None,
    preprocessing_function=None,
    data_format=None,
    validation_split=0.0)
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
model_path = os.path.join(save_dir, model_name)
checkpoint = keras.callbacks.ModelCheckpoint(model_path, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=10)
callback_list=[checkpoint]
datagen.fit(x_train)
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size), callbacks=callback_list,
                    epochs=epochs, validation_data=(x_test, y_test), workers=4)

# Save 
model.save(model_path)
print('Saved trained model at %s ' % model_path)

scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Tip - loading a model

CryptoNets models use the square activation, which does not exist in Keras, so you have to define the square function before loading a previously saved model:

import os
from keras.models import load_model

save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'model.h5'
model_path = os.path.join(save_dir, model_name)

def square(x):
    return x * x

# load model
model = load_model(model_path, custom_objects={'square': square})

Converting weights to CryptoNets format

Once the model is trained, the next step is to convert the weight vectors to a format that CryptoNets recognizes. CryptoNets expects the weights to be in a CSV file where the weights for each layer are in a separate line. One challenge is to collapse adjacent linear layers into a single linear layer. This can be done analytically, but it is sometimes challenging when pooling and convolutions are used. Therefore, we demonstrate here how it can be done numerically.

For each block of adjacent linear layers, we create a network that contains only these layers. We then feed unit-impulse inputs into this block and save the outputs. This is demonstrated in the following code:

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, AveragePooling2D

# the trained model is assumed to be stored in a variable named `model` (see the loading tip above)
output_center = 2
input_kernel_range = range(0, 10)
input_shape = [14, 14, 83]
input_maps = input_shape[2]
model_ = Sequential()
model_.add(AveragePooling2D(pool_size=(2, 2), input_shape=input_shape))
model_.add(Conv2D(130, (5, 5), padding='same'))
model_.layers[1].set_weights(model.layers[6].get_weights())
test = np.zeros(input_shape)[np.newaxis,...]
bias = model_.predict(test)[0,output_center,output_center,:]
A = 0
for c in range(0, input_maps):
    for y in input_kernel_range:
        for x in input_kernel_range:
            test = np.zeros(input_shape)[np.newaxis,...]
            test[0,x,y,c]=1
            prediction = model_.predict(test)
            d = prediction[0,output_center,output_center,:] - bias
            if (isinstance(A, int)):
                A = d
            else:
                A = np.c_[A, d]
A.tofile("weights.csv", sep=',')
bias.tofile("bias.csv", sep=',')

Explanations: the code assumes that the trained model is stored in a variable named model. The code creates a network model_ that holds the block of adjacent linear layers, that is, a block of layers that do not have any nonlinear activation in them. It also skips any dropout layers, since these are not used at inference time.

There are several parameters that should be adjusted to the structure of the block:

  • output_center should be set such that, in the output tensor of this block, the data at [0, output_center, output_center, :] does not depend on any input that is padding in the input tensor. For example, if there is a padding of 2 on the input and the block generates output with a stride of one, then output_center should be set to 2. If the stride is two, then output_center should be set to 1.
  • input_kernel_range should be set such that the output defined by output_center depends only on inputs whose x, y locations are in input_kernel_range.
  • input_shape is the shape of the input tensor.
  • You should also declare the structure of model_ and copy the corresponding parameters from model, as shown in the sketch after this list.
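
For reference, here is a sketch of the same procedure for the first block of the network above (Conv2D(128, 3x3, 'same') → AveragePooling2D → Conv2D(83, 3x3)). The values output_center = 1 and input_kernel_range = range(1, 9) are our reading of the rules above for this block (padding of 1 on the input, stride of 2, and an 8x8x3 receptive field); verify them against your own block structure before trusting the output.

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, AveragePooling2D

# `model` is the trained Keras model loaded earlier
output_center = 1
input_kernel_range = range(1, 9)   # the 8x8 receptive field is offset by the padding
input_shape = [32, 32, 3]
input_maps = input_shape[2]

block = Sequential()
block.add(Conv2D(128, (3, 3), padding='same', input_shape=input_shape))
block.add(AveragePooling2D(pool_size=(2, 2)))
block.add(Conv2D(83, (3, 3)))
block.layers[0].set_weights(model.layers[0].get_weights())
block.layers[2].set_weights(model.layers[2].get_weights())

test = np.zeros(input_shape)[np.newaxis, ...]
bias = block.predict(test)[0, output_center, output_center, :]
A = 0
for c in range(0, input_maps):
    for y in input_kernel_range:
        for x in input_kernel_range:
            test = np.zeros(input_shape)[np.newaxis, ...]
            test[0, x, y, c] = 1
            d = block.predict(test)[0, output_center, output_center, :] - bias
            A = d if isinstance(A, int) else np.c_[A, d]
A.tofile("weights1.csv", sep=',')
bias.tofile("bias1.csv", sep=',')

The resulting weights file should contain 8x8x3x83 = 15936 values, matching the count given in the verification tip below.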

The code above generates two files: weights.csv and bias.csv. You should generate such files for each block of the network. Once done, all the weights files should be combined into a single file, and the bias files into another file. Therefore, if you have 3 blocks of layers and you created the files weights1.csv, weights2.csv and weights3.csv, you can combine them into a single file using the command

gawk '{print $0;}' weights1.csv weights2.csv weights3.csv > all_weights.csv

In the same way, the bias files bias1.csv, bias2.csv and bias3.csv are joined using

gawk '{print $0;}' bias1.csv bias2.csv bias3.csv > all_bias.csv

Tip – verifying files

The number of lines in the weights and bias files should equal the number of blocks in the network. You can also verify that each line in these files contains the correct number of parameters. For the biases, issue the command

gawk -F"," '{print NF, "parameters in block ", NR;}' all_bias.csv

This command will print the number of bias parameters for each block. The number of such parameters for a block should equal the number of output maps for that block in the neural network. For example, for the network defined above these numbers would be 83 for the first block, 130 for the second block, and 10 for the last block.

To test the weights file, issue the command

gawk -F"," '{print NF, "parameters in block ", NR;}' all_weights.csv

This will print the number of weights in each block. The expected number of weights is the size of the kernel of the block times the number of output maps. For example, if the block operates on kernels of size 5x5x3 and generates 32 output maps, then the number of weights would be 5x5x3x32=2400. For the network defined above, the numbers of weights are 8x8x3x83=15936 for the first block, 10x10x83x130=1079000 for the second block, and 7x7x130x10=63700 for the third block.

Creating an application for the model

Creating an application requires several steps:

  1. Creating the application and defining the model structure
  2. Testing the application without encryption
  3. Selecting encryption parameters
  4. Testing the application with encryption
  5. Optimizing parameters
  6. Reducing latency

In the following sections we describe each of these steps.

1. Creating the application

The first step would be to create a basic application which is not optimized. The goal is to work on the correctness of the application first and defer optimizations to later steps. Create a project and add as references the SEAL library, the HE Wrapper library and the NeuralNetworks library. The main code should look like:

using System; 
using NeuralNetworks; 
using HEWrapper; 
 
 
namespace MyNS
{ 
     public class MyNetwork 
     { 
         static void Main(string[] args) 
         { 
             string fileName = "test.txt";
             int batchSize = 1;          // number of examples to use per batch
             int numberOfRecords = 10;   // number of records in the test set
             var factory = new RawFactory((ulong)batchSize);  // we start without using encryption;
                                                              // this makes things faster and easier to debug

             WeightsReader wr = new WeightsReader("all_weights.csv", "all_bias.csv");  // load parameters

             var readerLayer = new BatchReader  // reads data from file
             {
                 FileName = fileName,
                 SparseFormat = false,
                 MaxSlots = batchSize,
                 NormalizationFactor = 1.0 / 256.0,
                 Scale = 8.0
             };


             var EncryptedLayer = new EncryptLayer() { Source = readerLayer, Factory = factory };
    
            // next we define the structure of the network


            var ConvLayer1 = new PoolLayer()
            {
                Source = EncryptedLayer,
                InputShape = new int[] { 3, 32, 32 },
                KernelShape = new int[] { 3, 8, 8 },
                Upperpadding = new int[] { 0, 1, 1 },
                Lowerpadding = new int[] { 0, 1, 1 },
                Stride = new int[] { 1000, 2, 2 },
                MapCount = new int[] { 83, 1, 1 },
                WeightsScale = 128000.0,
                Weights = (double[])wr.Weights[0],
                Bias = (double[])wr.Biases[0]
            };


            var ActivationLayer2 = new SquareActivation()
            {
                Source = ConvLayer1
            };


            var ConvLayer3 = new PoolLayer
            {
                Source = ActivationLayer2,
                InputShape = new int[] { 83, 14, 14 },
                KernelShape = new int[] { 83, 10, 10 },
                Upperpadding = new int[] { 0, 4, 4 },
                Lowerpadding = new int[] { 0, 4, 4 },
                Stride = new int[] { 83, 2, 2 },
                MapCount = new int[] { 130, 1, 1 },
                WeightsScale = 1280000.0,
                Weights = ((double[])wr.Weights[1]),
                Bias = ((double[])wr.Biases[1])
            };


            var ActivationLayer4 = new SquareActivation()
            {
                Source = ConvLayer3
            };

            var DenseLayer5 = new PoolLayer()
            {
                Source = ActivationLayer4,
                Weights = (double[])wr.Weights[2],
                Bias = (double[])wr.Biases[2],
                WeightsScale = 256000.0
            };


            var network = DenseLayer5;
            Console.WriteLine("Preparing");
            network.PrepareNetwork();
            int count = 0;
            int errs = 0;
            while (count < numberOfRecords)
            {
                using (var m = network.GetNext())
                    Utils.ProcessInEnv(env =>
                    {
                        var decrypted = m.Decrypt(env);
                        int pred = 0;
                        for (int j = 1; j < decrypted.RowCount; j++)
                        {
                            if (decrypted[j, 0] > decrypted[pred, 0]) pred = j;
                        }
                        if (pred != readerLayer.Labels[0]) errs++;
                        count++;
                        if (count % batchSize == 0)
                            Console.WriteLine("errs {0}/{1} accuracy {2:0.000}% prediction {3} label {4} {5}", errs, count, 100 - (100.0 * errs / (count)), pred, readerLayer.Labels[0], TimingLayer.GetStats());

                    }, factory);
            }
            Console.WriteLine("errs {0}/{1} accuracy {2:0.000}%", errs, count, 100 - (100.0 * errs / (count)));
            network.DisposeNetwork();
            Console.WriteLine("Max computed value {0} ({1})", RawMatrix.Max, Math.Log(RawMatrix.Max) / Math.Log(2));
        }
    }
}

Tip – Input format

The reader layer (BatchReader) accepts inputs in two formats. If SparseFormat is set to false, the data is assumed to be tab separated, where each line represents a single record. The label is in column LabelColumn, which defaults to column 0. If SparseFormat is set to true, each line in the file is expected to have the format label number-of-features index1:value1 index2:value2 ... indexN:valueN, where number-of-features is the dimension of the input space (including zero features). For each nonzero feature in the current record, the pair index:value contains the index of the feature (the first feature is numbered 0) and the value assigned to that feature.
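
For illustration, a hypothetical two-record file in this sparse format, for a 4-dimensional input space, might look like the following (the first record has label 1 and two nonzero features, the second has label 0 and one nonzero feature):

1 4 0:0.5 3:0.25
0 4 2:1.0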

The NormalizationFactor is used to normalize the input data: each input feature is multiplied by the value specified here. Make sure to use real numbers here, since if you declare NormalizationFactor = 1/256 the compiler will treat 1/256 as integer division and compute this value to be zero.

Tip – Scale

The Scale parameter specifies the fidelity with which parameters and inputs are used. Formally, any input or weight x is processed with the formula x ← round(x * Scale) / Scale. Therefore, if Scale is set to 10, the value of x is rounded to the closest number with only one digit after the decimal point. In the first stage of converting a network to be used with CryptoNets, we suggest using large values for Scale to make sure that no accuracy is lost due to rounding errors. Later in the process, the value of Scale should be reduced to improve performance when using encryption.
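
As a quick sanity check, the following sketch applies the same formula in numpy. Assuming the reader layer first multiplies by NormalizationFactor and then applies Scale, the setting Scale = 8.0 with NormalizationFactor = 1/256 reproduces the x_train = np.round(x_train/32)/8 preprocessing used during training:

import numpy as np

def apply_scale(x, scale):
    # x <- round(x * scale) / scale
    return np.round(x * scale) / scale

x = np.array([0.0, 37.0, 200.0, 255.0])   # raw pixel values
normalized = x * (1.0 / 256.0)            # NormalizationFactor
print(apply_scale(normalized, 8.0))       # rounded to multiples of 1/8
print(np.round(x / 32) / 8)               # the training-time preprocessing gives the same values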

2. Testing the application without encryption

At this stage you should have an application that compiles. The goal now is to verify the correctness of the application. When running, the application should process one record at a time and output the expected label and the computed label. It also keeps track of the accuracy so far, which can be compared to the values computed by Keras. There are several techniques that can be used for debugging in case the computed values do not match the expected ones.

Tip – computing activations in hidden layers

When values do not compute correctly, it is helpful to verify that the values in hidden layers are computed correctly. To compute the output of a certain layer, change the line var network = DenseLayer5; in the above application to point to the layer whose output you are interested in. For example, if you'd like to see the output of the first convolution layer, use var network = ConvLayer1;. To see the output itself, you can either set a breakpoint after the line var decrypted = m.Decrypt(env); and inspect the variable decrypted, or add the command Utils.Show(m, factory); before or after this line. Utils.Show prints the decrypted values of m to the screen.
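
On the Keras side, the corresponding reference values can be obtained from an intermediate model. A minimal sketch, assuming model is the trained network from the training code above and x_test holds the preprocessed test data; model.layers[2] (the Conv2D(83) layer) marks the end of the first linear block and therefore corresponds to ConvLayer1:

from keras.models import Model

# a model that outputs the activations of an intermediate layer
intermediate = Model(inputs=model.input, outputs=model.layers[2].output)

# reference activations for the first test record
reference = intermediate.predict(x_test[:1])
print(reference.shape)   # (1, 14, 14, 83) for the network above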

Note that the order of tensors in CryptoNets is CHW (channel, height, width), as in the InputShape arrays above, while Keras, by default, uses HWC (channels last). This should be taken into account when comparing results. The difference in the order of dimensions is a common source of problems when converting models from Keras (or other training systems) to CryptoNets.
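
A small sketch of the reordering, assuming the channels-first convention for CryptoNets described above:

import numpy as np

# Keras activation for one record, channels last: (height, width, channels)
keras_out = reference[0]                  # shape (14, 14, 83), from the sketch above

# reorder to channels first, (channels, height, width), before comparing with CryptoNets output
chw = np.transpose(keras_out, (2, 0, 1))  # shape (83, 14, 14)
print(chw.shape)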

Tip – injecting data

When debugging, it is sometimes useful to handcraft the inputs to a certain layer in order to test its output. This can be achieved in multiple ways. For testing a single layer, it is possible to generate a matrix of the desired size and feed it to the layer. For example, the layer ConvLayer3 expects an input of size 83x14x14. You can generate a matrix of this size using Math.NET, encrypt it, apply the layer and watch the results. The following code demonstrates generating a matrix which is all zeros except the first entry, applying ConvLayer3 to the matrix and watching the results:

var mat = Matrix<Double>.Build.Dense(1, 83*14*14);
mat[0,0] = 1;
var enc = factory.GetEncryptedMatrix(mat, EMatrixFormat.ColumnMajor, 1);
var res = ConvLayer3.Apply(enc);
Utils.Show(res, factory);

Another option is to create a class that represents a layer that ignores its inputs and outputs preassigned data. Here is an example of such a class; you determine its output by assigning a value to its Data property.

    public class DummyLayer : BaseLayer
    {
        public IMatrix Data { get; set; }
        public override IMatrix Apply(IMatrix m)
        {
            return Data;
        }

        public override int OutputDimension()
        {
            return (int)(Data.RowCount * Data.ColumnCount);
        }

        public override double GetOutputScale()
        {
            return Data.Scale;
        }
        public override IMatrix GetNext()
        {
            return Data;
        }
    }

3. Selecting encryption parameters

The encryption scheme requires selecting parameters for the encryption. The main parameters to play with are the plaintext modulus and the dimension. The parameters should have two main properties: (i) the results of the computation should be correct; (ii) the computation should be efficient. We start by selecting parameters that will provide correct results.

To allow correctness, the parameters should support large enough numbers. Much like in traditional programming, where a program might fail if numbers are allocated with insufficient space (short integers vs. long integers, or floats vs. doubles), the same thing may happen when using homomorphic encryption. The first step is to determine the amount of space needed. When running without encryption (using the RawFactory), CryptoNets keeps track of the size of the numbers processed, and the line

          Console.WriteLine("Max computed value {0} ({1})", RawMatrix.Max, Math.Log(RawMatrix.Max) / Math.Log(2));

prints the maximal number used (in absolute value) and the number of bits required to encode it. To determine the number of bits needed, add 1 to this number, since an additional bit is required to hold the sign.
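
For instance, if the reported maximum were, hypothetically, about 2.9e23, the computation would look like this:

import math

max_value = 2.9e23                        # hypothetical value reported as RawMatrix.Max
bits = math.log(max_value) / math.log(2)
print(bits)                               # about 78
print(math.ceil(bits) + 1)                # one extra bit for the sign: about 79 bits needed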

To provide the required number of bits, several prime numbers are used such that the product of these numbers has at least the required number of bits. For example, if 70 bits are needed, we can use 2 prime numbers with 35 bits each or 4 prime numbers with 18 bits each. Working with more prime numbers increases the running time. However, smaller primes allow more computation to be done before the noise budget is exhausted.

Noise budget is another important notion in homomorphic encryption. In a nutshell, a freshly encrypted number has a certain amount of noise budget. Every operation on such a number (addition, multiplication, ...) reduces this budget. Once the budget hits zero, decryption will fail to provide correct results. The amount of noise budget available is determined by several parameters, the most important of which are the dimension used (N) and the size of the prime numbers used as plaintext moduli. The dimension N should be a power of two; the larger it is, the greater the noise budget. However, the larger N is, the slower the program runs. Typical values for N range from 2^12 to 2^15. On the other hand, greater noise budget is available when the plaintext moduli are smaller. However, as discussed above, working with smaller plaintext moduli requires using more of them to achieve the required number of bits and therefore slows down the application. Selecting a good set of parameters is currently done manually.

After determining the required number of bits, select a value for N and the number of primes to be used. The following script can be used to find suitable prime numbers for the encryption. You must supply 3 parameters: bits is the minimal number of bits each prime should have, nDegree is log2 of N, and count is the number of primes to generate.

from sympy.ntheory import isprime
import math

bits = 39.8    # number of bits requested from each prime
nDegree = 14   # N = 2**nDegree
count = 2      # number of primes to generate

mod = 2**(nDegree + 1)   # candidates are searched in steps of 2*N
if (bits < nDegree + 1):
    bits = nDegree + 1

candidate = 1 + 2**math.floor(bits)
skip = math.ceil((2**bits - candidate) / mod)
candidate += skip * mod   # smallest candidate >= 2**bits that is congruent to 1 modulo 2*N

while (count > 0):
    while (not isprime(candidate)):
        candidate = candidate + mod
    print(candidate)
    candidate = candidate + mod
    count = count - 1

For the parameters specified here, this script will return the numbers

 957181001729
 957181034497

With these numbers you can test your program by replacing the line

   var factory = new RawFactory((ulong)batchSize);  // we start without using encryption

with the line

   var factory = new EncryptedSealBfvFactory(new ulong[] { 957181001729, 957181034497 }, 16384);

where the number 16384 is N. Since we asked for 2 primes with 39.8 bits each, these parameters can support 79.6 bits.
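
As a quick check of the chosen moduli (and of an assumption the script above builds in by stepping in multiples of 2^(nDegree+1)), each plaintext modulus should be congruent to 1 modulo 2N, and their combined size should cover the required number of bits:

import math

N = 16384
primes = [957181001729, 957181034497]
# the script above generates candidates congruent to 1 modulo 2*N
assert all(p % (2 * N) == 1 for p in primes)
print(sum(math.log2(p) for p in primes))   # roughly 79.6 bits of plaintext space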

4. Testing the application with encryption

5. Optimizing parameters

6. Reducing latency