
Explanation Input-Feature Unit-Output Correlation Maps (Envelope Activation Correlation) #2

robintibor opened this issue Jun 24, 2021 · 4 comments

@robintibor (Owner) commented Jun 24, 2021

Filter to frequency bands:

    for low_cut_hz, high_cut_hz in filterbands:
        log.info("Compute filterband from {:.1f} to {:.1f}...".format(
            low_cut_hz, high_cut_hz))
        if low_cut_hz > 0 and high_cut_hz < 125:
            filtered = bandpass_topo(train_topo, low_cut_hz,
                high_cut_hz, sampling_rate=250.0, axis=0, filt_order=4)
        elif low_cut_hz == 0:
            filtered = lowpass_topo(train_topo, high_cut_hz,
                sampling_rate=250.0, axis=0, filt_order=4)
        else:
            # branch truncated here; presumably a highpass for bands
            # reaching the Nyquist frequency
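As a rough illustration (not the original implementation), a bandpass helper of this kind can be sketched with SciPy; the filter design below is my assumption:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Hedged sketch of what bandpass_topo could look like: a 4th-order Butterworth
# bandpass applied zero-phase along the time axis. The original repo's filter
# design may differ in detail.
def bandpass_topo(data, low_cut_hz, high_cut_hz, sampling_rate=250.0,
                  axis=0, filt_order=4):
    nyq = sampling_rate / 2.0
    b, a = butter(filt_order, [low_cut_hz / nyq, high_cut_hz / nyq],
                  btype='bandpass')
    return filtfilt(b, a, data, axis=axis)

x = np.random.randn(1000, 22)  # (time, channels); made-up shape
y = bandpass_topo(x, 7.0, 13.0)
print(y.shape)  # (1000, 22)
```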

Compute envelope (absolute value of the Hilbert transform):

    env = np.abs(scipy.signal.hilbert(batches_topo, axis=2))

Square envelope

(square_before_mean was True in our setting; the envelope was saved to a file and reloaded)

    if square_before_mean:
        env = np.square(env)

⚠️ Note there is possibly one mistake/discrepancy in the paper: We square before averaging (next step), not after ⚠️
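A tiny numeric check of why the order matters:

```python
import numpy as np

env = np.array([1.0, 3.0])
print(np.mean(env ** 2))  # 5.0 -- square before mean (as done here)
print(np.mean(env) ** 2)  # 4.0 -- mean before square
```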

Compute Moving Average of the envelope within the receptive field of the corresponding layer

Basic steps:

1. Determine the receptive field size of the layer:

       def transform_to_meaned_trial_env(env, model, i_layer, train_set,
                                         n_inputs_per_trial):
           all_layers = lasagne.layers.get_all_layers(model)
           layer = all_layers[i_layer]
           field_size = get_receptive_field_size(layer)

2. Average the envelopes within the receptive field using pooling:

       meaned_env = get_meaned_batch_env(env, field_size)
       # which averages via pooling:
       pooled = downsample.max_pool_2d(inputs, ds=(field_size, 1), st=(1, 1),
                                       ignore_border=True, mode='average_exc_pad')

3. Aggregate per-trial envelopes from the per-batch envelopes:

       fb_envs_per_trial = fb_env.reshape(n_trials, n_inputs_per_trial,
                                          fb_env.shape[1], fb_env.shape[2], fb_env.shape[3])
       trial_env = transform_to_trial_acts(fb_envs_per_trial,
                                           [n_inputs_per_trial] * n_trials,
                                           n_sample_preds=n_sample_preds,
                                           n_trial_len=n_trial_len)
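Framework aside, the envelope and moving-average steps above can be sketched in plain NumPy; shapes and the function name below are illustrative, not from the repo:

```python
import numpy as np
from scipy.signal import hilbert

def meaned_envelope(signal, field_size):
    """Squared Hilbert envelope (square_before_mean=True), averaged over a
    sliding window of length field_size along the last (time) axis."""
    env = np.abs(hilbert(signal, axis=-1)) ** 2
    # moving average via cumulative sums; output length: n_samples - field_size + 1
    csum = np.cumsum(np.concatenate(
        [np.zeros(signal.shape[:-1] + (1,)), env], axis=-1), axis=-1)
    return (csum[..., field_size:] - csum[..., :-field_size]) / field_size

x = np.random.randn(2, 4, 1000)  # (trials, channels, time); made-up shape
m = meaned_envelope(x, field_size=10)
print(m.shape)  # (2, 4, 991)
```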

Compute Correlation with Activations

For the trained model:

    topo_corrs = compute_trial_topo_corrs(model, i_layer, train_set,
                                          exp.iterator, trial_env, split_per_class=True)

and for the untrained (randomly initialized) model:

    rand_topo_corrs = compute_trial_topo_corrs(rand_model, i_layer, train_set,
                                               exp.iterator, trial_env, split_per_class=True)

Compute Activations

    trial_acts = compute_trial_acts(model, i_layer, iterator, train_set)

That is, compute per-batch activations and then aggregate them into per-trial activations:

    # compute number of inputs per trial
    i_trial_starts, i_trial_ends = compute_trial_start_end_samples(
        train_set.y, check_trial_lengths_equal=True,
        input_time_length=iterator.input_time_length)
    # +1 since trial end is inclusive
    n_trial_len = i_trial_ends[0] - i_trial_starts[0] + 1
    n_inputs_per_trial = int(np.ceil(n_trial_len / float(iterator.n_sample_preds)))
    log.info("Create theano function...")
    all_layers = lasagne.layers.get_all_layers(model)
    all_out_fn = create_pred_fn(all_layers[i_layer])
    assert iterator.input_time_length == get_input_time_length(model)
    assert iterator.n_sample_preds == get_n_sample_preds(model)
    log.info("Compute activations...")
    all_outs_per_batch = [all_out_fn(batch[0])
                          for batch in iterator.get_batches(train_set, False)]
    batch_sizes = [len(batch[0]) for batch in iterator.get_batches(train_set, False)]
    all_outs_per_batch = np.array(all_outs_per_batch)
    n_trials = len(i_trial_starts)
    log.info("Transform to trial activations...")
    trial_acts = get_trial_acts(all_outs_per_batch,
                                batch_sizes, n_trials=n_trials,
                                n_inputs_per_trial=n_inputs_per_trial,
                                n_trial_len=n_trial_len,
                                n_sample_preds=iterator.n_sample_preds)
    log.info("Done.")

Compute Correlation Envelope and Activations

    topo_corrs = compute_topo_corrs(trial_env, trial_acts)

which internally does:

    flat_trial_env = trial_env.transpose(2, 0, 1, 3).reshape(
        trial_env.shape[0] * trial_env.shape[2],
        trial_env.shape[1] * trial_env.shape[3])
    flat_trial_acts = trial_acts.transpose(1, 0, 2).reshape(
        trial_acts.shape[1], -1)
    # equivalent to:
    # flat_corrs = np.corrcoef(flat_trial_env, flat_trial_acts)
    # relevant_corrs = flat_corrs[:flat_trial_env.shape[0],
    #                             flat_trial_env.shape[0]:]
    relevant_corrs = corr(flat_trial_env, flat_trial_acts)
    topo_corrs = relevant_corrs.reshape(trial_env.shape[2], trial_env.shape[0],
                                        trial_acts.shape[1])
    return topo_corrs
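`corr` here is a helper from the old repo; a minimal NumPy equivalent, consistent with the commented-out `np.corrcoef` variant, could look like this (my sketch, not the repo's code):

```python
import numpy as np

def corr(a, b):
    """Pearson correlation of every row of a with every row of b;
    rows share the same observation axis."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    cov = a @ b.T / a.shape[1]
    return cov / np.outer(a.std(axis=1), b.std(axis=1))

a = np.random.randn(3, 100)
b = np.vstack([a[0], -a[1]])  # identical and sign-flipped rows
c = corr(a, b)                # shape (3, 2)
print(np.isclose(c[0, 0], 1.0), np.isclose(c[1, 1], -1.0))  # True True
```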

In the end these correlations for trained and untrained model will be saved:

    np.save('{:s}.labelsplitted.env_corrs.{:s}'.format(base_name, file_name_end), topo_corrs)
    np.save('{:s}.labelsplitted.env_rand_corrs.{:s}'.format(base_name, file_name_end), rand_topo_corrs)

Once you have these correlations for the trained and untrained model, you can average them across units in a layer and then compute the difference between the trained-model and untrained-model correlations. This is Figure 15 in https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.23730

As a comparison, we also compute the correlations of the envelope with the class labels (no network involved!). These are shown in the rightmost plots of Figure 15, and class-resolved/per class in Figure 14.
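One way such a label baseline can be sketched (made-up shapes, not code from the repo): correlate each channel's trial-averaged mean squared envelope with a per-class binary indicator across trials.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_chans, n_classes = 24, 22, 4
mean_env = rng.standard_normal((n_trials, n_chans))  # stand-in for trial-mean envelopes
labels = rng.integers(0, n_classes, size=n_trials)

label_corrs = np.empty((n_classes, n_chans))
for klass in range(n_classes):
    indicator = (labels == klass).astype(float)  # 1 for trials of this class
    for ch in range(n_chans):
        label_corrs[klass, ch] = np.corrcoef(indicator, mean_env[:, ch])[0, 1]
print(label_corrs.shape)  # (4, 22)
```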

@robintibor robintibor changed the title Explanation Amplitude Correlation Explanation Input-Feature Unit-Output Correlation Maps (Envelope Activation Correlation) Jun 24, 2021
@JulianWisser

How should the activations be interpreted and calculated? Are they different from the layer outputs?

I couldn't follow your calculation of the activations 100%. As I understood from the paper, the activations of one layer are the unit outputs, so the dimension of the activations/unit outputs is the same as the output shape of that layer!?

To calculate the correlation between the mean squared envelopes and the activations, their dimensions have to match. And you said they match exactly, since the envelopes are averaged with the receptive field size of the corresponding layer.

Your deep model contains the following layers with output shapes:

    Layer (type)               Output Shape         Param #

    Expression-1          [-1, 1, 1000, 22]               0
        Conv2d-2          [-1, 25, 991, 22]             275
        Conv2d-3           [-1, 25, 991, 1]          13,750
   BatchNorm2d-4           [-1, 25, 991, 1]              50
    Expression-5           [-1, 25, 991, 1]               0
     MaxPool2d-6           [-1, 25, 330, 1]               0
    Expression-7           [-1, 25, 330, 1]               0
       Dropout-8           [-1, 25, 330, 1]               0
        Conv2d-9           [-1, 50, 321, 1]          12,500
  BatchNorm2d-10           [-1, 50, 321, 1]             100
   Expression-11           [-1, 50, 321, 1]               0
    MaxPool2d-12           [-1, 50, 107, 1]               0
   Expression-13           [-1, 50, 107, 1]               0
      Dropout-14           [-1, 50, 107, 1]               0
       Conv2d-15           [-1, 100, 98, 1]          50,000
  BatchNorm2d-16           [-1, 100, 98, 1]             200
   Expression-17           [-1, 100, 98, 1]               0
    MaxPool2d-18           [-1, 100, 32, 1]               0
   Expression-19           [-1, 100, 32, 1]               0
      Dropout-20           [-1, 100, 32, 1]               0
       Conv2d-21           [-1, 200, 23, 1]         200,000
  BatchNorm2d-22           [-1, 200, 23, 1]             400
   Expression-23           [-1, 200, 23, 1]               0
    MaxPool2d-24            [-1, 200, 7, 1]               0
   Expression-25            [-1, 200, 7, 1]               0
       Conv2d-26              [-1, 4, 1, 1]           5,604
   LogSoftmax-27              [-1, 4, 1, 1]               0
   Expression-28                    [-1, 4]               0

I implemented a Keras version of the model with exactly the same dimensions, only using channels-last ordering and average pooling.

       Layer (type)                 Output Shape              Param #

       conv2d (Conv2D)              (None, 22, 991, 25)       275
       conv2d_1 (Conv2D)            (None, 1, 991, 25)        13775
       average_pooling2d (AveragePo (None, 1, 330, 25)        0
       dropout (Dropout)            (None, 1, 330, 25)        0
       conv2d_2 (Conv2D)            (None, 1, 321, 50)        12550
       batch_normalization (BatchNo (None, 1, 321, 50)        200
       average_pooling2d_1 (Average (None, 1, 107, 50)        0
       dropout_1 (Dropout)          (None, 1, 107, 50)        0
       conv2d_3 (Conv2D)            (None, 1, 98, 100)        50100
       batch_normalization_1 (Batch (None, 1, 98, 100)        400
       average_pooling2d_2 (Average (None, 1, 32, 100)        0
       dropout_2 (Dropout)          (None, 1, 32, 100)        0
       conv2d_4 (Conv2D)            (None, 1, 23, 200)        200200
       batch_normalization_2 (Batch (None, 1, 23, 200)        800
       average_pooling2d_3 (Average (None, 1, 7, 200)         0
       dropout_3 (Dropout)          (None, 1, 7, 200)         0
       conv2d_5 (Conv2D)            (None, 1, 1, 4)           5604
       flatten (Flatten)            (None, 4)                 0
       dense (Dense)                (None, 4)                 20

In Keras I calculate the unit outputs of the pooling layers and the last convolution layer with the following code (these are the ends of the different blocks you mentioned in the paper):

        out_tensor_1 = denseConvNet.get_layer(index=2).output
        out_tensor_2 = denseConvNet.get_layer(index=6).output
        out_tensor_3 = denseConvNet.get_layer(index=10).output
        out_tensor_4 = denseConvNet.get_layer(index=14).output
        out_tensor_pred = denseConvNet.get_layer(index=-3).output

        trained = []
        earlyPredictor_1 = tf.keras.Model(denseConvNet.input, out_tensor_1)
        trained.append(earlyPredictor_1.predict(data_test))
        earlyPredictor_2 = tf.keras.Model(denseConvNet.input, out_tensor_2)
        trained.append(earlyPredictor_2.predict(data_test))
        earlyPredictor_3 = tf.keras.Model(denseConvNet.input, out_tensor_3)
        trained.append(earlyPredictor_3.predict(data_test))
        earlyPredictor_4 = tf.keras.Model(denseConvNet.input, out_tensor_4)
        trained.append(earlyPredictor_4.predict(data_test))
        earlyPredictor_pred = tf.keras.Model(denseConvNet.input, out_tensor_pred)
        trained.append(earlyPredictor_pred.predict(data_test))

Because of that, my unit outputs have the same dimensions as the output shapes of the layers (channels last):

        Block 1                   (24, 1, 330, 25)
        Block 2                   (24, 1, 107, 50)
        Block 3                   (24, 1, 32, 100)
        Block 4                   (24, 1, 7, 200)
        Prediction Layer          (24, 1, 1, 4)

The receptive field size and the resulting envelope shapes of the corresponding layers are as follow:

        Receptive field size of layer 2/Block 1: 10 
        Mean squared envelope shape of layer 2/Block 1: (24, 22, 991, 1)

        Receptive field size of layer 6/Block 2: 39 
        Mean squared envelope shape of layer 6/Block 2:(24, 22, 962, 1)

        Receptive field size of layer 10/Block 3:126 
        Mean squared envelope shape of layer 10/Block 3:(24, 22, 875, 1)

        Receptive field size of layer 14/Block 4: 387 
        Mean squared envelope shape of layer 14/Block 4: (24, 22, 614, 1)

        Receptive field size of pred layer: 927 
        Mean squared envelope shape of pred layer: (24, 22, 74, 1)
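For reference, receptive-field sizes like those above follow from a simple recurrence once strides are converted to dilations for cropped decoding: each layer with kernel size k grows the receptive field by (k - 1) * current_dilation. A sketch (the layer list is my reading of the deep model's temporal kernels, not code from the repo):

```python
def receptive_field_sizes(layers):
    """layers: list of (kernel_size, stride) along the temporal axis.
    Strides accumulate as dilations, as in the cropped model."""
    rf, dilation, sizes = 1, 1, []
    for kernel, stride in layers:
        rf += (kernel - 1) * dilation
        dilation *= stride
        sizes.append(rf)
    return sizes

# 4 blocks of conv (kernel 10) + pool (kernel 3, stride 3), then final conv (kernel 7)
layers = [(10, 1), (3, 3)] * 4 + [(7, 1)]
print(receptive_field_sizes(layers))
# [10, 12, 39, 45, 126, 144, 387, 441, 927]
```

The post-convolution values 10, 39, 126, 387 and the final 927 match the sizes quoted above.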

Can you tell me how to understand the activations if they are not (simply) the outputs of one layer, and how to compute them?

@robintibor
Owner Author

Hi,
the main point that one needs to take into account is cropped decoding. See Figure 4 in https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.23730 or the explanation at https://braindecode.org/auto_examples/plot_bcic_iv_2a_moabb_cropped.html

For cropped decoding without padding, the model produces as many outputs as it has inputs minus receptive field size plus 1 (n_outputs = n_inputs - receptive_field_size + 1).
Therefore one can apply average pooling to the envelope with pooling size equal to the receptive field size (and stride 1) and obtain, in the temporal dimension, exactly as many pooled values as the layer has output units. The average pooling then pools over exactly the same inputs that each output unit used to compute its output.
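A toy numeric check of this counting argument (numbers are illustrative):

```python
import numpy as np

n_inputs = 1000
receptive_field_size = 10
n_outputs = n_inputs - receptive_field_size + 1  # stride-1, no padding

env = np.random.randn(n_inputs) ** 2
# average pooling with window = receptive field size and stride 1
pooled_env = np.convolve(env, np.ones(receptive_field_size) / receptive_field_size,
                         mode='valid')
print(len(pooled_env) == n_outputs)  # True
```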

Concretely, to obtain cropped decoding, one must appropriately replace the max pooling strides by dilations in the subsequent layers.

Code from current braindecode does that here:

https://github.com/braindecode/braindecode/blob/6fe57295e5a1372dfd0abe77c494d1352bd647eb/braindecode/models/util.py#L9-L52

(It was done manually in this old repo, which is potentially more difficult to understand.)

Does that clear things up?

@JulianWisser

Hi,
thanks for the clarification. That helped a lot in understanding the problem, since until now I had only worked with trial-wise training.
You said one only has to "replace the max pooling stride by dilations in the following layers", as is done in the mentioned function.
So there is no need to transform the data that is used as input?

@robintibor
Copy link
Owner Author

No, there is no need to transform the input. In our paper we actually used [-500, +4000] ms for both trial-wise and cropped decoding, which is 1125 timesteps at 250 Hz. [0, 4000] ms should give similar, if slightly worse, results.
