
encoder, decoder framework for image reconstruction using event cameras #3

Open
ChidanandKumarKS opened this issue Feb 21, 2022 · 2 comments


@ChidanandKumarKS

I tried to use StereoSpike, which uses your repo, to do image reconstruction from an event camera rather than optical flow. But I am unable to get a reconstructed image using event-camera input.

Requesting you to help me in this regard.

Below are the encoder and decoder, written with the spikingjelly repo. I am getting bad image reconstruction with no texture.

@fangwei123456 directed me to you for help in this regard.

import torch.nn as nn
from spikingjelly.clock_driven import neuron, surrogate  # 'activation_based' in newer spikingjelly versions

# NeuromorphicNet, MultiplyBy, SEWResBlock and NNConvUpsampling2 are helper classes
# from the StereoSpike repository (NeuromorphicNet also sets self.v_th, self.v_rst
# and self.surrogate_fct used below).


class StereoSpike(NeuromorphicNet):
    """
    Baseline model, with which we report state-of-the-art performances in the second version of our paper.

    - all neuron potentials must be reset at each timestep
    - predict_depth layers do have biases, but it is equivalent to remove them and reset output I-neurons
      to the sum of all 4 biases, instead of 0.
    """
    def __init__(self, surrogate_function=surrogate.ATan(), detach_reset=True, v_threshold=1.0, v_reset=0.0, multiply_factor=1.):
        super().__init__(surrogate_function=surrogate_function, detach_reset=detach_reset)

        # bottom layer, preprocessing the input spike frame without downsampling
        self.bottom = nn.Sequential(
            nn.Conv2d(in_channels=5, out_channels=32, kernel_size=5, stride=1, padding=2, bias=False),
            MultiplyBy(multiply_factor),
            neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
        )

        # encoder layers (downsampling)
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, stride=2, padding=2, bias=False),
            MultiplyBy(multiply_factor),
            neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=2, padding=2, bias=False),
            MultiplyBy(multiply_factor),
            neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=2, padding=2, bias=False),
            MultiplyBy(multiply_factor),
            neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
        )
        self.conv4 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=2, padding=2, bias=False),
            MultiplyBy(multiply_factor),
            neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
        )

        # residual layers
        self.bottleneck = nn.Sequential(
            SEWResBlock(512, v_threshold=self.v_th, v_reset=self.v_rst, connect_function='ADD', multiply_factor=multiply_factor),
            SEWResBlock(512, v_threshold=self.v_th, v_reset=self.v_rst, connect_function='ADD', multiply_factor=multiply_factor),
        )

        # decoder layers (upsampling)
        self.deconv4 = nn.Sequential(
            NNConvUpsampling2(in_channels=512, out_channels=256, kernel_size=3, scale_factor=2),
            MultiplyBy(multiply_factor),
            neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
        )
        self.deconv3 = nn.Sequential(
            NNConvUpsampling2(in_channels=256, out_channels=128, kernel_size=3, scale_factor=2),
            MultiplyBy(multiply_factor),
            neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
        )
        self.deconv2 = nn.Sequential(
            NNConvUpsampling2(in_channels=128, out_channels=64, kernel_size=3, scale_factor=2),
            MultiplyBy(multiply_factor),
            neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
        )
        self.deconv1 = nn.Sequential(
            NNConvUpsampling2(in_channels=64, out_channels=32, kernel_size=3, scale_factor=2),
            MultiplyBy(multiply_factor),
            neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
        )

        # this layer outputs the full-scale prediction, where the value is represented by the potential
        # of IF neurons that do not fire ("I-neurons"), i.e., with an infinite threshold.
        self.predict_depth1 = nn.Sequential(
            NNConvUpsampling2(in_channels=32, out_channels=1, kernel_size=3, scale_factor=1, bias=True),
            MultiplyBy(multiply_factor),
        )

        self.Ineurons = neuron.IFNode(v_threshold=float('inf'), v_reset=0.0, surrogate_function=self.surrogate_fct)
        self.sigmoid = nn.Sigmoid()
        self.num_encoders = 4

    def forward(self, x, pred):  # note: 'pred' is currently unused

        # x must be of shape [batch_size, num_frames_per_depth_map, 4 (2 cameras - 2 polarities), W, H]
        frame = x

        # data is fed in through the bottom layer
        out_bottom = self.bottom(frame)

        # pass through encoder layers
        out_conv1 = self.conv1(out_bottom)
        out_conv2 = self.conv2(out_conv1)
        out_conv3 = self.conv3(out_conv2)
        out_conv4 = self.conv4(out_conv3)

        # pass through residual blocks
        out_rconv = self.bottleneck(out_conv4)

        # gradually upsample while summing with the skip connections
        out_deconv4 = self.deconv4(out_rconv)
        out_add4 = out_deconv4 + out_conv3
        # self.Ineurons(self.predict_depth4(out_add4))

        out_deconv3 = self.deconv3(out_add4)
        out_add3 = out_deconv3 + out_conv2
        # self.Ineurons(self.predict_depth3(out_add3))

        out_deconv2 = self.deconv2(out_add3)
        out_add2 = out_deconv2 + out_conv1
        # self.Ineurons(self.predict_depth2(out_add2))

        out_deconv1 = self.deconv1(out_add2)
        out_add1 = out_deconv1 + out_bottom
        self.Ineurons(self.predict_depth1(out_add1))
        img = self.sigmoid(self.Ineurons.v)

        return {'image': img}

    def set_init_depths_potentials(self, depth_prior):
        self.Ineurons.v = depth_prior
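For context, a minimal sketch of how this class might be exercised on one sample (the input size is illustrative, the helper classes above are assumed importable, and spikingjelly's clock-driven API is assumed):

import torch
from spikingjelly.clock_driven import functional  # 'activation_based' in newer versions

net = StereoSpike()
events = torch.rand(1, 5, 256, 320)  # dummy event frame; 5 channels to match self.bottom
functional.reset_net(net)            # reset all membrane potentials, as the docstring requires
out = net(events, pred=None)         # 'pred' is unused in the forward pass above
print(out['image'].shape)            # expected: torch.Size([1, 1, 256, 320])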

Results: [attached images of the reconstructed outputs]

@urancon
Owner

urancon commented Feb 25, 2022

Hello @ChidanandKumarKS, and thank you for your interest in our work!

Just like you, I would intuitively expect the StereoSpike architecture to be a good fit for image reconstruction. However, there are many places where a "mistake" might have slipped in, and I don't have enough information. Can you tell me more about your approach?
For instance, have you double-checked your loss, and if so, is it suitable for grayscale image reconstruction? Are you using the same data loading as in this repo? Do you use a specific format for the input data? (I see that your 'bottom' layer takes 5 channels as input.) Does your loss decrease, and for how many epochs has your model been trained?
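For what it's worth, a common baseline loss for grayscale reconstruction (not necessarily what you used; lambda_grad is just an illustrative weight) combines pixelwise L1 with a finite-difference image-gradient term, which tends to encourage texture and edges:

import torch.nn.functional as F

def reconstruction_loss(pred, target, lambda_grad=0.5):
    # pixelwise intensity term
    l1 = F.l1_loss(pred, target)
    # image-gradient (edge) terms along width and height
    grad_x = F.l1_loss(pred[..., :, 1:] - pred[..., :, :-1],
                       target[..., :, 1:] - target[..., :, :-1])
    grad_y = F.l1_loss(pred[..., 1:, :] - pred[..., :-1, :],
                       target[..., 1:, :] - target[..., :-1, :])
    return l1 + lambda_grad * (grad_x + grad_y)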

Concerning the architecture, I don't see anything wrong in particular with what you've shared. The only thing I would point out is that you're not using StereoSpike's intermediary predictions; this is just a note in case that is unintentional (see the sketch below).
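For illustration only, roughly how those intermediary predictions could be wired back in. predict_depth4/3/2 are hypothetical heads analogous to predict_depth1; each would have to upsample its input (at 1/8, 1/4 and 1/2 resolution) back to full resolution so that all scales accumulate in the same I-neuron potentials:

# inside forward(), replacing the commented-out lines:
out_deconv4 = self.deconv4(out_rconv)
out_add4 = out_deconv4 + out_conv3
self.Ineurons(self.predict_depth4(out_add4))  # hypothetical head
img4 = self.sigmoid(self.Ineurons.v)          # coarsest intermediate reconstruction

out_deconv3 = self.deconv3(out_add4)
out_add3 = out_deconv3 + out_conv2
self.Ineurons(self.predict_depth3(out_add3))  # hypothetical head
img3 = self.sigmoid(self.Ineurons.v)

# ... same pattern for predict_depth2, then predict_depth1 as in your code;
# the training loss can then supervise all four outputs instead of only the last.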

Sorry for answering a bit late, but I hope these answers can help you! Don't hesitate to ask if you have more questions!

@fangwei123456
