
ImGen - Image generation from scene graphs

This is a university project for the course Deep Learning in Practice with Python and LUA (VITMAV45). Our goal was to recreate and implement the publication of Johnson, Gupta, and Fei-Fei (2018). Our model uses graph convolution to process input scene graphs, computes a scene layout by predicting bounding boxes and segmentation masks for objects, and converts the layout into an image with a cascaded refinement network. We used the COCO dataset for training, but due to limited computational resources we restricted ourselves to a smaller set of object categories and did not implement the discriminators.

Authors

Documentation

Documentation of the project.

Hungarian presentation video of the project

Our generated pictures from graphs:

Three people · Two giraffes

[generated images]

Improvement during training

At the beginning, without standardization:

[image]

After standardization:

[image]

[image]

After 30 epochs:

[image]

Compiling the dataset

  • We downloaded the COCO train database from these links:
  • From almost 20 GB of images we selected 10,000 for training, validation, and testing.
  • We use a 10% test and a 10% validation split.
  • We cropped and resized the images to 64x64 to reduce computation time.
  • Fortunately the COCO database is well prepared, so we only had to filter out a few cases:
    • At least 1 object has to be in the image
    • At most 4 objects can be in the image
    • If an object's mask is too small (smaller than 2% of the image area), it is discarded
  • The data compilation is done in SceneGraphGeneration.ipynb
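The filtering rules above can be sketched roughly as follows (the function name and signature are ours for illustration, not taken from the notebook):

```python
def keep_image(image_area, object_mask_areas,
               min_objects=1, max_objects=4, min_mask_ratio=0.02):
    """Apply the three COCO filtering rules described above.

    image_area: total pixel area of the image
    object_mask_areas: list of per-object segmentation mask areas
    """
    # Rule 3: drop objects whose mask is smaller than 2% of the image area.
    kept = [a for a in object_mask_areas if a >= min_mask_ratio * image_area]
    # Rules 1 and 2: between 1 and 4 objects must remain after filtering.
    return min_objects <= len(kept) <= max_objects
```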

CocoSceneGraphDataset class

  • To make loading the pictures and annotations straightforward, we wrote a dedicated class for this purpose.
  • We adapted this class from the sg2im project, which also uses and processes the COCO dataset.
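A minimal sketch of what such a dataset class can look like, assuming the images and annotations were pre-filtered as described above (the field names `file_name`, `object_ids`, and `triples` are illustrative assumptions, not the actual class interface):

```python
import json

import torch
from PIL import Image
from torch.utils.data import Dataset


class CocoSceneGraphDataset(Dataset):
    """Loads cropped 64x64 COCO images together with their scene-graph
    annotations. Loosely modeled on the sg2im data pipeline."""

    def __init__(self, image_dir, annotation_file, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        with open(annotation_file) as f:
            self.annotations = json.load(f)  # one entry per image

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        image = Image.open(f"{self.image_dir}/{ann['file_name']}").convert("RGB")
        if self.transform:
            image = self.transform(image)
        objects = torch.tensor(ann['object_ids'])
        triples = torch.tensor(ann['triples'])  # (subject, predicate, object) rows
        return image, objects, triples
```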

Second milestone

We decided to switch to the PyCharm IDE instead of Jupyter because, as the project grew, we needed better structure and IntelliSense.

Graph Convolutional Networks (GCN)

  • To convert graphs into training data we used a graph convolutional network
  • The goal of this network is to convert the graph into vectors that represent the relations (edges and vertices) of the original scene graph
  • This network takes object_vectors, predicate_vectors, and edges
  • We use an embedding network to convert objects and predicates into vectors (dim is 64)
  • The edges are vectors that represent the relationships
  • This model returns object and predicate vectors that encode knowledge of their own relations
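A hedged sketch of one such graph-convolution step, following the general shape of the Johnson et al. (2018) formulation (the layer name, MLP size, and averaging scheme are our assumptions, not the project's exact code):

```python
import torch
import torch.nn as nn


class GraphConvLayer(nn.Module):
    """One round of message passing over (subject, predicate, object) triples.

    Each triple's subject, predicate, and object vectors are concatenated,
    pushed through an MLP, and the resulting messages are averaged back
    onto the object vectors."""

    def __init__(self, dim=64):
        super().__init__()
        self.dim = dim
        self.mlp = nn.Sequential(nn.Linear(3 * dim, 3 * dim), nn.ReLU())

    def forward(self, obj_vecs, pred_vecs, edges):
        # edges: (T, 2) tensor of (subject_index, object_index) per triple
        s, o = edges[:, 0], edges[:, 1]
        triple = torch.cat([obj_vecs[s], pred_vecs, obj_vecs[o]], dim=1)
        h_s, new_pred, h_o = self.mlp(triple).split(self.dim, dim=1)
        # Scatter subject/object messages back onto the object vectors
        # and average by how many triples each object appears in.
        new_obj = torch.zeros_like(obj_vecs)
        new_obj.index_add_(0, s, h_s)
        new_obj.index_add_(0, o, h_o)
        counts = torch.zeros(obj_vecs.size(0), 1)
        counts.index_add_(0, torch.cat([s, o]), torch.ones(2 * len(s), 1))
        return new_obj / counts.clamp(min=1), new_pred
```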

Box and mask nets

  • These two nets are trained at the same time
  • Both networks use object_vectors (the output of the GCN) for training
  • The box net is an MLP (a Sequential model of Linear, BatchNorm1d, and Dropout layers with ReLU activation)
  • The mask net is a CNN (a Sequential model of Upsample, BatchNorm2d, Conv2d, and ReLU layers)
  • The outputs are the box and mask predictions
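The two nets described above can be sketched as follows, using the layer types the list names (the hidden size, dropout rate, mask resolution, and function names are our assumptions):

```python
import torch
import torch.nn as nn


def build_box_net(dim=64, hidden=512):
    """MLP predicting a (x0, y0, x1, y1) bounding box per object vector."""
    return nn.Sequential(
        nn.Linear(dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
        nn.Dropout(0.1),
        nn.Linear(hidden, 4),
    )


def build_mask_net(dim=64, mask_size=16):
    """CNN that upsamples a 1x1 object feature map into a
    mask_size x mask_size segmentation mask."""
    layers, size = [], 1
    while size < mask_size:
        layers += [
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(dim), nn.ReLU(),
        ]
        size *= 2
    layers.append(nn.Conv2d(dim, 1, kernel_size=1))  # one mask channel
    return nn.Sequential(*layers)
```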

Cascaded Refinement Network

  • After generating the scene layout, we synthesize an image that respects the object positions given in the layout.
  • A Cascaded Refinement Network consists of a series of convolutional refinement modules.
  • The first module takes Gaussian noise as input, and the output from the last module is passed to two final convolution layers to produce the output image.
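One refinement module of the cascade can be sketched as below (channel counts, the nearest-neighbor upsampling, and the two-conv structure are assumptions in the spirit of cascaded refinement networks, not the project's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RefinementModule(nn.Module):
    """One cascaded-refinement module: upsample the previous features,
    concatenate the scene layout (resized to match), and refine with convs."""

    def __init__(self, layout_channels, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(layout_channels + in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels), nn.ReLU(),
        )

    def forward(self, layout, features):
        # Double the spatial resolution at each stage of the cascade.
        features = F.interpolate(features, scale_factor=2, mode='nearest')
        layout = F.interpolate(layout, size=features.shape[2:], mode='nearest')
        return self.conv(torch.cat([layout, features], dim=1))
```

For the first module, `features` would be the Gaussian noise input mentioned above; the last module's output goes through two final convolution layers to produce the image.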

Training our model

  • We trained the model on an NVIDIA 1060 GPU with CUDA for a whole night, but due to limited computational resources we couldn't build a highly accurate model
  • The improvement was visible, but there is plenty of room for further gains through parameter and code optimization

Run the model

In order to generate images from scene graphs, follow the next steps:

  • download our pretrained model weights from: https://www.dropbox.com/s/zihnz2cuj009uuy/model.pt?dl=0
  • insert the model.pt file into the NagyHF folder
  • install all the required libraries
  • to write your own scene graphs, use the scene_graphs/inputSceneGraphs.json file and follow its structure
  • note that we only trained on person and giraffe objects
  • run the run.py program
  • the generated images appear in the generatedImages folder
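A scene graph for this kind of model is a list of objects plus (subject, predicate, object) relationships. The snippet below shows a hypothetical entry in that spirit; the field names and predicate strings are our invention, so follow the actual structure in scene_graphs/inputSceneGraphs.json instead:

```python
import json

# Hypothetical scene-graph entry (illustrative only): three objects and
# two relationships, each given as [subject_index, predicate, object_index].
scene_graph = {
    "objects": ["person", "person", "giraffe"],
    "relationships": [
        [0, "left of", 1],
        [2, "behind", 1],
    ],
}
print(json.dumps(scene_graph, indent=2))
```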
