
ImGen - Image generation from scene graphs

This is a university project for the course Deep Learning in Practice with Python and LUA (VITMAV45). Our goal was to recreate and implement the publication of Johnson, Gupta, and Fei-Fei (2018). Our model uses graph convolution to process input scene graphs, computes a scene layout by predicting bounding boxes and segmentation masks for objects, and converts the layout into an image with a cascaded refinement network. We used the COCO dataset for training, but due to limited computational resources we restricted ourselves to a smaller set of object categories and did not implement the discriminators.

Authors

Documentation

Documentation of the project.

Hungarian presentation video of the project

Our generated pictures from graphs:

Three people · Two giraffes

[generated images]

Improvement during training

At the beginning, without standardization:

[image]

After standardization:

[image]

[image]

After 30 epochs:

[image]

Compiling the dataset

  • We downloaded the COCO train database from these links:
  • From almost 20 GB of images we selected 10,000 for training, validation, and testing.
  • We use a 10% test and a 10% validation split.
  • We cropped and resized the images to 64x64 to reduce computation time.
  • Fortunately the COCO database is well prepared, so we only had to filter out a few cases:
    • At least 1 object has to be in the image
    • At most 4 objects can be in the image
    • If an object's mask is too small (smaller than 2% of the image area), it is discarded
  • The data compilation is done in SceneGraphGeneration.ipynb
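The filtering rules above can be sketched roughly as follows (the function name and signature are ours for illustration, not taken from the notebook):

```python
def keep_image(image_area, object_mask_areas,
               min_objects=1, max_objects=4, min_mask_ratio=0.02):
    """Apply the three COCO filtering rules described above.

    image_area: total pixel area of the image
    object_mask_areas: list of per-object segmentation mask areas
    """
    # Rule 3: drop objects whose mask is smaller than 2% of the image area.
    kept = [a for a in object_mask_areas if a >= min_mask_ratio * image_area]
    # Rules 1 and 2: between 1 and 4 objects must remain after filtering.
    return min_objects <= len(kept) <= max_objects
```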

CocoSceneGraphDataset class

  • To make loading the pictures and annotations straightforward, we wrote a dedicated class for this purpose.
  • We adapted this class from the sg2im project, which also uses and processes the COCO dataset.
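A minimal sketch of what such a dataset class can look like, assuming the images and annotations were pre-filtered as described above (the field names `file_name`, `object_ids`, and `triples` are illustrative assumptions, not the actual class interface):

```python
import json

import torch
from PIL import Image
from torch.utils.data import Dataset


class CocoSceneGraphDataset(Dataset):
    """Loads cropped 64x64 COCO images together with their scene-graph
    annotations. Loosely modeled on the sg2im data pipeline."""

    def __init__(self, image_dir, annotation_file, transform=None):
        self.image_dir = image_dir
        self.transform = transform
        with open(annotation_file) as f:
            self.annotations = json.load(f)  # one entry per image

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        image = Image.open(f"{self.image_dir}/{ann['file_name']}").convert("RGB")
        if self.transform:
            image = self.transform(image)
        objects = torch.tensor(ann['object_ids'])
        triples = torch.tensor(ann['triples'])  # (subject, predicate, object) rows
        return image, objects, triples
```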

Second milestone

We decided to switch to the PyCharm IDE instead of Jupyter because, as the project grew, we needed better structure and IntelliSense.

Graph Convolutional Networks (GCN)

  • To convert graphs into training data we used a graph convolutional network
  • The goal of this network is to convert the graph into vectors that represent the relations (edges and vertices) of the original scene graph
  • This network takes object_vectors, predicate_vectors, and edges
  • We use an embedding network to convert objects and predicates into vectors (dim is 64)
  • The edges are vectors that represent the relationships
  • This model returns object and predicate vectors that encode knowledge of their own relations
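A hedged sketch of one such graph-convolution step, following the general shape of the Johnson et al. (2018) formulation (the layer name, MLP size, and averaging scheme are our assumptions, not the project's exact code):

```python
import torch
import torch.nn as nn


class GraphConvLayer(nn.Module):
    """One round of message passing over (subject, predicate, object) triples.

    Each triple's subject, predicate, and object vectors are concatenated,
    pushed through an MLP, and the resulting messages are averaged back
    onto the object vectors."""

    def __init__(self, dim=64):
        super().__init__()
        self.dim = dim
        self.mlp = nn.Sequential(nn.Linear(3 * dim, 3 * dim), nn.ReLU())

    def forward(self, obj_vecs, pred_vecs, edges):
        # edges: (T, 2) tensor of (subject_index, object_index) per triple
        s, o = edges[:, 0], edges[:, 1]
        triple = torch.cat([obj_vecs[s], pred_vecs, obj_vecs[o]], dim=1)
        h_s, new_pred, h_o = self.mlp(triple).split(self.dim, dim=1)
        # Scatter subject/object messages back onto the object vectors
        # and average by how many triples each object appears in.
        new_obj = torch.zeros_like(obj_vecs)
        new_obj.index_add_(0, s, h_s)
        new_obj.index_add_(0, o, h_o)
        counts = torch.zeros(obj_vecs.size(0), 1)
        counts.index_add_(0, torch.cat([s, o]), torch.ones(2 * len(s), 1))
        return new_obj / counts.clamp(min=1), new_pred
```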

Box and mask nets

  • These two nets are trained at the same time
  • Both networks use object_vectors (the output of the GCN) for training
  • The box net is an MLP (a Sequential model of Linear, BatchNorm1d, and Dropout layers with ReLU activation)
  • The mask net is a CNN (a Sequential model of Upsample, BatchNorm2d, Conv2d, and ReLU layers)
  • The outputs are the box and mask predictions
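The two nets described above can be sketched as follows, using the layer types the list names (the hidden size, dropout rate, mask resolution, and function names are our assumptions):

```python
import torch
import torch.nn as nn


def build_box_net(dim=64, hidden=512):
    """MLP predicting a (x0, y0, x1, y1) bounding box per object vector."""
    return nn.Sequential(
        nn.Linear(dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
        nn.Dropout(0.1),
        nn.Linear(hidden, 4),
    )


def build_mask_net(dim=64, mask_size=16):
    """CNN that upsamples a 1x1 object feature map into a
    mask_size x mask_size segmentation mask."""
    layers, size = [], 1
    while size < mask_size:
        layers += [
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(dim), nn.ReLU(),
        ]
        size *= 2
    layers.append(nn.Conv2d(dim, 1, kernel_size=1))  # one mask channel
    return nn.Sequential(*layers)
```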

Cascaded Refinement Network

  • After generating the scene layout, we synthesize an image that respects the object positions given in the layout.
  • A Cascaded Refinement Network consists of a series of convolutional refinement modules.
  • The first module takes Gaussian noise as input, and the output from the last module is passed to two final convolution layers to produce the output image.
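One refinement module of the cascade can be sketched as below (channel counts, the nearest-neighbor upsampling, and the two-conv structure are assumptions in the spirit of cascaded refinement networks, not the project's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RefinementModule(nn.Module):
    """One cascaded-refinement module: upsample the previous features,
    concatenate the scene layout (resized to match), and refine with convs."""

    def __init__(self, layout_channels, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(layout_channels + in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels), nn.ReLU(),
        )

    def forward(self, layout, features):
        # Double the spatial resolution at each stage of the cascade.
        features = F.interpolate(features, scale_factor=2, mode='nearest')
        layout = F.interpolate(layout, size=features.shape[2:], mode='nearest')
        return self.conv(torch.cat([layout, features], dim=1))
```

For the first module, `features` would be the Gaussian noise input mentioned above; the last module's output goes through two final convolution layers to produce the image.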

Training our model

  • We trained the model on an NVIDIA 1060 GPU with CUDA for a whole night, but due to limited computational resources we couldn't build a highly accurate model
  • The improvement was visible, but there is plenty of room for further gains through parameter and code optimization

Run the model

In order to generate images from scene graphs, follow the next steps:

  • download our pretrained model weights from: https://www.dropbox.com/s/zihnz2cuj009uuy/model.pt?dl=0
  • insert the model.pt file into the NagyHF folder
  • install all the required libraries
  • to write your own scene graphs, use the scene_graphs/inputSceneGraphs.json file and follow its structure
  • note that we only trained on person and giraffe objects
  • run the run.py program
  • the generated images appear in the generatedImages folder
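A scene graph for this kind of model is a list of objects plus (subject, predicate, object) relationships. The snippet below shows a hypothetical entry in that spirit; the field names and predicate strings are our invention, so follow the actual structure in scene_graphs/inputSceneGraphs.json instead:

```python
import json

# Hypothetical scene-graph entry (illustrative only): three objects and
# two relationships, each given as [subject_index, predicate, object_index].
scene_graph = {
    "objects": ["person", "person", "giraffe"],
    "relationships": [
        [0, "left of", 1],
        [2, "behind", 1],
    ],
}
print(json.dumps(scene_graph, indent=2))
```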
