We have proposed an image generation approach driven by speech emotion recognition. VQGAN and CLIP are used in tandem to generate an image from a text prompt derived from the speech, combined with the emotion recognized from the speech spectrogram. The emotion recognition model achieved an accuracy of about 75%.