In this paper, we propose a framework for learning through drawing. Our goal is to learn the correspondence between spoken words and abstract visual attributes from a dataset of spoken descriptions of images. We manipulate the learned latent representations of GANs to edit semantic concepts in their generated outputs, and we use these GAN-generated images to train a model with a triplet loss.
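The triplet objective mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes squared Euclidean distances between embeddings and an arbitrary margin of 0.2, and all names are hypothetical. The anchor (e.g. a spoken-word embedding) is pulled toward a positive image embedding and pushed away from a negative one by at least the margin.

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on plain Python vectors.

    Returns max(0, d(a, p) - d(a, n) + margin), where d is the
    squared Euclidean distance.
    """
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


# The positive lies much closer to the anchor than the negative,
# so the hinge is inactive and the loss is zero.
a = [1.0, 0.0]
p = [0.9, 0.1]
n = [0.0, 1.0]
print(triplet_loss(a, p, n))  # → 0.0
```

In training, such a loss encourages matched (word, attribute) pairs to embed closer together than mismatched pairs, which is what makes GAN-edited images useful as controlled positives and negatives.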
Acknowledgements
This research was partially supported by the Toyota Research Institute.