Simulating Household Activities via Programs

To learn to perform complex activities, autonomous agents need to know the sequences of actions required to accomplish a given task. Towards this goal, we present VirtualHome, a 3D environment that simulates household activities as sequences of actions and interactions and generates videos of them.

VirtualHome is built upon three main blocks: a Knowledge Base of household tasks, containing instructions on how common tasks should be performed; the VirtualHome Environment, a 3D simulator that executes such tasks and generates videos of them; and script generation models, which generate programs from descriptions or video demonstrations.

Task Knowledge Base
We collect a Knowledge Base of Daily Indoor Activities. Each activity comes with a description of how to perform it and a script to execute it.
VirtualHome Environment
We present a virtual environment that executes the activity programs, generating long videos of people performing the activities described in them.
Script Generation
We propose models that generate activity programs from descriptions or videos portraying the activity, making it possible to learn to perform activities from examples.

Task Knowledge Base

We have collected a knowledge base of activities people do at home. For every activity, we have descriptions of different ways to perform it, and a program that describes how to execute it.

The Knowledge Base contains
  • 500+ activities
  • 2,800+ programs
  • 300+ objects and 2,700+ interactions
Prepare coffee
Open coffee maker. Put filter in main basket, put ground coffee in filter, fill reservoir with water, turn on coffee maker.
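The description above can be expressed as a program: an ordered sequence of action steps over objects. Below is a minimal Python sketch of that idea; the step names and the bracketed rendering are illustrative and only approximate VirtualHome's actual script syntax.

```python
# Hypothetical representation of a VirtualHome-style activity program:
# each step pairs an action with the object(s) it acts on.
# (Step names mirror the "Prepare coffee" description; they are
# illustrative, not VirtualHome's exact script format.)

prepare_coffee = [
    ("open", "coffee_maker"),
    ("put", "filter", "main_basket"),
    ("put", "ground_coffee", "filter"),
    ("fill", "reservoir", "water"),
    ("switch_on", "coffee_maker"),
]

def render(program):
    """Render a program as one instruction line per step."""
    lines = []
    for action, obj, *rest in program:
        target = f" -> {rest[0]}" if rest else ""
        lines.append(f"[{action.upper()}] <{obj}>{target}")
    return "\n".join(lines)

print(render(prepare_coffee))
```

Each rendered line is one executable step, so the same representation can be produced by a model and then run in the simulator.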

VirtualHome Environment

We present a virtual environment that executes activity programs, generating videos of people performing the activities inside apartments.

The environment has
  • 6 apartments with 4 available characters
  • 350+ objects per scene
  • Instance and semantic label annotations, depth, pose, and optical flow
Set up table

Script Generation

In this section we present works that use VirtualHome to generate programs from demonstrations.

VirtualHome: Simulating Household Activities via Programs

Presents VirtualHome and proposes a model to predict programs from videos or descriptions. The predicted programs are fine-tuned with RL to be executable in the simulator.

X. Puig*, K. Ra*, M. Boben*, J. Li, T. Wang, S. Fidler, A. Torralba.
In Proc. Computer Vision and Pattern Recognition (CVPR), 2018.

Synthesizing Environment-Aware Activities via Activity Sketches

Represents activities via sketches. Proposes a model to generate programs from sketches such that they are consistent with a target environment.

A. Liao*, X. Puig*, M. Boben, A. Torralba, S. Fidler.
In Proc. Computer Vision and Pattern Recognition (CVPR), 2019.
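Environment consistency of the kind this work targets can be illustrated with a toy check: a program is only acceptable if every object it references actually exists in the target scene. The sketch below is a hypothetical simplification; the function and object names are illustrative, not VirtualHome's API.

```python
# Toy environment-consistency check: a generated program is accepted
# only if every object it references is present in the target scene.
# (Names are illustrative, not VirtualHome's API.)

def missing_objects(program, scene_objects):
    """Return the set of referenced objects absent from the scene."""
    referenced = {obj for _action, obj in program}
    return referenced - scene_objects

program = [("walk", "table"), ("grab", "plate"), ("put", "plate")]

kitchen = {"table", "plate", "fridge"}
bathroom = {"sink", "towel"}

print(missing_objects(program, kitchen))   # nothing missing: consistent
print(missing_objects(program, bathroom))  # table and plate missing: inconsistent
```

A model conditioned on the target environment would learn to avoid the inconsistent case, e.g. by referencing only objects present in the scene.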



Xavier Puig


Kevin Ra

McGill University

Marko Boben

University of Ljubljana

Yuan-Hong Liao

University of Toronto

Jiaman Li

University of Toronto

Tingwu Wang

University of Toronto

Sanja Fidler

University of Toronto

Antonio Torralba


MIT
Reach out with questions, suggestions, and feedback.