Overview
Learning occurs in two distinct phases: imitating an expert teacher, and then learning from the visual policy’s own experiences under teacher supervision.
Phase I: Bootstrapping Off Teacher Unrolls
We collect trajectory data from the expert teacher and its intermediate checkpoints. We do not use the action labels from the sub-optimal teachers; instead, we generate the correct action online during training.
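The relabeling idea can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the correct action comes from querying the expert on each stored observation, and the names `relabel_with_expert`, `toy_expert`, and the tuple-based data layout are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Observation = Tuple[float, ...]
Action = int


def relabel_with_expert(
    trajectories: Sequence[Sequence[Tuple[Observation, Action]]],
    expert_policy: Callable[[Observation], Action],
) -> List[Tuple[Observation, Action]]:
    """Flatten trajectories into (observation, expert_action) pairs.

    The action stored with each step (possibly produced by a sub-optimal
    checkpoint) is discarded and replaced by the expert's action.
    """
    dataset = []
    for trajectory in trajectories:
        for observation, _stored_action in trajectory:
            dataset.append((observation, expert_policy(observation)))
    return dataset


def toy_expert(observation: Observation) -> Action:
    """Hypothetical expert: action 1 when the first feature is negative."""
    return 1 if observation[0] < 0 else 0


# Unrolls from a hypothetical checkpoint that always chose action 0.
unrolls = [[((-1.0,), 0), ((0.5,), 0)], [((-2.0,), 0)]]
relabeled = relabel_with_expert(unrolls, toy_expert)
```

Here the states visited by weaker checkpoints still contribute to the dataset, but only the expert's labels are used as training targets.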
Phase II: Learning from On-Policy Supervision
For each round, we:
1. Collect 1000 rollouts and save each timestep’s
   - observation and action data
   - generated image
   These are saved on the luma01 server.
2. Transfer the data from luma01 to the workstation
3. Train BC on all datasets so far using the previous checkpoint
4. Use the resulting checkpoint to sample the next round
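The round structure above can be sketched as a loop. This is only an outline under stated assumptions: `run_dagger`, `fake_collect`, and `fake_train` are hypothetical names, checkpoints are represented as strings, and "using the previous checkpoint" is interpreted here as passing it to the training step.

```python
from typing import Callable, List, Tuple


def run_dagger(
    num_rounds: int,
    initial_checkpoint: str,
    collect_rollouts: Callable[[str, int], list],
    train_bc: Callable[[List[list], str], str],
    rollouts_per_round: int = 1000,
) -> Tuple[str, List[list]]:
    """Alternate between on-policy data collection and BC training."""
    datasets: List[list] = []
    checkpoint = initial_checkpoint
    for _ in range(num_rounds):
        # Sample rollouts with the current policy and keep every round's data.
        datasets.append(collect_rollouts(checkpoint, rollouts_per_round))
        # Train on the aggregate of all rounds, given the previous checkpoint.
        checkpoint = train_bc(datasets, checkpoint)
    return checkpoint, datasets


def fake_collect(checkpoint: str, num_rollouts: int) -> list:
    """Stand-in for rollout collection; returns rollout identifiers."""
    return [f"{checkpoint}/rollout_{i}" for i in range(num_rollouts)]


def fake_train(datasets: List[list], previous_checkpoint: str) -> str:
    """Stand-in for BC training; returns the name of the new checkpoint."""
    return f"ckpt_round_{len(datasets)}"


final_checkpoint, all_datasets = run_dagger(
    3, "ckpt_phase1", fake_collect, fake_train, rollouts_per_round=4
)
```

The point of the sketch is the data flow: each round's rollouts come from the newest checkpoint, while training always sees the union of all rounds collected so far.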
Running the Experiment
We use Task queues to manage the process. Have a look at dagger_runner.py under lucidsim_experiments. It takes care of sending out the jobs and checking when they are done. Here’s how it interacts with the three nodes you must launch:
1. The rollout jobs are sent to the queue teacher_queue_name
   - These are grabbed by instances of flow_teacher_node.py, which are called “teacher nodes”
2. Each teacher node completes one rollout, sending RPC calls to render_node.py in the weaver module for the generated images
   - The generated image is then warped to provide future frames and provide input to the sampling policy
3. Once all rollout jobs are done and gathered, the runner sends all data to the host workstation (whatever is running dagger_runner.py)
4. A training job is uploaded to the trainer node: trainer.py. This should be run on the same host workstation. Once training is complete, the new checkpoint is used for sampling.
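The queue-based hand-off between the runner and the teacher nodes can be illustrated with the standard library alone. This sketch does not use the project's actual task queue or RPC layer: `fake_render`, `teacher_worker`, and `run_round` are hypothetical, threads stand in for cluster nodes, and a plain function call stands in for the RPC to a render node.

```python
import queue
import threading
from typing import Dict, List


def fake_render(job_id: int, step: int) -> str:
    """Stand-in for the RPC to a render node; returns a fake image handle."""
    return f"image-{job_id}-{step}"


def teacher_worker(jobs: queue.Queue, results: queue.Queue, steps: int) -> None:
    """Take rollout jobs off the queue until a None sentinel arrives."""
    while True:
        job_id = jobs.get()
        if job_id is None:
            break
        # One job corresponds to one complete rollout.
        frames = [fake_render(job_id, t) for t in range(steps)]
        results.put((job_id, frames))


def run_round(num_jobs: int, num_teachers: int, steps: int = 3) -> Dict[int, List[str]]:
    """Send out rollout jobs, wait for all of them, and gather the results."""
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    workers = [
        threading.Thread(target=teacher_worker, args=(jobs, results, steps))
        for _ in range(num_teachers)
    ]
    for worker in workers:
        worker.start()
    for job_id in range(num_jobs):
        jobs.put(job_id)
    for _ in workers:
        jobs.put(None)  # one stop signal per worker
    gathered = {}
    for _ in range(num_jobs):
        job_id, frames = results.get()  # blocks until a rollout is finished
        gathered[job_id] = frames
    for worker in workers:
        worker.join()
    return gathered


round_data = run_round(num_jobs=5, num_teachers=2)
```

In this toy version the runner only proceeds once every job has reported back, which mirrors the "done and gathered" condition that precedes the training job.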
The typical setup is to run dagger_runner.py with the appropriate arguments (specified via a sweep file) on the host workstation, along with trainer.py. The main runner arguments you might change are the prompt collection and the baseline interval (warping interval). See examples of the prompt collections under lucidsim_experiments.datasets.lucidsim_v1._collections.
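To make the role of the baseline interval concrete, the sketch below lists which timesteps would request a freshly generated image and which would reuse a warped one. It assumes that a new image is generated every `baseline_interval` steps and that the steps in between are warped from the most recent generated image; that reading, and the function name `frame_sources`, are assumptions rather than the runner's documented behavior.

```python
from typing import List, Tuple


def frame_sources(num_steps: int, baseline_interval: int) -> List[Tuple[int, str, int]]:
    """Return (timestep, source, keyframe) for every step of a rollout.

    source is "render" when a new image is generated at that step and
    "warp" when the frame is derived from the image at `keyframe`.
    """
    if baseline_interval < 1:
        raise ValueError("baseline_interval must be at least 1")
    schedule = []
    for t in range(num_steps):
        keyframe = t - (t % baseline_interval)
        source = "render" if t == keyframe else "warp"
        schedule.append((t, source, keyframe))
    return schedule


schedule = frame_sources(num_steps=7, baseline_interval=3)
render_steps = [t for t, source, _ in schedule if source == "render"]
```

Under this assumption, a larger interval means fewer render requests per rollout and more frames produced by warping.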
The teacher and render nodes are launched on the cluster. You may want to launch slightly more teacher nodes than render nodes, since warping is faster than rendering and we want to keep all the renderers busy. You can monitor the speed of the teacher nodes to see whether you should increase the number of render or teacher nodes.
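A back-of-envelope estimate shows why "slightly more" teacher nodes is a sensible starting point. The model here is an assumption, not a measurement: each teacher waits `render_seconds` for a generated image, then spends `warp_seconds` on warping and other local work before its next request, so it keeps one renderer busy for only part of each cycle. The function name and the timings are hypothetical.

```python
import math


def teachers_to_saturate(
    num_render_nodes: int, render_seconds: float, warp_seconds: float
) -> int:
    """Estimate how many teacher nodes keep every render node busy."""
    if num_render_nodes < 0 or render_seconds <= 0 or warp_seconds < 0:
        raise ValueError("expected non-negative counts and positive render time")
    # One teacher occupies a renderer for render_seconds out of each cycle.
    cycle_seconds = render_seconds + warp_seconds
    return math.ceil(num_render_nodes * cycle_seconds / render_seconds)


# Hypothetical timings: 10 s per render, 2 s of warping between requests.
example_teachers = teachers_to_saturate(8, 10.0, 2.0)
```

Actual timings vary with hardware and settings, so the observed speed of the teacher nodes remains the better guide for adjusting the counts.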
Example launch files can be found in launch_flow_teacher_node.py and, within the weaver module, vision_launch.py.