NVAIL Partners Present AI Research at ICLR

By Sandra Skaff

This week, top AI researchers are gathered in New Orleans, LA, to present their cutting-edge research, and our NVAIL partners are among them. Here we highlight the work of three of these partners, each developed with robotics as a target application.


Mila

Mila researchers are presenting BabyAI, a platform for learning to perform tasks from language instructions, in which an expert agent can intervene to advise the learner when needed. The work looks toward a future in which robots live alongside humans and need to learn from their instructions.

BabyAI comprises 19 levels of tasks that form a curriculum, starting with simple tasks and ending with complex ones. In curriculum learning, the learner is trained on a simple task first before moving to a more difficult one. In addition, the learner can interact with an expert when needed, depending on the task it is attempting. A key aspect of the platform is benchmarking sample complexity, one of the biggest problems in training robotics systems today.
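
The curriculum idea described above can be sketched as a simple scheduling loop. This is a minimal illustration and not the BabyAI code: `ToyLearner`, `run_curriculum`, the difficulty values, and the success threshold are all hypothetical names and numbers chosen for the example. It shows only the scheduling (train on a level until the learner is good enough, then move on), not the sample-efficiency results reported by the authors.

```python
from typing import List


class ToyLearner:
    """Hypothetical learner whose skill grows by a fixed amount per training step."""

    def __init__(self, gain: float = 0.25) -> None:
        self.skill = 0.0
        self.gain = gain

    def train_step(self, difficulty: float) -> None:
        # Assumption: every step adds the same skill, whatever the level.
        self.skill += self.gain

    def success_rate(self, difficulty: float) -> float:
        # Success saturates at 1.0 once skill matches the level's difficulty.
        return min(1.0, self.skill / difficulty)


def run_curriculum(
    learner: ToyLearner,
    difficulties: List[float],
    threshold: float = 0.75,
    max_steps_per_level: int = 1000,
) -> List[int]:
    """Train on levels in order, advancing once the success rate reaches `threshold`.

    Returns the number of training steps spent on each level.
    """
    steps_used = []
    for difficulty in difficulties:
        steps = 0
        while (
            learner.success_rate(difficulty) < threshold
            and steps < max_steps_per_level
        ):
            learner.train_step(difficulty)
            steps += 1
        steps_used.append(steps)
    return steps_used


# Three hypothetical levels, ordered from easy to hard.
steps = run_curriculum(ToyLearner(), [1.0, 2.0, 4.0])
print(steps)  # [3, 3, 6]
```

Skill carries over between levels, so each later level starts from what the earlier ones taught. A real benchmark would replace the fixed skill gain with an actual training algorithm and count demonstrations or episodes instead of abstract steps.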

The authors built MiniGrid, a partially observable 2D gridworld environment, for this research. The environment is populated with entities of different colors, such as the agent, balls, boxes, and doors. Objects in the environment can be picked up, dropped, and moved around by the agent. The authors train an imitation learning algorithm and a reinforcement learning algorithm on each of the 19 levels using different initialization seeds, and parallelize these training runs across 20-50 GPUs. Each algorithm instance takes about one week to train on one GPU.
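
To make "partially observable" concrete, here is a small sketch of an agent that sees only a window of the grid around itself. It is not MiniGrid's API, and MiniGrid's actual field of view may be shaped differently; the function name `local_view`, the `UNSEEN` marker, and the single-character cell encoding are assumptions made for the example.

```python
from typing import List

UNSEEN = "?"  # hypothetical marker for cells that fall outside the grid


def local_view(
    grid: List[List[str]], row: int, col: int, radius: int = 1
) -> List[List[str]]:
    """Return the square window of side 2*radius+1 centered on (row, col)."""
    view = []
    for r in range(row - radius, row + radius + 1):
        view_row = []
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            view_row.append(grid[r][c] if inside else UNSEEN)
        view.append(view_row)
    return view


# A 3x3 world: "A" is the agent, "b" a ball, "d" a door, "." an empty cell.
world = [
    list("A.b"),
    list("..."),
    list("d.."),
]

for line in local_view(world, row=0, col=0, radius=1):
    print("".join(line))
```

With a radius of 1, the agent in the top-left corner cannot see the ball or the door, so it has to explore before it can act on an instruction that mentions them.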

The authors show that curriculum learning, which moves from simple to more complex tasks, is more sample efficient than training directly on complex tasks. They also show that pre-training generally helps if the complex task is a superset of the base task. The paper also shows that interactive learning, which feeds the learner more data for the tasks it performs worst on, does not necessarily improve learning.

In short, generalization and data efficiency are big challenges in deep reinforcement learning. Hundreds of thousands of demonstrations are needed for training an agent on very simple tasks. BabyAI can be used for studying data efficiency in reinforcement learning.

Three BabyAI levels built using the MiniGrid environment. The red triangle represents the agent, and the light-grey shaded area represents its field of view (partial observation).

UC Berkeley

UC Berkeley researchers are presenting their work on predicting future video frames. Prediction is a key aspect of intelligent systems, and visual prediction is key to advancing robotic systems. A robot that can visualize the results of its actions has a better chance of finding the optimal ones.

The authors introduce the concept of time-agnostic prediction (TAP), which is meant to make predictions over longer horizons. The underlying assumption is that there are time instances at which it is easier to predict what a system would look like. A standard iterative predictor is required to predict every time instance, difficult or easy, which leads to compounding error over time. In contrast, a predictor that is allowed to select the instances at which it predicts can simply skip over the difficult ones and maintain accurate predictions farther into the future. This is the idea of time-agnostic prediction.
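
One way to express "letting the predictor choose when to predict" is to score a prediction against every candidate future frame and keep only the best match. The sketch below assumes that formulation, with the loss taken as a minimum over target times. The frame representation (flat lists of floats) and the function names are hypothetical, and the authors' actual training objective may include further terms.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # assumption: a frame is a flat sequence of pixel values


def squared_error(a: Frame, b: Frame) -> float:
    """Sum of squared differences between two frames of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fixed_time_loss(prediction: Frame, future: List[Frame], t: int) -> float:
    """Standard loss: the prediction must match the frame at a given time t."""
    return squared_error(prediction, future[t])


def time_agnostic_loss(prediction: Frame, future: List[Frame]) -> Tuple[float, int]:
    """Loss against whichever future frame the prediction matches best.

    Returns the minimum error and the index of the matched frame.
    """
    errors = [squared_error(prediction, frame) for frame in future]
    best_t = min(range(len(errors)), key=errors.__getitem__)
    return errors[best_t], best_t


# Three hypothetical two-pixel future frames and one prediction.
future_frames = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
prediction = [1.0, 1.5]

print(fixed_time_loss(prediction, future_frames, t=2))  # 1.25
print(time_agnostic_loss(prediction, future_frames))    # (0.25, 1)
```

The fixed-time loss penalizes the prediction for not matching the last frame, while the time-agnostic loss accepts it as a good prediction of the middle frame. In training, this lets the model settle on the time instances it can predict reliably.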

The authors show results both in simulation and in the real world. In simulation, three tasks are considered: object grasping, pick-and-place, and multi-object pushing. Their approach achieves better task performance than direct planning with recursive forward prediction, which is the standard in the literature. For example, the error of their TAP algorithm on three-object pushing is as low as the error of direct planning on two-object pushing, which shows the gain from using TAP to correctly decompose complex tasks into simpler subtasks.

The authors also test their approach on the multi-object pushing task using a real-world dataset called “BAIR pushing”, which consists of 30-frame clips of random motions of a Sawyer arm on a tabletop. Qualitatively, the TAP algorithm predicts frames that plausibly lie on the path from the start image to the goal image.

For the experiments, the authors used PyTorch and the MuJoCo simulator, and trained on V100 GPUs. Each training job took 8-10 hours on one GPU. Inference was extremely fast, taking only a fraction of a second on a Titan X or GeForce GTX 1080 Ti GPU.

Forward prediction results on grasping, comparing fixed-time predictors and the authors' approach. Each row is a separate example. The first column is the input; each subsequent column is the output of a different model, per the column title.

Real-world pushing task results.

TU Darmstadt

TU Darmstadt researchers are presenting their work on incorporating a physics prior into a generic neural network to learn models for robot control. The learned models are constrained to be physically plausible, and thus achieve better sample complexity and better extrapolation to unseen data. To this end, the authors derive a network topology called Deep Lagrangian Networks (DeLaN), which encodes the differential equation of Lagrangian mechanics as the physics prior. DeLaN is trained by minimizing the error of this differential equation. The approach exploits the underlying structure of physics to learn the internal forces and system energies without supervision, even though these forces and energies are not directly observable.
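
For reference, the differential equation in question can be written out in standard textbook notation. The symbols are not defined in this article and are introduced here as assumptions: q denotes the joint positions, H(q) the positive-definite mass matrix, V(q) the potential energy, and τ the motor torques.

```latex
\begin{align}
\mathcal{L}(q, \dot{q}) &= \tfrac{1}{2}\, \dot{q}^{\top} H(q)\, \dot{q} - V(q), \\
\frac{d}{dt} \frac{\partial \mathcal{L}}{\partial \dot{q}}
  - \frac{\partial \mathcal{L}}{\partial q} &= \tau, \\
H(q)\, \ddot{q} + \dot{H}(q)\, \dot{q}
  - \tfrac{1}{2} \left( \frac{\partial}{\partial q}
    \left( \dot{q}^{\top} H(q)\, \dot{q} \right) \right)^{\top}
  + \frac{\partial V}{\partial q} &= \tau .
\end{align}
```

The third line follows from substituting the first into the second. Read this way, "minimizing the error of the differential equation" presumably means minimizing the mismatch between the left-hand side, evaluated with the network's estimates of H and V, and the measured torques on the right.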

The authors demonstrate the applicability and extrapolation of DeLaN in real-time model-based control for a simulated 2-dof robot and the physical 7-dof Barrett WAM robot, where learning is performed online starting from a random initialization. In simulation, two datasets of trajectories were used: single-stroke characters and cosine curves in joint space. The performance of DeLaN is evaluated using the tracking error on test trajectories and compared to a standard feed-forward network (FF-NN) and an analytic inverse dynamics model.

Simulation results are shown for the two datasets: single-stroke characters and cosine curves. Although DeLaN is trained using only the motor torques, it learns to disambiguate the inertial, Coriolis, centrifugal, and gravitational forces, which leads to lower tracking error than the FF-NN across different training set sizes. On the physical robot, only cosine trajectories are studied, since these produce dynamic movements. In this case, DeLaN achieves performance comparable to the analytic model and extrapolates better to higher velocities than the FF-NN.
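
The claim that torque measurements alone can separate the individual force components may seem surprising, so here is a deliberately tiny stand-in. It is not DeLaN and uses no neural network: it assumes a 1-dof pendulum whose torque has the known structure tau = a * qdd + b * sin(q), and recovers the inertia coefficient a and the gravity coefficient b by ordinary least squares. All names and numbers are hypothetical.

```python
import math
from typing import List, Tuple


def simulate_torques(
    a: float, b: float, q: List[float], qdd: List[float]
) -> List[float]:
    """Torques of an idealized pendulum: inertial term plus gravity term."""
    return [a * acc + b * math.sin(pos) for pos, acc in zip(q, qdd)]


def fit_inertia_and_gravity(
    q: List[float], qdd: List[float], tau: List[float]
) -> Tuple[float, float]:
    """Least-squares fit of tau = a * qdd + b * sin(q) via the normal equations."""
    s = [math.sin(pos) for pos in q]
    aa = sum(x * x for x in qdd)
    ab = sum(x * y for x, y in zip(qdd, s))
    bb = sum(y * y for y in s)
    ya = sum(t * x for t, x in zip(tau, qdd))
    yb = sum(t * y for t, y in zip(tau, s))
    det = aa * bb - ab * ab  # nonzero when qdd and sin(q) are not collinear
    a = (ya * bb - yb * ab) / det
    b = (yb * aa - ya * ab) / det
    return a, b


# Hypothetical trajectory samples: positions and accelerations only.
positions = [0.1 * i for i in range(1, 21)]
accelerations = [math.cos(3.0 * pos) for pos in positions]

# Only the total torque is "measured"; its two components are never observed.
torques = simulate_torques(0.5, 4.9, positions, accelerations)

a_hat, b_hat = fit_inertia_and_gravity(positions, accelerations, torques)
print(round(a_hat, 6), round(b_hat, 6))
```

Because the model structure fixes how each term depends on the state, the fit can attribute the right share of the total torque to each component. DeLaN applies the same principle with learned functions in place of the two scalar coefficients.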

For the experiments, the authors used Python and the PyBullet simulator. They used the DGX Station for offline hyperparameter sweeps, which allowed them to train different models in parallel. They then used a GeForce GTX 1070 GPU to train DeLaN online, directly on samples from the robot, starting from a random initialization and without pre-training.