Masked Generative Policy (MGP) is a fast, accurate, adaptive, and globally coherent generative policy for visuomotor imitation learning, combining the efficiency of autoregressive transformers with the flexibility of diffusion models.
MGP is the first masked generative framework for robot imitation learning that achieves low inference latency and high task success rates while supporting rapid plan edits during execution.
Two novel sampling paradigms are proposed:
Left: Training Stage 1 – Action Tokenizer. Middle: Training Stage 2 – Masked Generative Transformer. Right: Short-horizon sampling (MGP-Short).
Long-horizon sampling (MGP-Long) through Adaptive Token Refinement (ATR).
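Masked generative transformers typically decode a chunk of action tokens in a few parallel refinement passes rather than one token at a time, which is where the latency advantage over autoregressive and diffusion policies comes from. The paper's exact decoding schedule is not reproduced here; below is a minimal sketch of confidence-based iterative masked decoding in the MaskGIT style, where `predict_logits` is a hypothetical stand-in for the trained transformer:

```python
import numpy as np

MASK = -1  # sentinel for a not-yet-decoded action token

def masked_decode(predict_logits, seq_len, steps=4):
    """Confidence-based iterative masked decoding (MaskGIT-style sketch).

    predict_logits(tokens, mask) stands in for the masked generative
    transformer: given the partially decoded sequence, it returns a
    (seq_len, vocab) array of logits for every position.
    """
    tokens = np.full(seq_len, MASK)
    for step in range(1, steps + 1):
        mask = tokens == MASK
        if not mask.any():
            break
        logits = predict_logits(tokens, mask)
        # softmax -> greedy prediction and its confidence per position
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        pred = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        # cosine schedule: the cumulative number of committed tokens grows each step
        n_keep = int(np.ceil(seq_len * (1 - np.cos(np.pi * step / (2 * steps)))))
        n_new = max(n_keep - (seq_len - mask.sum()), 1)
        # commit only the highest-confidence still-masked positions
        masked_idx = np.flatnonzero(mask)
        order = masked_idx[np.argsort(-conf[masked_idx])]
        tokens[order[:n_new]] = pred[order[:n_new]]
    return tokens
```

Each iteration commits the highest-confidence predictions and re-predicts the rest, so a whole action chunk is produced in a handful of forward passes instead of one pass per token.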
MGP-Long integrates a novel Adaptive Token Refinement (ATR) strategy that predicts a global trajectory and iteratively refines the yet-to-be-executed action tokens as new observations arrive, while keeping executed actions fixed. During refinement, our Posterior-Confidence Estimation selectively masks and corrects low-likelihood unexecuted tokens, continuously updating confidence scores to guide targeted edits using current observations and historical states.
Evaluated on Meta-World, LIBERO-90, and LIBERO-Long benchmarks (150+ manipulation tasks).
| Benchmark | Avg. Success ↑ | Inference Speedup ↑ | Notes |
|---|---|---|---|
| Meta-World (50 tasks) | 0.637 | 49× faster than DP3 | SOTA on short-horizon control |
| LIBERO-90 (90 tasks) | 0.889 | 3.4× faster than QueST | SOTA on multitask performance |
| LIBERO-Long (10 tasks) | 0.820 | 3× faster than QueST | SOTA on long-horizon tasks |
MGP-Long demonstrates strong robustness to missing visual inputs, even when the probability of a missing observation rises to 70%. MGP-Long increases the average success rate by 21% over the full-horizon model and by 32% over the short-horizon model.
Basketball
Pick Place Wall–Wall
Pick Place Wall–Target
Push
Push Wall
MGP-Long delivers the best performance on five dynamic environments: Basketball, Pick Place Wall–Wall, Pick Place Wall–Target, Push, and Push Wall.
Button Press Color Change Task.
Expert Demonstration 1
Expert Demonstration 2
Visualization of two expert demonstrations used for training. Videos are played at N× real-time speed.
Button Press On/Off Task.
Expert Demonstration 1
Expert Demonstration 2
Visualization of two expert demonstrations used for training. Videos are played at N× real-time speed.
Button Press Color Change Task.
MGP-Long
Short-horizon Methods
Qualitative results of MGP-Long (Left) and short-horizon methods (Right).
Button Press On/Off Task.
MGP-Long
Short-horizon Methods
Qualitative results of MGP-Long (Left) and short-horizon methods (Right).
MGP-Long is the only method that succeeds on both non-Markovian button tasks, achieving a 100% success rate.
@article{zhuang2025masked,
title={Masked Generative Policy for Robotic Control},
author={Zhuang, Lipeng and Fan, Shiyu and Audonnet, Florent P and Ru, Yingdong and Ho, Edmond SL and Camarasa, Gerardo Aragon and Henderson, Paul},
journal={arXiv preprint arXiv:2512.09101},
year={2025}
}