Masked Generative Policy (MGP) is a fast, accurate, adaptive, and globally coherent generative policy for visuomotor imitation learning, combining the efficiency of autoregressive transformers with the flexibility of diffusion models.
MGP is the first masked generative framework for robot imitation learning that achieves low inference latency and high task success rates while supporting rapid plan edits during execution.
Two novel sampling paradigms are proposed:
Left: Training Stage 1 – Action Tokenizer. Middle: Training Stage 2 – Masked Generative Transformer. Right: Short-horizon sampling (MGP-Short).
Long-horizon sampling (MGP-Long) through Adaptive Token Refinement (ATR).
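Masked generative transformers typically decode a chunk of action tokens in a few parallel refinement passes rather than one token at a time, which is where the latency advantage over autoregressive and diffusion policies comes from. The paper's exact decoding schedule is not reproduced here; below is a minimal sketch of confidence-based iterative masked decoding in the MaskGIT style, where `predict_logits` is a hypothetical stand-in for the trained transformer:

```python
import numpy as np

MASK = -1  # sentinel for a not-yet-decoded action token

def masked_decode(predict_logits, seq_len, steps=4):
    """Confidence-based iterative masked decoding (MaskGIT-style sketch).

    predict_logits(tokens, mask) stands in for the masked generative
    transformer: given the partially decoded sequence, it returns a
    (seq_len, vocab) array of logits for every position.
    """
    tokens = np.full(seq_len, MASK)
    for step in range(1, steps + 1):
        mask = tokens == MASK
        if not mask.any():
            break
        logits = predict_logits(tokens, mask)
        # softmax -> greedy prediction and its confidence per position
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        pred = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        # cosine schedule: the cumulative number of committed tokens grows each step
        n_keep = int(np.ceil(seq_len * (1 - np.cos(np.pi * step / (2 * steps)))))
        n_new = max(n_keep - (seq_len - mask.sum()), 1)
        # commit only the highest-confidence still-masked positions
        masked_idx = np.flatnonzero(mask)
        order = masked_idx[np.argsort(-conf[masked_idx])]
        tokens[order[:n_new]] = pred[order[:n_new]]
    return tokens
```

Each iteration commits the highest-confidence predictions and re-predicts the rest, so a whole action chunk is produced in a handful of forward passes instead of one pass per token.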
MGP-Long integrates a novel Adaptive Token Refinement (ATR) strategy that predicts a global trajectory and iteratively refines the yet-to-be-executed action tokens as new observations arrive, while keeping executed actions fixed. During refinement, our Posterior-Confidence Estimation selectively masks and corrects low-likelihood unexecuted tokens, continuously updating confidence scores to guide targeted edits using current observations and historical states.
Evaluated on Meta-World, LIBERO-90, and LIBERO-Long benchmarks (150+ manipulation tasks).
| Benchmark | Avg. Success ↑ | Inference Speedup ↑ | Notes |
|---|---|---|---|
| Meta-World (50 tasks) | 0.637 | 49× faster than DP3 | SOTA on short-horizon control |
| LIBERO-90 (90 tasks) | 0.889 | 3.4× faster than QueST | SOTA on multitask performance |
| LIBERO-Long (10 tasks) | 0.820 | 3× faster than QueST | SOTA on long-horizon tasks |
MGP-Long demonstrates strong robustness to missing visual inputs, even when the probability of a missing observation rises to 70%. MGP-Long increases the average success rate by 21% over the full-horizon model and by 32% over the short-horizon model.
Basketball
Pick Place Wall–Wall
Pick Place Wall–Target
Push
Push Wall
MGP-Long delivers the best performance on five dynamic environments: Basketball, Pick Place Wall–Wall, Pick Place Wall–Target, Push, and Push Wall.
Button Press Color Change Task.
Expert Demonstration 1
Expert Demonstration 2
Visualization of two expert demonstrations used for training. Videos are played at N× real-time speed.
Button Press On/Off Task.
Expert Demonstration 1
Expert Demonstration 2
Visualization of two expert demonstrations used for training. Videos are played at N× real-time speed.
Button Press Color Change Task.
MGP-Long
Short-horizon Methods
Qualitative results of MGP-Long (Left) and short-horizon methods (Right).
Button Press On/Off Task.
MGP-Long
Short-horizon Methods
Qualitative results of MGP-Long (Left) and short-horizon methods (Right).
MGP-Long is the only method that succeeds on both non-Markovian button tasks, achieving a 100% success rate.
@article{zhuang2025masked,
title={Masked Generative Policy for Robotic Control},
author={Zhuang, Lipeng and Fan, Shiyu and Audonnet, Florent P and Ru, Yingdong and Ho, Edmond SL and Camarasa, Gerardo Aragon and Henderson, Paul},
journal={arXiv preprint arXiv:2512.09101},
year={2025}
}