Abstract

Long-horizon manipulation tasks are a significant challenge in robotics: they demand both strategic, high-level reasoning and fast, precise, low-level control. Recent generative models have shown promise in producing behavior plans for long-horizon tasks, but they often lack a principled framework for hierarchical decomposition, and their iterative denoising process makes real-time execution computationally demanding. In this work, we introduce Hierarchical Diffusion-Flow (HDFlow), a novel hierarchical planning framework that leverages the complementary strengths of diffusion and rectified flow models. HDFlow employs a high-level diffusion planner to generate sequences of strategic subgoals in a learned latent space, capitalizing on diffusion's powerful exploratory capabilities. These subgoals then guide a low-level rectified flow planner that generates smooth, dense trajectories, exploiting the speed and efficiency of ordinary differential equation (ODE)-based trajectory generation. This hybrid approach overcomes the limitations of single-paradigm generative planners and enables robust, efficient long-horizon planning. We evaluate HDFlow on four challenging furniture assembly tasks in both simulation and the real world, where it significantly outperforms state-of-the-art methods.

Key Insight

The iterative denoising process of diffusion models is computationally expensive, making them ill-suited for the fast, low-level control required for real-time robotic interaction. Applying diffusion models naively at all levels of a hierarchy inherits this critical drawback, creating a bottleneck at the trajectory generation stage. This raises a fundamental question: Is a single generative modeling paradigm optimal for all levels of a planning hierarchy?


We empirically show that the answer is no. The requirements for high-level strategic planning are fundamentally different from those of low-level trajectory generation. High-level planning demands exploration and multi-modal diversity to discover viable sequences of subgoals. In contrast, low-level planning demands speed, precision, and deterministic execution to translate a chosen subgoal into a smooth, dense trajectory.
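For reference, the speed argument can be made concrete with the standard rectified flow formulation. This is the textbook form, not necessarily the exact objective used in HDFlow; the symbols $x_0$ (noise sample), $x_1$ (data sample, here a trajectory), $c$ (conditioning such as subgoals), and $v_\theta$ (learned velocity field) are notation introduced here for illustration.

```latex
% Straight-line interpolation between a noise sample and a data sample
x_t = (1 - t)\,x_0 + t\,x_1, \qquad t \in [0, 1]

% Training objective: regress the constant velocity of that straight path
\mathcal{L}(\theta) = \mathbb{E}_{x_0,\,x_1,\,t}
  \bigl\| v_\theta(x_t, t, c) - (x_1 - x_0) \bigr\|^2

% Sampling: integrate the learned ODE from t = 0 to t = 1
\frac{\mathrm{d}x_t}{\mathrm{d}t} = v_\theta(x_t, t, c)
```

Because the regression targets are straight paths, the learned ODE can often be integrated with only a few solver steps, whereas diffusion sampling typically needs many denoising iterations.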

HDFlow

HDFlow is a hierarchical planning framework that pairs a diffusion model with a rectified flow model. The framework consists of two main stages: World Model Learning (left), where observations are encoded into a structured latent space, and Hierarchical Planner Training (right). The second stage trains a high-level diffusion planner that generates sparse strategic subgoals with energy-based model (EBM) guidance, and a low-level rectified flow planner that synthesizes dense trajectories between subgoals using an ODE solver.

[Figure: HDFlow pipeline overview]
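The low-level stage can be illustrated with a minimal sketch, assuming a rectified-flow sampler that transports Gaussian noise to a dense trajectory segment with a few Euler steps. Everything named here is hypothetical and not taken from the HDFlow code: `toy_velocity` is a hand-written stand-in for a learned velocity field, the subgoals are plain 2-D vectors rather than learned latent states, and `horizon` and `num_steps` are illustrative parameters.

```python
import random
from typing import Callable, List

Trajectory = List[List[float]]  # H waypoints, each a d-dimensional state


def toy_velocity(x: Trajectory, t: float,
                 start: List[float], goal: List[float]) -> Trajectory:
    """Hypothetical stand-in for a learned field v(x, t | start, goal).

    It points each waypoint toward a straight start-to-goal path, scaled so
    the sample arrives at t = 1. A trained network would replace this.
    """
    horizon = len(x)
    remaining = max(1.0 - t, 1e-6)  # guard against division by zero
    velocity = []
    for i, waypoint in enumerate(x):
        alpha = i / (horizon - 1)
        target = [(1 - alpha) * s + alpha * g for s, g in zip(start, goal)]
        velocity.append([(ti - wi) / remaining
                         for wi, ti in zip(waypoint, target)])
    return velocity


def sample_segment(start: List[float], goal: List[float],
                   velocity: Callable[..., Trajectory],
                   horizon: int = 8, num_steps: int = 4,
                   seed: int = 0) -> Trajectory:
    """Euler-integrate dx/dt = v from noise (t = 0) to a trajectory (t = 1)."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in start] for _ in range(horizon)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity(x, k * dt, start, goal)
        x = [[xi + dt * vi for xi, vi in zip(w, vw)] for w, vw in zip(x, v)]
    return x


def plan_dense(subgoals: Trajectory, velocity: Callable[..., Trajectory],
               horizon: int = 8, num_steps: int = 4) -> Trajectory:
    """Chain one ODE solve per consecutive pair of subgoals."""
    dense: Trajectory = []
    for start, goal in zip(subgoals, subgoals[1:]):
        segment = sample_segment(start, goal, velocity, horizon, num_steps)
        # Skip the first waypoint of later segments; it repeats the subgoal.
        dense.extend(segment if not dense else segment[1:])
    return dense


if __name__ == "__main__":
    # Sparse subgoals as a high-level planner might propose them (toy values).
    subgoals = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
    for waypoint in plan_dense(subgoals, toy_velocity, horizon=5):
        print([round(value, 3) for value in waypoint])
```

The number of function evaluations per segment equals `num_steps`, which is the quantity a few-step ODE sampler keeps small compared with an iterative denoising loop.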

Simulation Results

One Leg Low

One Leg Med

One Leg High

Lamp Low

Lamp Med

Lamp High

Round Low

Round Med

Round High

Cabinet Low

Cabinet Med

Real-world Results

One Leg Low

One Leg Med

Lamp Low

Lamp Med

Round Low

Round Med