BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds


1. Two-Stage Reinforcement Learning: A "Genetic Mutation" from Simulation to Reality


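  • Stage 1 (simulation pre-training): On virtual flat terrain, the robot learns basic gaits and balance with the PPO algorithm, while a LiDAR simulator builds a library of geometric features for complex terrain. This stage introduces curriculum learning, gradually increasing terrain complexity so that the policy does not get stuck in a local optimum[[7]][[9]].
  • Stage 2 (real-world transfer): The pre-trained model is deployed in the real environment and adjusts its policy dynamically using real-time LiDAR point-cloud data. With **domain randomization**, training efficiency improves by 300% and the cost of real-world trial and error drops by 70%[[9]].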
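# Example: BeamDojo reward function design (illustrative sketch)
def compute_reward(foot_contact, foot_position_error, body_tilt_angle, action_jerkiness):
    reward = 0.0
    if foot_contact:
        reward += 1.5 * (1 - abs(foot_position_error))  # precise-foothold reward (per the text, largest when the error is < 2 cm)
    reward -= 0.2 * abs(body_tilt_angle)  # posture-stability penalty (the text mentions a stronger penalty above 15° tilt, not modeled here)
    reward += 0.1 * (1 - action_jerkiness)  # motion-smoothness reward, discourages mechanical jitter[[4]]
    return reward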
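The curriculum learning mentioned for stage one can be illustrated with a minimal sketch. The function name, the level range, and the promotion and demotion thresholds below are all hypothetical choices for illustration; the source does not specify how BeamDojo schedules terrain difficulty.

```python
def update_terrain_level(level, success_rate, max_level=9,
                         promote_at=0.8, demote_at=0.4):
    """Move one difficulty level up or down based on the recent success rate.

    All thresholds are assumed values, not taken from the source.
    """
    if success_rate >= promote_at:
        return min(level + 1, max_level)  # harder terrain, capped at the top level
    if success_rate <= demote_at:
        return max(level - 1, 0)          # easier terrain, floored at level 0
    return level                          # otherwise stay at the current level
```

Moving by a single level at a time keeps the change in terrain distribution gradual, which is the point of a curriculum.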

2. Multimodal Perception: Building a "Terrain Brain" with LiDAR


3. Innovative Hardware Design: Polygonal Feet and Dynamic Balance

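  • Bionic foot structure: A hexagonal contact surface with carbon-fiber grip teeth embedded along the edges adapts to irregular support points and increases friction by 40%. Plantar pressure sensors (1 kHz sampling rate) report ground-contact state in real time, ensuring sub-millimeter positioning accuracy[[10]].
  • "Cerebrum and cerebellum" cooperative control
    • Cerebrum (large model): A Transformer-based decision model takes LiDAR point clouds and visual input and generates step-by-step instructions such as "cross the obstacle → adjust step frequency → keep the load balanced"[[3]].
    • Cerebellum (RL model): A lightweight SAC algorithm controls joint torques with a response latency below 50 ms, staying stable even in sudden crosswinds (wind speed ≤ 5 m/s)[[5]].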




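  • Hardware performance: On a beam only 20 cm wide, the G1 moves continuously at 0.8 m/s for 10 minutes with a foot-placement error below 1.5 cm, and even performs a 30-second single-leg stand[[9]].
  • Algorithm details: The RL policy adjusts the hip joint angle dynamically (±5° tolerance), while LiDAR monitors beam deformation in real time (micrometer-scale bending under load) and compensates the posture accordingly[[8]].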



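  • Environmental adaptation: On a steam pipe 30 cm wide, the robot works continuously for 2 hours while carrying 5 kg of inspection equipment and identifies 3 weld cracks (98.7% accuracy)[[8]].
  • Emergency response: When a sudden steam leak occurs, the G1 plans an obstacle-avoidance path within 0.2 seconds and quickly leaves the danger zone using a zigzag gait[[10]].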






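  • Technology integration: Combined with Figure AI's Helix large model, the system may eventually support end-to-end "voice command → motion generation" control. For example, when a user says "fetch the documents from the third floor," the robot plans the route automatically and adapts its gait to the stair width[[6]].
  • Market growth: Forecasts put the global humanoid robot market above USD 30 billion in 2025, with BeamDojo-style technology as a core driver. Unitree Robotics has begun technology-licensing talks with Tesla and Boston Dynamics[[8]].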




Traversing risky terrains with sparse footholds poses a significant challenge for humanoid robots, requiring precise foot placements and stable locomotion. Existing approaches designed for quadrupedal robots often fail to generalize to humanoid robots due to differences in foot geometry and unstable morphology, while learning-based approaches for humanoid locomotion still face great challenges on complex terrains due to sparse foothold reward signals and inefficient learning processes. To address these challenges, we introduce BeamDojo, a reinforcement learning (RL) framework designed for enabling agile humanoid locomotion on sparse footholds. BeamDojo begins by introducing a sampling-based foothold reward tailored for polygonal feet, along with a double critic to balance the learning process between dense locomotion rewards and sparse foothold rewards. To encourage sufficient trial-and-error exploration, BeamDojo incorporates a two-stage RL approach: the first stage relaxes the terrain dynamics by training the humanoid on flat terrain while providing it with task-terrain perceptive observations, and the second stage fine-tunes the policy on the actual task terrain. Moreover, we implement an onboard LiDAR-based elevation map to enable real-world deployment. Extensive simulation and real-world experiments demonstrate that BeamDojo achieves efficient learning in simulation and enables agile locomotion with precise foot placement on sparse footholds in the real world, maintaining a high success rate even under significant external disturbances.
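One way to read the "sampling-based foothold reward tailored for polygonal feet" is sketched below: sample several points across each sole and penalize every sampled point of a foot in contact that lies over a gap. This is a minimal sketch under stated assumptions, not the authors' implementation. The rectangular sole, the 3×3 sample grid, the depth threshold, and all function names (`sole_samples`, `foothold_reward`, `beam_height`) are hypothetical.

```python
def sole_samples(cx, cy, length, width, nx=3, ny=3):
    """Return an nx-by-ny grid of (x, y) points covering a rectangular sole (assumed shape)."""
    return [
        (cx + (i / (nx - 1) - 0.5) * length, cy + (j / (ny - 1) - 0.5) * width)
        for i in range(nx)
        for j in range(ny)
    ]


def foothold_reward(feet, terrain_height, depth_threshold=-0.1):
    """Negative count of sample points, over all feet in contact, that sit above a gap.

    feet: list of dicts with "in_contact" (bool) and "samples" (list of (x, y)).
    terrain_height: callable (x, y) -> height in meters.
    depth_threshold: heights below this value count as a gap (assumed value).
    """
    penalty = 0
    for foot in feet:
        if not foot["in_contact"]:
            continue  # a swinging foot is not penalized
        penalty += sum(
            1 for (x, y) in foot["samples"] if terrain_height(x, y) < depth_threshold
        )
    return float(-penalty)


def beam_height(x, y, half_width=0.10, gap_depth=-1.0):
    """Toy terrain: a beam along the x axis, 20 cm wide, with a deep gap on both sides."""
    return 0.0 if abs(y) <= half_width else gap_depth


centered = {"in_contact": True, "samples": sole_samples(0.0, 0.0, 0.2, 0.1)}
overhanging = {"in_contact": True, "samples": sole_samples(0.0, 0.08, 0.2, 0.1)}
swinging = {"in_contact": False, "samples": sole_samples(0.0, 0.5, 0.2, 0.1)}
```

Counting sampled points rather than checking only the foot center gives a graded signal: a sole that partly overhangs the beam is penalized less than one that misses it entirely.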


**(a) Training in Simulation.** BeamDojo incorporates a two-stage RL approach.

  • In stage 1, we let the humanoid robot traverse flat terrain, while simultaneously receiving the elevation map of the task terrain. This setup enables the robot to "imagine" walking on the true task terrain while actually traversing the safer flat terrain, where missteps do not lead to termination.
  • During stage 1, the observations and rewards are therefore decoupled: proprioceptive information and locomotion rewards are obtained from the flat terrain, while perceptive information and the foothold reward are obtained from the task terrain. The double-critic module learns the two reward groups separately.
  • In stage 2, the policy is fine-tuned on the task terrain, utilizing the full set of observations and rewards. The double-critic module is carried over as a deep copy.
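The double-critic idea in the steps above can be sketched as follows: each reward group gets its own advantage estimate, each estimate is normalized on its own, and the two are mixed with fixed weights before the policy update. This is a sketch under assumptions, not the authors' code. The weights, the per-group normalization, the dictionary representation of the critics, and the function names are hypothetical.

```python
import copy
import statistics


def normalize(values, eps=1e-8):
    """Zero-mean, unit-variance normalization of one group's advantages."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


def combine_advantages(adv_locomotion, adv_foothold,
                       w_locomotion=1.0, w_foothold=0.25):
    """Weighted sum of separately normalized advantages (weights are assumed values)."""
    norm_loc = normalize(adv_locomotion)
    norm_foot = normalize(adv_foothold)
    return [w_locomotion * a + w_foothold * b
            for a, b in zip(norm_loc, norm_foot)]


def init_stage2_critics(stage1_critics):
    """Start stage 2 from an independent deep copy of the stage-1 critics."""
    return copy.deepcopy(stage1_critics)


# Hypothetical critic parameters, stored as plain lists for illustration only.
stage1 = {"locomotion": [0.1, 0.2], "foothold": [0.3]}
stage2 = init_stage2_critics(stage1)
stage2["foothold"][0] = 9.9  # fine-tuning stage 2 leaves the stage-1 copy untouched
```

Normalizing each group before mixing keeps the dense locomotion signal from drowning out the sparse foothold signal simply because it is larger in magnitude or fires more often.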

**(b) Deployment.** The robot-centric elevation map, reconstructed using LiDAR data, is combined with proprioceptive information to serve as the input for the actor.
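A robot-centric elevation map can be approximated by rotating LiDAR points into the robot's heading frame and keeping the highest point per grid cell. The sketch below illustrates that idea only. The grid size, the resolution, the max-height rule, and the function name `build_elevation_map` are assumptions, and a real pipeline would also need filtering, state estimation, and handling of empty cells.

```python
import math


def build_elevation_map(points, robot_xy, robot_yaw,
                        half_extent=0.5, resolution=0.25):
    """Bin world-frame (x, y, z) points into a square grid centered on the robot.

    Returns an n-by-n list of lists; each cell holds the maximum z seen in
    that cell, or None if no point fell into it.
    """
    n = int(round(2 * half_extent / resolution))
    grid = [[None] * n for _ in range(n)]
    cos_y, sin_y = math.cos(robot_yaw), math.sin(robot_yaw)
    for x, y, z in points:
        dx, dy = x - robot_xy[0], y - robot_xy[1]
        # Rotate the world-frame offset into the robot frame (inverse yaw rotation).
        bx = cos_y * dx + sin_y * dy
        by = -sin_y * dx + cos_y * dy
        i = math.floor((bx + half_extent) / resolution)
        j = math.floor((by + half_extent) / resolution)
        if 0 <= i < n and 0 <= j < n:
            if grid[i][j] is None or z > grid[i][j]:
                grid[i][j] = z
    return grid
```

Flattening the grid and concatenating it with the proprioceptive vector would then give an actor input of the kind described above.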

Related Links

Many excellent works inspire the design of BeamDojo.

  • Inspired by MineDojo, the name "BeamDojo" combines the words "beam" (referring to sparse footholds like beams) and "dojo" (a place of training or learning), reflecting the goal of training agile locomotion on such challenging terrains.
  • The design of the two-stage framework is partially inspired by Robot Parkour Learning and Humanoid Parkour Learning.
  • The design of the double-critic module is inspired by RobotKeyframing.

