Isaac for Manipulation
Learning Objectives:
- Train a robot arm to grasp objects using RL
- Implement curriculum learning for complex manipulation tasks
- Deploy trained manipulation policies on physical hardware
- Evaluate grasp success rates and generalization
Prerequisites: Chapter 1: Isaac Sim Fundamentals
Estimated Reading Time: 50 minutes
Manipulation as an RL Problem
Robot manipulation — picking up, placing, and rearranging objects — is one of the hardest problems in robotics. Reinforcement learning (RL) lets a robot learn manipulation skills from experience rather than having every motion hand-coded.
The Grasping Task
A typical grasp task has:
- Observations: joint positions, object pose, fingertip positions
- Actions: joint velocity or position targets (7 DoF arm + gripper)
- Reward: distance to object → contact → lift → hold
class GraspEnvCfg(ManagerBasedRLEnvCfg):
    """Configuration for object grasping task."""
    scene = InteractiveSceneCfg(num_envs=2048)
    robot = ArticulationCfg(
        prim_path="/World/Franka",
        spawn=sim_utils.UsdFileCfg(usd_path="franka_panda.usd"),
    )
    object = RigidObjectCfg(
        prim_path="/World/Object",
        spawn=sim_utils.UsdFileCfg(usd_path="cube_5cm.usd"),
    )
Multi-Phase Reward Design
Grasping requires a phased reward to guide the robot through the task:
def compute_grasp_reward(env):
    """Multi-phase reward for grasping."""
    ee_pos = env.end_effector_pos
    obj_pos = env.object_pos

    # Phase 1: Approach — move hand near the object
    approach_dist = torch.norm(ee_pos - obj_pos, dim=-1)
    approach_reward = 1.0 - torch.tanh(5.0 * approach_dist)

    # Phase 2: Grasp — close fingers when near
    is_near = (approach_dist < 0.05).float()
    grasp_force = env.gripper_force.squeeze(-1)
    grasp_reward = is_near * torch.clamp(grasp_force, 0, 1)

    # Phase 3: Lift — raise the object
    is_grasped = (grasp_force > 0.1).float()
    obj_height = env.object_pos[:, 2]
    lift_reward = is_grasped * torch.clamp(obj_height - 0.1, 0, 0.3)

    # Phase 4: Hold — bonus while the object is held above 0.2 m
    hold_bonus = is_grasped * (obj_height > 0.2).float() * 5.0

    return approach_reward + grasp_reward + lift_reward + hold_bonus
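It is worth sanity-checking the shaping terms on synthetic values before training. The sketch below reimplements the four phases as a standalone function, replacing the `env` fields with plain tensors:

```python
import torch

def phase_rewards(ee_pos, obj_pos, grasp_force, obj_height):
    """Return the four reward phases separately for inspection."""
    approach_dist = torch.norm(ee_pos - obj_pos, dim=-1)
    approach = 1.0 - torch.tanh(5.0 * approach_dist)
    is_near = (approach_dist < 0.05).float()
    grasp = is_near * torch.clamp(grasp_force, 0, 1)
    is_grasped = (grasp_force > 0.1).float()
    lift = is_grasped * torch.clamp(obj_height - 0.1, 0, 0.3)
    hold = is_grasped * (obj_height > 0.2).float() * 5.0
    return approach, grasp, lift, hold

# Far from the object: only a small approach reward, no grasp/lift/hold.
a, g, l, h = phase_rewards(torch.tensor([[0.5, 0.0, 0.3]]),
                           torch.tensor([[0.0, 0.0, 0.0]]),
                           torch.tensor([0.0]), torch.tensor([0.0]))

# Grasping (force 0.8) with the object at 0.25 m: lift and hold both fire.
a2, g2, l2, h2 = phase_rewards(torch.tensor([[0.0, 0.0, 0.25]]),
                               torch.tensor([[0.0, 0.0, 0.24]]),
                               torch.tensor([0.8]), torch.tensor([0.25]))
```

Checks like these catch sign errors and dead phases (a term that never becomes nonzero) cheaply, before thousands of GPU-hours are spent on a broken reward.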
Curriculum Learning
Start with easy tasks and progressively increase difficulty:
class GraspCurriculum:
    """Curriculum for gradually increasing task difficulty."""

    def __init__(self):
        self.level = 0
        self.success_threshold = 0.7

    def get_task_params(self):
        if self.level == 0:
            # Easy: object always at same position, no rotation
            return {'pos_range': 0.02, 'rot_range': 0.0}
        elif self.level == 1:
            # Medium: varying position
            return {'pos_range': 0.10, 'rot_range': 0.0}
        elif self.level == 2:
            # Hard: varying position and rotation
            return {'pos_range': 0.15, 'rot_range': 3.14}
        else:
            # Expert: random objects, cluttered scene
            return {'pos_range': 0.20, 'rot_range': 3.14,
                    'num_distractors': 5}

    def update(self, success_rate):
        if success_rate > self.success_threshold:
            self.level = min(self.level + 1, 3)
Training with PPO
Proximal Policy Optimization (PPO) is the standard algorithm for robot RL:
from isaaclab_rl.rsl_rl import RslRlPpoAlgorithmCfg

ppo_cfg = RslRlPpoAlgorithmCfg(
    num_learning_epochs=5,
    num_mini_batches=4,
    learning_rate=3e-4,
    gamma=0.99,
    lam=0.95,
    clip_param=0.2,
    value_loss_coef=1.0,
    entropy_coef=0.01,
)
# Train grasping policy
python -m isaaclab.train \
--task grasp \
--num_envs 2048 \
--max_iterations 5000 \
--algo ppo
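The heart of PPO is the clipped surrogate objective controlled by `clip_param` above: the probability ratio between the new and old policy is clipped so a single update cannot move the policy too far. A minimal sketch of just that loss term (RSL-RL's full loss also includes the value and entropy terms set in the config):

```python
import torch

def ppo_surrogate_loss(log_prob_new, log_prob_old, advantages, clip_param=0.2):
    """Clipped surrogate loss: bounds the per-update policy change."""
    ratio = torch.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_param, 1 + clip_param) * advantages
    # Pessimistic bound: take the smaller of the two objectives, negate for descent.
    return -torch.min(unclipped, clipped).mean()

# Second sample's ratio exp(0.5) ~ 1.65 is clipped to 1.2 before averaging.
loss = ppo_surrogate_loss(torch.tensor([0.0, 0.5]),
                          torch.tensor([0.0, 0.0]),
                          torch.tensor([1.0, 1.0]))
print(loss.item())  # -1.1 (mean of 1.0 and the clipped 1.2, negated)
```

The clipping is why PPO tolerates the large, parallel batches that 2,048 simulated environments produce: even with stale data, no single mini-batch step can collapse the policy.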
Deployment to Physical Hardware
After training in Isaac Sim:
# Export trained policy for deployment
import torch
policy = trained_agent.actor
policy.eval()
# Trace the model for deployment
example_obs = torch.zeros(1, obs_dim)
traced_policy = torch.jit.trace(policy, example_obs)
traced_policy.save("grasp_policy.pt")
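Before shipping the `.pt` file, it is worth verifying that the traced module reproduces the eager model's outputs on fresh inputs, since `torch.jit.trace` silently bakes in whatever control-flow path the example input took. A sketch with a stand-in MLP actor (the layer sizes and `obs_dim`/`act_dim` values are illustrative, not from the trained agent):

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 22, 8  # illustrative sizes
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ELU(), nn.Linear(64, act_dim))
policy.eval()

# Trace with a zero example input, as in the export snippet above.
example_obs = torch.zeros(1, obs_dim)
traced = torch.jit.trace(policy, example_obs)

# Traced and eager outputs should agree on inputs the trace never saw.
x = torch.randn(1, obs_dim)
with torch.no_grad():
    assert torch.allclose(policy(x), traced(x), atol=1e-5)
```

For actors with data-dependent branching, `torch.jit.script` is the safer export path than tracing.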
On the real robot:
# Load and run the policy
import torch

policy = torch.jit.load("grasp_policy.pt")

while True:
    obs = get_robot_observation()  # from ROS 2
    # Cast explicitly: ROS messages often arrive as float64, but the
    # exported policy expects float32 inputs.
    obs_tensor = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        action = policy(obs_tensor).squeeze(0).numpy()
    send_joint_command(action)  # via ROS 2
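On real hardware the raw network output should also be bounded before it reaches the controller, since an out-of-distribution observation can produce an arbitrarily large action. A hedged sketch of scaling and clipping to a symmetric joint-velocity limit (the limit and scale values are placeholders, not Franka specifications):

```python
import numpy as np

def safe_joint_command(action, vel_limit=1.0, action_scale=0.5):
    """Scale the policy output, then clip to a symmetric velocity limit."""
    cmd = np.asarray(action, dtype=np.float32) * action_scale
    return np.clip(cmd, -vel_limit, vel_limit)

cmd = safe_joint_command([4.0, -0.2, 1.0])  # -> [1.0, -0.1, 0.5]
```

The same `action_scale` must be applied during training (most Isaac Lab action terms have a scale parameter), otherwise the deployed policy sees a different action mapping than the one it learned.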
Exercise: Train a Block Stacking Policy
- Create an Isaac Lab environment with a robot arm and 3 colored blocks
- Design a curriculum: pick 1 block → stack 2 → stack 3
- Train with PPO for 5,000 iterations
- Report grasp success rate and stack completion rate per curriculum level
Summary
- Manipulation is naturally framed as an RL problem with phased rewards
- Multi-phase rewards guide the robot through approach → grasp → lift → hold
- Curriculum learning progressively increases task difficulty
- Trained policies can be exported and deployed on physical robots via ROS 2
Next: Module 4: Vision-Language-Action — unify vision, language, and action.