
Isaac for Manipulation

Learning Objectives:

  • Train a robot arm to grasp objects using RL
  • Implement curriculum learning for complex manipulation tasks
  • Deploy trained manipulation policies on physical hardware
  • Evaluate grasp success rates and generalization

Prerequisites: Chapter 1: Isaac Sim Fundamentals

Estimated Reading Time: 50 minutes


Manipulation as an RL Problem

Robot manipulation — picking up, placing, and rearranging objects — is one of the hardest problems in robotics. RL lets the robot learn manipulation skills from experience rather than hand-coding every motion.

The Grasping Task

A typical grasp task has:

  • Observations: joint positions, object pose, fingertip positions
  • Actions: joint velocity or position targets (7 DoF arm + gripper)
  • Reward: distance to object → contact → lift → hold
In Isaac Lab, this maps to an environment configuration (imports shown for context; module paths follow the Isaac Lab 2.x layout):

import isaaclab.sim as sim_utils
from isaaclab.assets import ArticulationCfg, RigidObjectCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.utils import configclass


@configclass
class GraspEnvCfg(ManagerBasedRLEnvCfg):
    """Configuration for object grasping task."""

    scene = InteractiveSceneCfg(num_envs=2048)

    robot = ArticulationCfg(
        prim_path="/World/Franka",
        spawn=sim_utils.UsdFileCfg(usd_path="franka_panda.usd"),
    )

    object = RigidObjectCfg(
        prim_path="/World/Object",
        spawn=sim_utils.UsdFileCfg(usd_path="cube_5cm.usd"),
    )
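In Isaac Lab, observation terms are assembled by the environment's observation manager, but it helps to see the vector laid out concretely. A minimal standalone sketch with NumPy, using illustrative dimensions for a 7-DoF Franka with a two-finger gripper (the exact layout in a real config depends on its observation terms):

```python
import numpy as np

def build_observation(joint_pos, object_pose, fingertip_pos):
    """Concatenate the observation terms listed above into one vector.

    joint_pos:     (9,)  7 arm joints + 2 gripper fingers
    object_pose:   (7,)  position (3) + quaternion (4)
    fingertip_pos: (6,)  two fingertips, xyz each
    """
    return np.concatenate([joint_pos, object_pose, fingertip_pos])

obs = build_observation(np.zeros(9), np.zeros(7), np.zeros(6))
assert obs.shape == (22,)  # this is obs_dim for the policy network
```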

Multi-Phase Reward Design

Grasping requires a phased reward to guide the robot through the task:

import torch


def compute_grasp_reward(env):
    """Multi-phase reward for grasping."""
    ee_pos = env.end_effector_pos
    obj_pos = env.object_pos

    # Phase 1: Approach - move the hand near the object
    approach_dist = torch.norm(ee_pos - obj_pos, dim=-1)
    approach_reward = 1.0 - torch.tanh(5.0 * approach_dist)

    # Phase 2: Grasp - reward gripper force, but only when near the object
    is_near = (approach_dist < 0.05).float()
    grasp_force = env.gripper_force.squeeze(-1)
    grasp_reward = is_near * torch.clamp(grasp_force, 0, 1)

    # Phase 3: Lift - reward raising the object once it is grasped
    is_grasped = (grasp_force > 0.1).float()
    obj_height = env.object_pos[:, 2]
    lift_reward = is_grasped * torch.clamp(obj_height - 0.1, 0, 0.3)

    # Phase 4: Hold - bonus while the grasped object stays above 0.2 m
    hold_bonus = is_grasped * (obj_height > 0.2).float() * 5.0

    return approach_reward + grasp_reward + lift_reward + hold_bonus
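To see how the approach term shapes behavior, evaluate 1 - tanh(5·d) at a few distances. The reward is close to 1 at the object and decays smoothly, so the gradient always points toward it (pure-Python check of the same formula):

```python
import math

def approach_reward(dist):
    # Same shaping as Phase 1 above
    return 1.0 - math.tanh(5.0 * dist)

for d in (0.0, 0.05, 0.20, 0.50):
    print(f"d = {d:.2f} m -> reward = {approach_reward(d):.3f}")
```

At 20 cm the reward has already dropped to about 0.24, and it is nearly zero beyond half a meter, which is why the coefficient 5.0 controls how "local" the shaping is.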

Curriculum Learning

Start with easy tasks and progressively increase difficulty:

class GraspCurriculum:
    """Curriculum for gradually increasing task difficulty."""

    def __init__(self):
        self.level = 0
        self.success_threshold = 0.7

    def get_task_params(self):
        if self.level == 0:
            # Easy: object always at same position, no rotation
            return {'pos_range': 0.02, 'rot_range': 0.0}
        elif self.level == 1:
            # Medium: varying position
            return {'pos_range': 0.10, 'rot_range': 0.0}
        elif self.level == 2:
            # Hard: varying position and rotation
            return {'pos_range': 0.15, 'rot_range': 3.14}
        else:
            # Expert: random objects, cluttered scene
            return {'pos_range': 0.20, 'rot_range': 3.14,
                    'num_distractors': 5}

    def update(self, success_rate):
        if success_rate > self.success_threshold:
            self.level = min(self.level + 1, 3)
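Driving the curriculum is then just a matter of feeding it periodic evaluation results. A standalone sketch with the curriculum logic reproduced in minimal form and made-up success rates (in training, these would come from evaluating the policy every few hundred iterations):

```python
class GraspCurriculum:
    """Minimal copy of the curriculum above: level advances past 0.7 success."""
    def __init__(self):
        self.level = 0
        self.success_threshold = 0.7

    def update(self, success_rate):
        if success_rate > self.success_threshold:
            self.level = min(self.level + 1, 3)

curriculum = GraspCurriculum()
# Pretend evaluation results from four successive training phases
for success_rate in (0.4, 0.75, 0.8, 0.9):
    curriculum.update(success_rate)
print(curriculum.level)  # -> 3: advanced once per phase above threshold
```

Note the level only ever increases; a production curriculum might also drop back a level if the success rate collapses after advancing.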

Training with PPO

Proximal Policy Optimization (PPO) is the standard algorithm for robot RL:

from isaaclab_rl.rsl_rl import PPOCfg

ppo_cfg = PPOCfg(
    num_learning_epochs=5,
    num_mini_batches=4,
    learning_rate=3e-4,
    gamma=0.99,
    lam=0.95,
    clip_param=0.2,
    value_loss_coef=1.0,
    entropy_coef=0.01,
)

Then launch training from the command line:

# Train grasping policy
python -m isaaclab.train \
    --task grasp \
    --num_envs 2048 \
    --max_iterations 5000 \
    --algo ppo
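The `gamma` and `lam` values above feed Generalized Advantage Estimation (GAE), which PPO uses to compute advantages from rollouts. A minimal NumPy sketch of the recursion, not the rsl_rl implementation, and assuming the rollout contains no episode terminations:

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """GAE over one rollout: A_t = delta_t + gamma*lam*A_{t+1}."""
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        # TD error at step t
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages

adv = compute_gae(np.array([1.0, 0.0, 1.0]), np.array([0.5, 0.5, 0.5]), 0.0)
```

With `lam=0.95` the estimator blends one-step TD errors over many future steps, trading a little bias for much lower variance than pure Monte Carlo returns.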

Deployment to Physical Hardware

After training in Isaac Sim:

# Export trained policy for deployment
import torch

policy = trained_agent.actor
policy.eval()

# Trace the model with a dummy input (obs_dim = size of the policy's
# observation vector)
example_obs = torch.zeros(1, obs_dim)
traced_policy = torch.jit.trace(policy, example_obs)
traced_policy.save("grasp_policy.pt")
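Before shipping the file, it is worth verifying that the traced graph reproduces the eager module's outputs on inputs other than the trace example. A sketch using a small stand-in MLP (in practice the policy comes from `trained_agent.actor`, and `obs_dim` matches the training environment):

```python
import torch
import torch.nn as nn

# Stand-in for the trained actor: a small MLP (illustrative sizes)
obs_dim = 24
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ELU(), nn.Linear(64, 8))
policy.eval()

traced = torch.jit.trace(policy, torch.zeros(1, obs_dim))

# The traced graph should match the eager module on fresh inputs
test_obs = torch.randn(4, obs_dim)
with torch.no_grad():
    assert torch.allclose(policy(test_obs), traced(test_obs))
```

This catches tracing pitfalls such as data-dependent control flow, which `torch.jit.trace` silently bakes in for the example input only.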

On the real robot:

# Load and run the policy
import torch

policy = torch.jit.load("grasp_policy.pt")

while True:
    obs = get_robot_observation()  # observation from ROS 2
    # Match the float32 dtype the policy was trained with
    obs_tensor = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        action = policy(obs_tensor).squeeze(0).numpy()
    send_joint_command(action)  # via ROS 2

Exercise: Train a Block Stacking Policy

  1. Create an Isaac Lab environment with a robot arm and 3 colored blocks
  2. Design a curriculum: pick 1 block → stack 2 → stack 3
  3. Train with PPO for 5,000 iterations
  4. Report grasp success rate and stack completion rate per curriculum level

Summary

  • Manipulation is naturally framed as an RL problem with phased rewards
  • Multi-phase rewards guide the robot through approach → grasp → lift → hold
  • Curriculum learning progressively increases task difficulty
  • Trained policies can be exported and deployed on physical robots via ROS 2

Next: Module 4: Vision-Language-Action — unify vision, language, and action.