
Isaac for Manipulation

Learning Objectives:

  • Train a robot arm to grasp objects using RL
  • Implement curriculum learning for complex manipulation tasks
  • Deploy trained manipulation policies on physical hardware
  • Evaluate grasp success rates and generalization

Prerequisites: Chapter 1: Isaac Sim Fundamentals

Estimated Reading Time: 50 minutes


Manipulation as an RL Problem

Robot manipulation — picking up, placing, and rearranging objects — is one of the hardest problems in robotics. RL lets the robot learn manipulation skills from experience rather than hand-coding every motion.

The Grasping Task

A typical grasp task has:

  • Observations: joint positions, object pose, fingertip positions
  • Actions: joint velocity or position targets (7 DoF arm + gripper)
  • Reward: distance to object → contact → lift → hold
In Isaac Lab, this maps to an environment configuration (imports shown for context; module paths follow the Isaac Lab 2.x layout):

import isaaclab.sim as sim_utils
from isaaclab.assets import ArticulationCfg, RigidObjectCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.utils import configclass


@configclass
class GraspEnvCfg(ManagerBasedRLEnvCfg):
    """Configuration for object grasping task."""

    scene = InteractiveSceneCfg(num_envs=2048)

    robot = ArticulationCfg(
        prim_path="/World/Franka",
        spawn=sim_utils.UsdFileCfg(usd_path="franka_panda.usd"),
    )

    object = RigidObjectCfg(
        prim_path="/World/Object",
        spawn=sim_utils.UsdFileCfg(usd_path="cube_5cm.usd"),
    )
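In Isaac Lab, observation terms are assembled by the environment's observation manager, but it helps to see the vector laid out concretely. A minimal standalone sketch with NumPy, using illustrative dimensions for a 7-DoF Franka with a two-finger gripper (the exact layout in a real config depends on its observation terms):

```python
import numpy as np

def build_observation(joint_pos, object_pose, fingertip_pos):
    """Concatenate the observation terms listed above into one vector.

    joint_pos:     (9,)  7 arm joints + 2 gripper fingers
    object_pose:   (7,)  position (3) + quaternion (4)
    fingertip_pos: (6,)  two fingertips, xyz each
    """
    return np.concatenate([joint_pos, object_pose, fingertip_pos])

obs = build_observation(np.zeros(9), np.zeros(7), np.zeros(6))
assert obs.shape == (22,)  # this is obs_dim for the policy network
```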

Multi-Phase Reward Design

Grasping requires a phased reward to guide the robot through the task:

import torch


def compute_grasp_reward(env):
    """Multi-phase reward for grasping."""
    ee_pos = env.end_effector_pos
    obj_pos = env.object_pos

    # Phase 1: Approach - move the hand near the object
    approach_dist = torch.norm(ee_pos - obj_pos, dim=-1)
    approach_reward = 1.0 - torch.tanh(5.0 * approach_dist)

    # Phase 2: Grasp - reward gripper force, but only when near the object
    is_near = (approach_dist < 0.05).float()
    grasp_force = env.gripper_force.squeeze(-1)
    grasp_reward = is_near * torch.clamp(grasp_force, 0, 1)

    # Phase 3: Lift - reward raising the object once it is grasped
    is_grasped = (grasp_force > 0.1).float()
    obj_height = env.object_pos[:, 2]
    lift_reward = is_grasped * torch.clamp(obj_height - 0.1, 0, 0.3)

    # Phase 4: Hold - bonus while the grasped object stays above 0.2 m
    hold_bonus = is_grasped * (obj_height > 0.2).float() * 5.0

    return approach_reward + grasp_reward + lift_reward + hold_bonus
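To see how the approach term shapes behavior, evaluate 1 - tanh(5·d) at a few distances. The reward is close to 1 at the object and decays smoothly, so the gradient always points toward it (pure-Python check of the same formula):

```python
import math

def approach_reward(dist):
    # Same shaping as Phase 1 above
    return 1.0 - math.tanh(5.0 * dist)

for d in (0.0, 0.05, 0.20, 0.50):
    print(f"d = {d:.2f} m -> reward = {approach_reward(d):.3f}")
```

At 20 cm the reward has already dropped to about 0.24, and it is nearly zero beyond half a meter, which is why the coefficient 5.0 controls how "local" the shaping is.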

Curriculum Learning

Start with easy tasks and progressively increase difficulty:

class GraspCurriculum:
    """Curriculum for gradually increasing task difficulty."""

    def __init__(self):
        self.level = 0
        self.success_threshold = 0.7

    def get_task_params(self):
        if self.level == 0:
            # Easy: object always at same position, no rotation
            return {'pos_range': 0.02, 'rot_range': 0.0}
        elif self.level == 1:
            # Medium: varying position
            return {'pos_range': 0.10, 'rot_range': 0.0}
        elif self.level == 2:
            # Hard: varying position and rotation
            return {'pos_range': 0.15, 'rot_range': 3.14}
        else:
            # Expert: random objects, cluttered scene
            return {'pos_range': 0.20, 'rot_range': 3.14,
                    'num_distractors': 5}

    def update(self, success_rate):
        if success_rate > self.success_threshold:
            self.level = min(self.level + 1, 3)
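Driving the curriculum is then just a matter of feeding it periodic evaluation results. A standalone sketch with the curriculum logic reproduced in minimal form and made-up success rates (in training, these would come from evaluating the policy every few hundred iterations):

```python
class GraspCurriculum:
    """Minimal copy of the curriculum above: level advances past 0.7 success."""
    def __init__(self):
        self.level = 0
        self.success_threshold = 0.7

    def update(self, success_rate):
        if success_rate > self.success_threshold:
            self.level = min(self.level + 1, 3)

curriculum = GraspCurriculum()
# Pretend evaluation results from four successive training phases
for success_rate in (0.4, 0.75, 0.8, 0.9):
    curriculum.update(success_rate)
print(curriculum.level)  # -> 3: advanced once per phase above threshold
```

Note the level only ever increases; a production curriculum might also drop back a level if the success rate collapses after advancing.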

Training with PPO

Proximal Policy Optimization (PPO) is the standard algorithm for robot RL:

from isaaclab_rl.rsl_rl import PPOCfg

ppo_cfg = PPOCfg(
    num_learning_epochs=5,
    num_mini_batches=4,
    learning_rate=3e-4,
    gamma=0.99,
    lam=0.95,
    clip_param=0.2,
    value_loss_coef=1.0,
    entropy_coef=0.01,
)

Then launch training from the command line:

# Train grasping policy
python -m isaaclab.train \
    --task grasp \
    --num_envs 2048 \
    --max_iterations 5000 \
    --algo ppo
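The `gamma` and `lam` values above feed Generalized Advantage Estimation (GAE), which PPO uses to compute advantages from rollouts. A minimal NumPy sketch of the recursion, not the rsl_rl implementation, and assuming the rollout contains no episode terminations:

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """GAE over one rollout: A_t = delta_t + gamma*lam*A_{t+1}."""
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        # TD error at step t
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages

adv = compute_gae(np.array([1.0, 0.0, 1.0]), np.array([0.5, 0.5, 0.5]), 0.0)
```

With `lam=0.95` the estimator blends one-step TD errors over many future steps, trading a little bias for much lower variance than pure Monte Carlo returns.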

Deployment to Physical Hardware

After training in Isaac Sim:

# Export trained policy for deployment
import torch

policy = trained_agent.actor
policy.eval()

# Trace the model with a dummy input (obs_dim = size of the policy's
# observation vector)
example_obs = torch.zeros(1, obs_dim)
traced_policy = torch.jit.trace(policy, example_obs)
traced_policy.save("grasp_policy.pt")
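Before shipping the file, it is worth verifying that the traced graph reproduces the eager module's outputs on inputs other than the trace example. A sketch using a small stand-in MLP (in practice the policy comes from `trained_agent.actor`, and `obs_dim` matches the training environment):

```python
import torch
import torch.nn as nn

# Stand-in for the trained actor: a small MLP (illustrative sizes)
obs_dim = 24
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ELU(), nn.Linear(64, 8))
policy.eval()

traced = torch.jit.trace(policy, torch.zeros(1, obs_dim))

# The traced graph should match the eager module on fresh inputs
test_obs = torch.randn(4, obs_dim)
with torch.no_grad():
    assert torch.allclose(policy(test_obs), traced(test_obs))
```

This catches tracing pitfalls such as data-dependent control flow, which `torch.jit.trace` silently bakes in for the example input only.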

On the real robot:

# Load and run the policy
import torch

policy = torch.jit.load("grasp_policy.pt")

while True:
    obs = get_robot_observation()  # observation from ROS 2
    # Match the float32 dtype the policy was trained with
    obs_tensor = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        action = policy(obs_tensor).squeeze(0).numpy()
    send_joint_command(action)  # via ROS 2

Exercise: Train a Block Stacking Policy

  1. Create an Isaac Lab environment with a robot arm and 3 colored blocks
  2. Design a curriculum: pick 1 block → stack 2 → stack 3
  3. Train with PPO for 5,000 iterations
  4. Report grasp success rate and stack completion rate per curriculum level

Summary

  • Manipulation is naturally framed as an RL problem with phased rewards
  • Multi-phase rewards guide the robot through approach → grasp → lift → hold
  • Curriculum learning progressively increases task difficulty
  • Trained policies can be exported and deployed on physical robots via ROS 2

Next: Module 4: Vision-Language-Action — unify vision, language, and action.