Global Planning for Contact-Rich Manipulation via Local Smoothing of Quasi-dynamic Contact Models
Tao Pang*, H.J. Terry Suh*, Lujie Yang, Russ Tedrake
MIT CSAIL, Robot Locomotion Group
Published in T-RO 2023
ICRA 2024 Presentation



The Software Bottleneck in Robotics


[Figure: two strategies for the same task, A: do not make contact, B: make contact]
Today's robots are not fully leveraging their hardware capabilities

Larger objects = Larger Robots?
Contact-Rich Manipulation Enables the Same Hardware to Do More Things
Case Study 1. Whole-body Manipulation

What is Contact-Rich Manipulation?
Case Study 2. Dexterous Hands
OpenAI
What our paper is about
1. Why is RL succeeding where model-based methods struggle?
- RL regularizes landscapes using stochasticity
- Stochasticity allows Monte-Carlo abstraction of contact modes
- Stochasticity enables global optimization
2. Can we do better with this understanding?
- Interior-point smoothing of contact dynamics
- Efficient gradient computation using sensitivity analysis
- RRT for fast online global planning
Toy Problem
Simplified Problem
Given an initial state and a goal,
which action minimizes the distance to the goal?
Consider simple gradient descent on this objective.
[Dynamics of the system: one expression for the no-contact mode, another for the contact mode]
The gradient is zero if there is no contact!
Local gradient-based methods get stuck due to flat / non-smooth landscapes.
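Below is a minimal sketch of this failure mode on a hypothetical 1-D pusher/box toy (the positions, goal, and notation are illustrative assumptions, not the paper's exact setup): the gradient of the cost with respect to the action is exactly zero whenever the commanded motion makes no contact.

```python
# Hypothetical 1-D toy (illustrative assumption): a box sits at position b and a
# pusher is commanded to position u. If the command does not reach the box, the
# box stays put; otherwise the box is pushed to the commanded position.
def box_next_position(b, u):
    return b if u <= b else u

def cost(b, u, b_goal):
    return (box_next_position(b, u) - b_goal) ** 2

def grad_u(b, u, b_goal, eps=1e-6):
    # Finite-difference gradient of the cost with respect to the action u.
    return (cost(b, u + eps, b_goal) - cost(b, u - eps, b_goal)) / (2 * eps)

b, b_goal = 0.0, 1.0
print(grad_u(b, u=-0.5, b_goal=b_goal))  # no contact: gradient is exactly 0
print(grad_u(b, u=+0.5, b_goal=b_goal))  # contact: informative, nonzero gradient
```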
Previous Approaches to Tackling the Problem
Mixed Integer Programming / Mode Enumeration / Active Set Approaches [MDGBT 2017] [HR 2016] [CHHM 2022] [AP 2022]
[Figure: piecewise cost landscape over the no-contact and contact modes]
Why don't we search more globally over each contact mode?
In the no-contact mode, run gradient descent.
In the contact mode, run gradient descent.
Problems with Mode Enumeration
The number of modes scales terribly with system complexity: each potential active contact can be in one of three modes (no contact, sticking contact, sliding contact), so the number of modes grows exponentially with the number of potential active contacts.
[Table: system vs. number of modes]
How does RL power through these problems?
Reinforcement Learning fundamentally considers a stochastic objective
Previous formulations vs. Reinforcement Learning
[Figure: cost landscapes over the no-contact and contact regions, exact vs. noise-averaged]
Randomized smoothing regularizes landscapes: the averaged cost is no longer flat in the no-contact region.
But it leads to high variance and low sample-efficiency.
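As a sketch of this point, here is a Monte-Carlo estimate of the randomized-smoothed cost and its gradient on the same hypothetical 1-D toy as above; the noise level, sample count, and score-function estimator are illustrative choices, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same hypothetical 1-D pusher/box toy as above: the cost is flat whenever the
# commanded position u does not reach the box.
def cost(u, b=0.0, b_goal=1.0):
    b_next = b if u <= b else u
    return (b_next - b_goal) ** 2

# Randomized smoothing replaces cost(u) with E_w[cost(u + w)], w ~ N(0, sigma^2),
# and estimates the gradient by Monte-Carlo (a score-function estimator).
def smoothed_cost_and_grad(u, sigma=0.2, n_samples=2000):
    w = rng.normal(0.0, sigma, size=n_samples)
    c = np.array([cost(u + wi) for wi in w])
    return c.mean(), (c * w).mean() / sigma**2

# Even with no contact at the nominal action, the smoothed gradient is nonzero,
# but the Monte-Carlo estimate is noisy (high variance, low sample-efficiency).
print(smoothed_cost_and_grad(u=-0.2))
```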

Dynamic Smoothing
What if we smoothed the dynamics instead of the overall cost?
[Figure: non-smooth contact dynamics vs. smooth surrogate dynamics, averaged over the no-contact and contact modes]
Effects of Dynamic Smoothing
[Figure: averaged cost landscapes under reinforcement learning vs. dynamic smoothing, over the no-contact and contact regions]
We still get the benefit of averaging multiple modes, which leads to better landscapes.
Importantly, we know the structure of these dynamics!
We can often acquire smoothed dynamics & gradients without Monte-Carlo.
Structured Smoothing: An Example
Example: Box vs. wall
Implicit time-stepping simulation: the actual next position is the solution of an optimization problem that pulls it towards the commanded next position while forbidding penetration into the wall.
[Figure: commanded vs. actual next position in the no-contact and contact cases]
Log-Barrier Relaxation: replacing the hard non-penetration constraint with a log-barrier yields smooth surrogate dynamics.
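A minimal sketch of this box-vs-wall example, assuming a 1-D point pulled quadratically towards the commanded position u with a wall at q_wall; the numbers and closed-form root are illustrative, not the paper's formulation.

```python
import numpy as np

# Exact implicit step: argmin_q 0.5*(q - u)^2  s.t.  q <= q_wall  (a clamp).
def exact_next(u, q_wall=1.0):
    return min(u, q_wall)

# Log-barrier relaxation: argmin_q 0.5*(q - u)^2 - (1/kappa)*log(q_wall - q).
# Stationarity gives a quadratic in q; take the root with q < q_wall.
def barrier_next(u, q_wall=1.0, kappa=100.0):
    b = -(u + q_wall)
    c = u * q_wall - 1.0 / kappa
    return (-b - np.sqrt(b * b - 4.0 * c)) / 2.0

for u in [0.5, 0.99, 1.5]:
    print(u, exact_next(u), barrier_next(u))
# The relaxed step smoothly "anticipates" contact before the command reaches the
# wall, instead of switching non-smoothly between the two modes.
```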
Differentiating with Sensitivity Analysis
How do we obtain gradients from an optimization problem?
Differentiate through the optimality conditions!
Write the stationarity condition of the relaxed problem, differentiate it with respect to u, and apply the Implicit Function Theorem to recover the gradient of the smoothed dynamics.
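Continuing the box-vs-wall sketch above (same illustrative assumptions), the gradient of the barrier-smoothed step follows directly from the stationarity condition and the implicit function theorem:

```python
import numpy as np

def barrier_next(u, q_wall=1.0, kappa=100.0):
    b = -(u + q_wall)
    c = u * q_wall - 1.0 / kappa
    return (-b - np.sqrt(b * b - 4.0 * c)) / 2.0

# Stationarity: g(q, u) = (q - u) + 1/(kappa*(q_wall - q)) = 0.
# Implicit function theorem: dq/du = -(dg/dq)^(-1) * (dg/du).
def dq_du(u, q_wall=1.0, kappa=100.0):
    q = barrier_next(u, q_wall, kappa)
    dg_dq = 1.0 + 1.0 / (kappa * (q_wall - q) ** 2)
    dg_du = -1.0
    return -dg_du / dg_dq

for u in [0.5, 0.99, 1.5]:
    print(u, dq_du(u))   # transitions smoothly from ~1 (free) to ~0 (blocked)
```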
Quasi-dynamic Simulator
Quasi-dynamic equations of motion:
- Object dynamics
- Impedance-controlled actuator dynamics
- Non-penetration constraints
- Friction cone constraints
- Conic complementarity
Together, these are the KKT optimality conditions of a second-order cone program (SOCP).
Smoothing: apply an interior-point relaxation to the original SOCP problem.
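To make the idea concrete, here is a heavily simplified, frictionless 1-D caricature of such a quasi-dynamic step with an interior-point (log-barrier) relaxation; the stiffness, mass, time step, and solver choice are illustrative assumptions, and this is not the paper's SOCP formulation.

```python
import numpy as np
from scipy.optimize import minimize

K, m, h = 100.0, 1.0, 0.1      # actuator stiffness, box "mass", time step
q_a, q_u = 0.0, 0.5            # current pusher (actuated) and box (unactuated) positions

def quasidynamic_step(u, kappa=1e3):
    # One relaxed quasi-dynamic step: the pusher is pulled towards the command u
    # by an impedance term, the box carries a small quasi-dynamic mass term, and
    # a log-barrier replaces the hard non-penetration constraint q_u_next >= q_a_next.
    def obj(x):
        qa_n, qu_n = x
        gap = qu_n - qa_n
        if gap <= 0.0:
            return np.inf
        return (0.5 * K * (qa_n - u) ** 2
                + 0.5 * (m / h**2) * (qu_n - q_u) ** 2
                - np.log(gap) / kappa)
    return minimize(obj, x0=[q_a, q_u], method="Nelder-Mead").x

print(quasidynamic_step(u=0.3))   # no contact: box barely moves
print(quasidynamic_step(u=0.9))   # contact: pusher lags its command, box is pushed
```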

Barrier vs. Randomized Smoothing
Example: Box vs. wall
Barrier & randomized smoothing are equivalent:
there exists a randomized-smoothing distribution that results in barrier smoothing.
Gradient-based Optimization with Dynamics Smoothing
Scales extremely well to highly contact-rich systems.
Efficient solutions in ~10 seconds.
[Videos: single-horizon and multi-horizon optimization]
Fundamental Limitations of Local Search
How do we push in this direction?
How do we rotate further in the presence of joint limits?
Highly non-local movements are required to solve these problems.

Rapidly-exploring Random Tree (RRT) Algorithm
[10] Figure adapted from Tao Pang's thesis defense, MIT, 2023
(1) Sample a subgoal
(2) Find the nearest node
(3) Grow towards the subgoal (see the sketch below)
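A bare-bones sketch of these three steps with a plain Euclidean metric; the state space, step size, and sampling region are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rrt(q_start, sample_fn, distance_fn, step=0.1, n_iters=500):
    nodes = [np.asarray(q_start, dtype=float)]
    parents = [-1]
    for _ in range(n_iters):
        q_sub = sample_fn()                                       # (1) sample subgoal
        i_near = min(range(len(nodes)),
                     key=lambda i: distance_fn(nodes[i], q_sub))  # (2) find nearest node
        direction = q_sub - nodes[i_near]
        q_new = nodes[i_near] + step * direction / (np.linalg.norm(direction) + 1e-9)
        nodes.append(q_new)                                       # (3) grow towards it
        parents.append(i_near)
    return nodes, parents

nodes, parents = rrt(q_start=[0.0, 0.0],
                     sample_fn=lambda: rng.uniform(-1.0, 1.0, size=2),
                     distance_fn=lambda a, b: float(np.linalg.norm(a - b)))
print(len(nodes), "nodes grown")
```

For dynamical systems, the Euclidean distance_fn above is exactly the piece that breaks down, which motivates the metric discussed next.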
RRT for Dynamics

Works well for Euclidean spaces. Why is it hard to use for dynamical systems?

What is "Nearest" in a dynamical system?


Rajamani et al., "Vehicle Dynamics", Springer Mechanical Engineering Series, 2011
Suh et al., "A Fast PRM Planner for Car-like Vehicles", self-hosted, 2018.
Closest in Euclidean space might not be closest for dynamics.
A Dynamically Consistent Distance Metric
What is the right distance metric?
Fix some nominal values for the current state and input.
How far is a node from the sampled subgoal?
Define distance as the least amount of "effort" (input) needed to reach the goal.
We can derive a closed-form solution under linearization of the dynamics around zero input (no movement):
a Mahalanobis distance induced by the Jacobian of the dynamics.
Locally, the dynamics are linear in the input, and the metric weighs directions by the singular values of the Jacobian:
- Large singular values: less required input
- Zero singular values: infinite required input (in practice, this requires regularization)
The contact problem strikes again: what if there is no contact?
According to this metric, the distance is infinite if no contact is made!
Again, dynamic smoothing comes to the rescue: with smoothed dynamics the Jacobian is not exactly zero, so the metric stays finite.
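A sketch of this distance, assuming the linearized smoothed dynamics take the form q_next ≈ q_bar + B u; the symbol B, the regularization value, and the example Jacobians are illustrative assumptions.

```python
import numpy as np

def dynamic_distance(q_bar, q_goal, B, reg=1e-4):
    # Mahalanobis distance induced by the Jacobian B of the (smoothed, linearized)
    # dynamics: dq^T (B B^T + reg*I)^(-1) dq. Without regularization this is the
    # squared norm of the least-effort input reaching q_goal; the reg*I term keeps
    # the metric finite along directions that B cannot actuate.
    dq = np.asarray(q_goal) - np.asarray(q_bar)
    Sigma = B @ B.T + reg * np.eye(len(dq))
    return float(dq @ np.linalg.solve(Sigma, dq))

q_bar, q_goal = np.zeros(2), np.array([0.1, 0.1])
B_contact = np.array([[1.0], [0.8]])   # input also moves the object: small distance
B_free    = np.array([[1.0], [0.0]])   # input cannot move the object coordinate
print(dynamic_distance(q_bar, q_goal, B_contact))
print(dynamic_distance(q_bar, q_goal, B_free))   # much larger: near-unreachable direction
```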
Now we can apply RRT to contact-rich systems!
However, this still requires lots of random extensions!
With some chance, place the actuated object in a different configuration
(Regrasping / Contact-Sampling), as sketched below.
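A sketch of that contact-sampling idea; the regrasp probability, node representation, and helper callables are hypothetical, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
P_REGRASP = 0.2   # hypothetical probability of regrasping instead of extending

def extend(node, subgoal, grow_fn, sample_grasp_fn):
    # With probability P_REGRASP, re-place the actuated object (the robot) in a
    # new configuration around the unactuated object; otherwise perform an
    # ordinary dynamic extension towards the sampled subgoal.
    if rng.uniform() < P_REGRASP:
        return sample_grasp_fn(node)
    return grow_fn(node, subgoal)

# Toy usage with stand-in callables:
node = {"robot": np.zeros(2), "object": np.zeros(2)}
grow = lambda n, g: {**n, "object": n["object"] + 0.1 * (g - n["object"])}
regrasp = lambda n: {**n, "robot": rng.uniform(-1.0, 1.0, size=2)}
print(extend(node, subgoal=np.ones(2), grow_fn=grow, sample_grasp_fn=regrasp))
```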
Contact-Rich RRT with Dynamic Smoothing

Our method can find solutions for contact-rich systems in a few iterations (~1 minute)!

What our paper is about
1. Why is RL succeeding where model-based methods struggle?
- RL regularizes landscapes using stochasticity
- Stochasticity allows Monte-Carlo abstraction of contact modes
- Stochasticity enables global optimization
2. Can we do better with this understanding?
- Interior-point smoothing of contact dynamics
- Efficient gradient computation using sensitivity analysis
- RRT for fast online global planning
Much to Learn & Improve on from RL's success
Teaser
Meet us at Posters!




Tao Pang*
H.J. Terry Suh*
Lujie Yang
Russ Tedrake
ThBT 27.07

Paper
Code

Poster


ICRA Presentation
By Terry Suh