ROBOTCORE® Framework helps build custom compute architectures for robots, or IP cores, directly from ROS workspaces, without complex third-party tool integrations, making your robots faster, more deterministic, and/or more power-efficient. Simply put, it provides a vendor-agnostic development, build, and deployment experience for creating robot hardware and hardware accelerators, similar to the standard, non-accelerated ROS development flow.
Traditional software development in robotics is about building dataflows with ROS computational graphs. These dataflows go from sensors to compute technologies, all the way down to actuators and back, representing the "brain" of the robot. Generally, ROS computational graphs run on a robot's CPUs. But CPUs have fixed hardware, with pre-defined memory architectures and constraints that limit performance. Sparked by the decline of Moore's Law and Dennard scaling, specialized computing units capable of hardware acceleration have proven to be the answer for achieving higher performance in robotics.
ROBOTCORE® Framework lets you easily leverage FPGA and GPU hardware acceleration in a ROS-centric manner and build custom compute architectures for robots, or "IP cores". With these IP cores, roboticists can adapt one or more properties of their computational graphs simultaneously (e.g., speed, determinism, power consumption), optimizing the hardware resources used and, as a consequence, the performance of the accelerated dataflow.
acceleration_kernel(
  NAME vadd
  FILE src/vadd.cpp
  INCLUDE include
)
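The FILE argument above points at the kernel source. As a minimal sketch of what such a src/vadd.cpp might contain, assuming the kernel is a simple vector add as the NAME suggests, the function below uses the extern "C" linkage that FPGA HLS toolchains typically expect; the exact pragmas and interfaces depend on the target toolchain and are omitted here:

```cpp
#include <cstddef>

// Illustrative vector-add kernel sketch (not the shipped vadd.cpp).
// On the CPU the same code compiles and runs unmodified; in an FPGA
// build, toolchain-specific HLS pragmas would annotate the arguments.
extern "C" void vadd(const int* in1, const int* in2, int* out,
                     std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        out[i] = in1[i] + in2[i];  // element-wise addition
    }
}
```

Building the package then produces the acceleration kernel alongside the usual CPU artifacts, with no change to the familiar workflow.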
ROBOTCORE® Framework deals with the vendor-proprietary libraries required for hardware acceleration in robotics. It helps accelerate computations, increase performance, and abstract away the complexity of bringing your ROS computational graphs to your favourite silicon architecture, all while delivering the familiar ROS development flow.
ROBOTCORE Perception is a ROS 2 API-compatible, optimized perception stack that leverages hardware acceleration to speed up your perception computations.
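As an illustration of the kind of computation such a perception stack accelerates (see the Resize benchmark below), the sketch here implements a naive nearest-neighbor image resize on the CPU; it is illustrative only and not the ROBOTCORE Perception API:

```cpp
#include <cstddef>
#include <vector>

// Naive nearest-neighbor resize of a single-channel image stored
// row-major in `src` (src_w x src_h) into a dst_w x dst_h output.
// Illustrative baseline only; accelerated stacks offload this loop
// nest to the FPGA or GPU.
std::vector<unsigned char> resize_nn(const std::vector<unsigned char>& src,
                                     std::size_t src_w, std::size_t src_h,
                                     std::size_t dst_w, std::size_t dst_h) {
    std::vector<unsigned char> dst(dst_w * dst_h);
    for (std::size_t y = 0; y < dst_h; ++y) {
        std::size_t sy = y * src_h / dst_h;  // nearest source row
        for (std::size_t x = 0; x < dst_w; ++x) {
            std::size_t sx = x * src_w / dst_w;  // nearest source column
            dst[y * dst_w + x] = src[sy * src_w + sx];
        }
    }
    return dst;
}
```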
ROBOTCORE Transform is an optimized robotics transform library. API-compatible with the ROS 2 transform library (tf2), it efficiently manages the transformations between coordinate systems in a robot.
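As a sketch of the bookkeeping a transform library performs, the snippet below composes two 2D rigid-body transforms (rotation plus translation), the same chaining a tf tree performs along a path of coordinate frames. This is illustrative math only, not the ROBOTCORE Transform or tf2 API:

```cpp
#include <cmath>

// A 2D rigid-body transform: rotation by theta, then translation (x, y).
struct Transform2D {
    double theta;  // rotation angle in radians
    double x, y;   // translation
};

// Compose two transforms (a ∘ b): apply b first, then a, as when
// chaining frame-to-frame transforms through a tf tree.
Transform2D compose(const Transform2D& a, const Transform2D& b) {
    Transform2D r;
    r.theta = a.theta + b.theta;
    r.x = a.x + std::cos(a.theta) * b.x - std::sin(a.theta) * b.y;
    r.y = a.y + std::sin(a.theta) * b.x + std::cos(a.theta) * b.y;
    return r;
}
```

In a real robot, such compositions happen for every lookup on the tf tree, which is why optimizing them pays off as the number of frames and subscribers grows.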
ROBOTCORE® Framework extends the ROS and ROS 2 build systems to allow roboticists to generate acceleration kernels in the same way they generate CPU libraries. Support for legacy ROS systems and extensions to other middlewares are also possible.
ROBOTCORE® Framework is built by seasoned ROS developers for ROS development. It includes documentation, examples, reference designs, and the option of various levels of support.
ROBOTCORE® Framework intra-FPGA ROS 2 communication queue speedup: 1.5x (measured with the perception_2nodes ROS 2 package, between subsequent Nodes, on an AMD KV260).
Performance-per-watt improvement (Hz/W): 3.69x (measured during iterations 10-100 using faster_doublevadd_publisher, on an AMD KV260).
Kernel runtime latency (ms) was measured for the Resize, Rectify, Harris, and Histogram of Oriented Gradients kernels (ROBOTCORE Perception running on an AMD KV260, NVIDIA Isaac ROS running on a Jetson Nano 2GB; measurements present the kernel runtime in milliseconds (ms) and discard ROS 2 message-passing infrastructure overhead and host-device (GPU or FPGA) data transfer overhead). Speedups:
- Rectify: 7.34x
- Harris: 30.27x
- Histogram of Oriented Gradients: 509.52x
(requires ROBOTCORE® Perception)
2-Node pre-processing perception graph latency (ms): simple graph with 2 Nodes (Rectify-Resize) demonstrating perception pre-processing with the image_pipeline ROS 2 package. AMD's KV260 and NVIDIA's Jetson Nano 2GB boards are used for benchmarking, the former featuring a quad-core Arm Cortex-A53 and the latter a quad-core Arm Cortex-A57. Source code used for the benchmark is available in the perception_2nodes ROS 2 package.
Results for the 2-Node pre-processing perception graph:
- Graph speedup (latency): 3.6x
- Performance-per-watt improvement (Hz/W): 6x
3-Node pre-processing and region-of-interest detector perception graph latency (ms): 3-Node graph (Rectify-Resize-Harris) demonstrating perception pre-processing and region-of-interest detection with the image_pipeline ROS 2 package. AMD's KV260, featuring a quad-core Arm Cortex-A53, is used for benchmarking. Source code used for the benchmark is available in the perception_3nodes ROS 2 package.
- Graph speedup: 4.5x
(requires ROBOTCORE® Transform)
tf tree subscription latency (µs), 2 subscribers: measured the worst-case subscription latency in a graph with 2 tf tree subscribers, using AMD's KV260 board, NVIDIA's Jetson Nano 2GB, and Microchip's PolarFire Icicle Kit. AMD's KV260 board has been used for benchmarking the CPU default tf2 baseline.
tf tree subscription latency (µs), 20-100 subscribers: measured the worst-case subscription latency in a graph with multiple tf tree subscribers. AMD's KV260 board has been used for benchmarking all results.
(requires ROBOTCORE® Cloud)
ORB-SLAM2 Simultaneous Localization and Mapping (SLAM) Node runtime (s): measured the mean per-frame runtime obtained from the ORB-SLAM2 Node while running in two scenarios: 1) default ROS 2 running on the edge on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz with 2 cores enabled and a 10 Mbps network connection, and 2) ROBOTCORE® Cloud running in the cloud with a 36-core cloud computer provisioned.
- Node speedup (ORB-SLAM2 SLAM Node runtime): 4x
(requires ROBOTCORE® Cloud)
Grasp Planning with Dex-Net compute runtime (s): measured the mean compute runtime over 10 trials while using a Dex-Net Grasp Quality Convolutional Neural Network to compute grasps from raw RGBD image observations. Two scenarios are considered: 1) Edge - running on the edge on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz with 2 cores enabled and a 10 Mbps network connection, and 2) ROBOTCORE® Cloud - the same edge machine collects raw image observations and sends them to a cloud computer equipped with an NVIDIA Tesla T4 GPU.
- Grasp Planning speedup (Dex-Net computation total runtime, including network): 11.7x
Motion Planning Templates (MPT) compute runtime (s): measured the mean compute runtime while running multi-core motion planners from the Motion Planning Templates (MPT) on reference planning problems from the Open Motion Planning Library (OMPL). Two scenarios are considered: 1) Edge - running on the edge on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz with 2 cores enabled and a 10 Mbps network connection, and 2) ROBOTCORE® Cloud - the same edge offloads computations to a 96-core cloud computer.
- Motion planning speedup (MPT compute runtime, including network): 28.9x