Other products

ROBOTCORE Framework

FPGA and GPU hardware acceleration framework for ROS

ROBOTCORE® Framework helps build custom compute architectures for robots, or IP cores, directly from ROS workspaces without complex third-party tool integrations. Make your robots faster, more deterministic and/or more power-efficient. Simply put, it provides a vendor-agnostic development, build and deployment experience for creating robot hardware and hardware accelerators, similar to the standard, non-accelerated ROS development flow.

Get ROBOTCORE® Framework Read the paper
ROBOTCORE FRAMEWORK

How does it work?

Traditional software development in robotics is about building dataflows with ROS computational graphs. These dataflows go from sensors to compute technologies, all the way down to actuators and back, representing the "brain" of the robot. Generally, ROS computational graphs run on the CPUs of a given robot, but CPUs are fixed hardware, with pre-defined memory architectures and constraints that limit performance. Sparked by the decline of Moore's Law and Dennard scaling, specialized computing units capable of hardware acceleration have proven to be the answer for achieving higher performance in robotics.

ROBOTCORE® Framework makes it easy to leverage FPGA and GPU hardware acceleration in a ROS-centric manner and to build custom compute architectures for robots, or "IP cores". With these IP cores, roboticists can adapt one or more properties of their computational graphs at once (e.g., speed, determinism, power consumption), optimizing the hardware resources used and, as a consequence, the performance of the accelerated dataflow.

Benchmarks

Supported hardware solutions

Supporting the most popular hardware acceleration solutions and development kits for building hardware-accelerated robots with ROS.

ROBOTCORE® robotic processing unit
Intel's Agilex® 7 F-Series FPGA
AMD's KR260
AMD's KV260
AMD's KD240
AMD's K26
AMD's K24
AMD's ZCU102
AMD's ZCU104
NVIDIA's Jetson Nano
NVIDIA's Jetson Nano 2GB
NVIDIA's Jetson Xavier NX
NVIDIA's Jetson AGX Xavier
NVIDIA's Jetson AGX Orin
Microchip's PolarFire Icicle Kit
NVIDIA's Jetson TX1
AVNET's Ultra96-V2

One framework for all hardware acceleration vendors

ROBOTCORE® Framework provides a vendor-agnostic, ROS-centric development flow for hardware acceleration, reducing development time in robotics and making acceleration easier to integrate into existing applications.

Build acceleration kernels (IP cores) directly from your ROS packages

ROBOTCORE® Framework extends the ROS 2 build system (ament) with CMake logic to simplify the creation of acceleration kernels directly from your ROS packages' CMakeLists.txt files.

# CMakeLists.txt: declare an FPGA acceleration kernel alongside regular CPU targets
acceleration_kernel(
  NAME vadd          # name of the generated kernel
  FILE src/vadd.cpp  # kernel source file
  INCLUDE
    include          # include directories for the kernel
)

Avoid intra-process ROS 2 data conversions with type adaptation

ROBOTCORE® Framework leverages an extension to rclcpp (type adaptation) that converts between ROS message types and custom, user-defined types for Topics, Services, and Actions, avoiding unnecessary intra-process data conversions and maximizing performance.
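As a minimal sketch of how type adaptation works in upstream rclcpp (REP 2007), the example below adapts std::string to std_msgs::msg::String and publishes the custom type directly. This is plain ROS 2 rclcpp usage rather than a ROBOTCORE-specific API, shown only to illustrate the mechanism the framework builds on.

#include <string>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

// Type adaptation (REP 2007): teach rclcpp how to map std::string <-> std_msgs::msg::String.
template<>
struct rclcpp::TypeAdapter<std::string, std_msgs::msg::String>
{
  using is_specialized = std::true_type;
  using custom_type = std::string;
  using ros_message_type = std_msgs::msg::String;

  static void convert_to_ros_message(const custom_type & source, ros_message_type & destination)
  {
    destination.data = source;
  }

  static void convert_from_ros_message(const ros_message_type & source, custom_type & destination)
  {
    destination = source.data;
  }
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("adapted_publisher");

  // Publish the custom type directly; conversion to the ROS message only happens
  // when a matching subscription actually needs the serialized form.
  using AdaptedType = rclcpp::TypeAdapter<std::string, std_msgs::msg::String>;
  auto pub = node->create_publisher<AdaptedType>("chatter", 10);
  pub->publish(std::string("hello world"));

  rclcpp::shutdown();
  return 0;
}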

Dynamically negotiate the message types with ROS 2 type negotiation

Allows ROS 2 Nodes to dynamically negotiate the message types used by publishers and subscriptions, as well as to adaptively modify the behavior of those publishers and subscriptions.
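Type negotiation is available upstream through the open-source ROS 2 negotiated package. The sketch below is illustrative only and assumes an interface along the lines of that package; the class and method names (NegotiatedPublisher, add_supported_type, start) and the StringType descriptor are assumptions to be checked against the package documentation.

#include <memory>
#include <string>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"
#include "negotiated/negotiated_publisher.hpp"  // assumed header from the 'negotiated' package

// Illustrative supported-type descriptor, loosely following the conventions of
// the ROS 2 'negotiated' package (names here are assumptions, not a confirmed API).
struct StringType
{
  using MsgT = std_msgs::msg::String;
  static const inline std::string supported_type_name = "std_string";
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("negotiated_talker");

  // Advertise the message types this publisher can produce, each with a weight;
  // the concrete type is then negotiated with matching subscriptions at runtime.
  auto pub = std::make_shared<negotiated::NegotiatedPublisher>(*node, "chatter");
  pub->add_supported_type<StringType>(1.0, rclcpp::QoS(10));
  pub->start();

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}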

An open architecture for hardware acceleration in robotics

ROBOTCORE® Framework deals with the vendor-proprietary libraries for hardware acceleration in robotics. It helps accelerate computations, increase performance and abstract away the complexity of bringing your ROS computational graphs to your favourite silicon architecture, all while preserving the familiar ROS development flow.

Need a customization? ROS community contributions

ROS 2 cores ready to use

ROS 2 API-compatible pre-built cores and extensions

ROBOTCORE Perception is a ROS 2 API-compatible, optimized perception stack that leverages hardware acceleration to speed up your perception computations.

ROBOTCORE Perception

ROBOTCORE Transform is an optimized robotics transform library. API-compatible with the ROS 2 transform library (tf2), it efficiently manages the transformations between coordinate systems in a robot (see the tf2 usage sketch below).

ROBOTCORE Transform
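Since ROBOTCORE Transform is described as API-compatible with tf2, existing lookup code such as the standard tf2_ros snippet below is expected to keep working unchanged. This is plain upstream tf2 usage (the frame names are just examples), not a ROBOTCORE-specific API.

#include <chrono>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "geometry_msgs/msg/transform_stamped.hpp"
#include "tf2/exceptions.h"
#include "tf2_ros/buffer.h"
#include "tf2_ros/transform_listener.h"

// Standard tf2 usage: listen to the tf tree and query a transform between two frames.
int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("tf2_lookup_example");

  tf2_ros::Buffer buffer(node->get_clock());
  tf2_ros::TransformListener listener(buffer);  // spins its own thread by default

  rclcpp::sleep_for(std::chrono::seconds(1));   // give the listener time to fill the buffer

  try {
    // Latest transform from "base_link" to "camera_link" (example frame names).
    geometry_msgs::msg::TransformStamped tf =
      buffer.lookupTransform("base_link", "camera_link", tf2::TimePointZero);
    RCLCPP_INFO(node->get_logger(), "x translation: %f", tf.transform.translation.x);
  } catch (const tf2::TransformException & ex) {
    RCLCPP_WARN(node->get_logger(), "Could not get transform: %s", ex.what());
  }

  rclcpp::shutdown();
  return 0;
}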

ROS and ROS 2 support

ROBOTCORE® Framework extends the ROS and ROS 2 build systems to allow roboticists to generate acceleration kernels in the same way they generate CPU libraries. Support for legacy ROS systems and extensions to other middlewares are also possible.

Accelerate ROS Accelerate ROS 2

Developer-ready documentation and support

ROBOTCORE® Framework is built by seasoned ROS developers for ROS development. It includes documentation, examples, reference designs and various optional levels of support.

Ask about support levels

Benchmarks


ROS 2 (core)

ROBOTCORE® Framework intra-FPGA ROS 2 communication queue speedup
(measured with the perception_2nodes ROS 2 package, between subsequent Nodes, on an AMD KV260)

1.5x

Performance-per-watt (Hz/W)
(measured during iterations 10-100 using faster_doublevadd_publisher, on an AMD KV260)

Performance-per-watt improvement (Hz/W)

3.69x

ROS 2 PERCEPTION NODES

(requires ROBOTCORE® Perception)

Resize - speedup

2.61x

Resize - kernel runtime latency (ms) (ROBOTCORE Perception running on an AMD KV260, NVIDIA Isaac ROS running on a Jetson Nano 2GB. Measurements report the kernel runtime in milliseconds (ms) and exclude ROS 2 message-passing infrastructure overhead and host-device (GPU or FPGA) data transfer overhead)

Rectify - speedup

7.34x

Rectify - kernel runtime latency (ms) (same setup and measurement methodology as the Resize benchmark above)

Harris - speedup

30.27x

Harris - kernel runtime latency (ms) (same setup and measurement methodology as the Resize benchmark above)

Histogram of Oriented Gradients - speedup

509.52x

Histogram of Oriented Gradients - kernel runtime latency (ms) (same setup and measurement methodology as the Resize benchmark above)

ROS 2 PERCEPTION GRAPHS

(requires ROBOTCORE® Perception)

2-Node pre-processing perception graph latency (ms)
(Simple graph with 2 Nodes (Rectify-Resize) demonstrating perception pre-processing with the image_pipeline ROS 2 package. AMD's KV260 and NVIDIA's Jetson Nano 2GB boards are used for benchmarking, the former featuring a quad-core Arm Cortex-A53 and the latter a quad-core Arm Cortex-A57. Source code used for the benchmark is available in the perception_2nodes ROS 2 package)

2-Node pre-processing perception graph performance-per-watt (Hz/W)

Graph speedup - 2-Node pre-processing perception graph latency

3.6x

Performance-per-watt improvement - 2-Node pre-processing perception graph

6x

3-Node pre-processing and region of interest detector perception graph latency (ms)
(3-Node graph (Rectify-Resize-Harris) demonstrating perception pre-processing and region-of-interest detection with the image_pipeline ROS 2 package. AMD's KV260, featuring a quad-core Arm Cortex-A53, is used for benchmarking. Source code used for the benchmark is available in the perception_3nodes ROS 2 package)

Graph speedup - 3-Node pre-processing and region of interest detector perception graph

4.5x

ROS 2 TRANSFORM (tf2)

(requires ROBOTCORE® Transform)

tf tree subscription latency (us), 2 subscribers
(Measured the worst-case subscription latency in a graph with 2 tf tree subscribers, using AMD's KV260 board, NVIDIA's Jetson Nano 2GB and Microchip's PolarFire Icicle Kit. AMD's KV260 board has been used for benchmarking the CPU default tf2 baseline.)

tf tree subscription latency (us), 20-100 subscribers
(Measured the worst-case subscription latency in a graph with multiple tf tree subscribers. AMD's KV260 board has been used for benchmarking all results.)

ROS 2 CLOUD NODES

(requires ROBOTCORE® Cloud)

ORB-SLAM2 Simultaneous Localization and Mapping (SLAM) Node runtime (s)
(Measured the mean per-frame runtime obtained from the ORB-SLAM2 Node in two scenarios: 1) default ROS 2 running on the edge, on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz, 2 cores enabled and a 10 Mbps network connection; and 2) ROBOTCORE® Cloud running in the cloud on a provisioned 36-core cloud computer.)

Node speedup - ORB-SLAM2 SLAM Node runtime

4x

CLOUD (OTHER)

(requires ROBOTCORE® Cloud)

Grasp Planning with Dex-Net compute runtime (s)
(Measured the mean compute runtime over 10 trials while using a Dex-Net Grasp Quality Convolutional Neural Network to compute grasps from raw RGBD image observations. Two scenarios are considered: 1) Edge - running on the edge on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz, 2 cores enabled and a 10 Mbps network connection; and 2) ROBOTCORE® Cloud - the same edge machine collects raw image observations and sends them to a cloud computer equipped with an NVIDIA Tesla T4 GPU.)

Grasp Planning speedup - Dex-Net computation total runtime (including network)

11.7x

Motion Planning Templates (MPT) compute runtime (s)
(Measured the mean compute runtime while running multi-core motion planners from the Motion Planning Templates (MPT) on reference planning problems from the Open Motion Planning Library (OMPL). Two scenarios are considered: 1) Edge - running on the edge on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz, 2 cores enabled and a 10 Mbps network connection; and 2) ROBOTCORE® Cloud - the same edge machine offloads computations to a 96-core cloud computer.)

Motion planning speedup - Motion Planning Templates (MPT) compute runtime (including network)

28.9x

Do you have any questions?

Get in touch with our team.

Let's talk Case studies