ROBOTCORE® Framework helps build custom compute architectures for robots, or IP cores, directly from ROS workspaces, without complex third-party tool integrations, making your robots faster, more deterministic, and/or more power-efficient. Simply put, it provides a vendor-agnostic development, build, and deployment experience for creating robot hardware and hardware accelerators, similar to the standard, non-accelerated ROS development flow.
Traditional software development in robotics is about building dataflows with ROS computational graphs. These dataflows go from sensors to compute technologies, all the way down to actuators and back, representing the "brain" of the robot. Generally, ROS computational graphs run on a robot's CPUs. But CPUs have fixed hardware, with pre-defined memory architectures and constraints that limit performance. Sparked by the decline of Moore's Law and Dennard scaling, specialized computing units capable of hardware acceleration have proven to be the answer for achieving higher performance in robotics.
ROBOTCORE® Framework lets you easily leverage FPGA and GPU hardware acceleration in a ROS-centric manner and build custom compute architectures for robots, or "IP cores". With these IP cores, roboticists can adapt one or more properties of their computational graphs simultaneously (e.g., speed, determinism, power consumption), optimizing the hardware resources used and, as a consequence, the performance of the accelerated dataflow.
acceleration_kernel(
  NAME vadd
  FILE src/vadd.cpp
  INCLUDE include
)
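The FILE argument above points at the kernel source. As a minimal sketch of what such a src/vadd.cpp might contain, assuming the kernel is a simple vector add as the NAME suggests, the function below uses the extern "C" linkage that FPGA HLS toolchains typically expect; the exact pragmas and interfaces depend on the target toolchain and are omitted here:

```cpp
#include <cstddef>

// Illustrative vector-add kernel sketch (not the shipped vadd.cpp).
// On the CPU the same code compiles and runs unmodified; in an FPGA
// build, toolchain-specific HLS pragmas would annotate the arguments.
extern "C" void vadd(const int* in1, const int* in2, int* out,
                     std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        out[i] = in1[i] + in2[i];  // element-wise addition
    }
}
```

Building the package then produces the acceleration kernel alongside the usual CPU artifacts, with no change to the familiar workflow.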
ROBOTCORE® Framework deals with the vendor-proprietary libraries required for hardware acceleration in robotics. It helps accelerate computations, increase performance, and abstract away the complexity of bringing your ROS computational graphs to your favourite silicon architecture, all while delivering the familiar ROS development flow.
ROBOTCORE Perception is a ROS 2 API-compatible, optimized perception stack that leverages hardware acceleration to speed up your perception computations.
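As an illustration of the kind of computation such a perception stack accelerates (see the Resize benchmark below), the sketch here implements a naive nearest-neighbor image resize on the CPU; it is illustrative only and not the ROBOTCORE Perception API:

```cpp
#include <cstddef>
#include <vector>

// Naive nearest-neighbor resize of a single-channel image stored
// row-major in `src` (src_w x src_h) into a dst_w x dst_h output.
// Illustrative baseline only; accelerated stacks offload this loop
// nest to the FPGA or GPU.
std::vector<unsigned char> resize_nn(const std::vector<unsigned char>& src,
                                     std::size_t src_w, std::size_t src_h,
                                     std::size_t dst_w, std::size_t dst_h) {
    std::vector<unsigned char> dst(dst_w * dst_h);
    for (std::size_t y = 0; y < dst_h; ++y) {
        std::size_t sy = y * src_h / dst_h;  // nearest source row
        for (std::size_t x = 0; x < dst_w; ++x) {
            std::size_t sx = x * src_w / dst_w;  // nearest source column
            dst[y * dst_w + x] = src[sy * src_w + sx];
        }
    }
    return dst;
}
```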
ROBOTCORE Transform is an optimized robotics transform library. API-compatible with the ROS 2 transform library (tf2), it efficiently manages the transformations between coordinate systems in a robot.
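As a sketch of the bookkeeping a transform library performs, the snippet below composes two 2D rigid-body transforms (rotation plus translation), the same chaining a tf tree performs along a path of coordinate frames. This is illustrative math only, not the ROBOTCORE Transform or tf2 API:

```cpp
#include <cmath>

// A 2D rigid-body transform: rotation by theta, then translation (x, y).
struct Transform2D {
    double theta;  // rotation angle in radians
    double x, y;   // translation
};

// Compose two transforms (a ∘ b): apply b first, then a, as when
// chaining frame-to-frame transforms through a tf tree.
Transform2D compose(const Transform2D& a, const Transform2D& b) {
    Transform2D r;
    r.theta = a.theta + b.theta;
    r.x = a.x + std::cos(a.theta) * b.x - std::sin(a.theta) * b.y;
    r.y = a.y + std::sin(a.theta) * b.x + std::cos(a.theta) * b.y;
    return r;
}
```

In a real robot, such compositions happen for every lookup on the tf tree, which is why optimizing them pays off as the number of frames and subscribers grows.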
ROBOTCORE® Framework extends the ROS and ROS 2 build systems to allow roboticists to generate acceleration kernels in the same way they generate CPU libraries. Support for legacy ROS systems and extensions to other middlewares are also possible.
ROBOTCORE® Framework is built by seasoned ROS developers for ROS development. It includes documentation, examples, reference designs, and the option of various levels of support.
ROBOTCORE® Framework intra-FPGA ROS 2 communication queue speedup: 1.5x (measured with the perception_2nodes ROS 2 package, between subsequent Nodes, on an AMD KV260).
Performance-per-watt improvement (Hz/W): 3.69x (measured during iterations 10-100 using faster_doublevadd_publisher, on an AMD KV260).
Kernel runtime latency (ms) was measured for the Resize, Rectify, Harris, and Histogram of Oriented Gradients kernels (ROBOTCORE Perception running on an AMD KV260, NVIDIA Isaac ROS running on a Jetson Nano 2GB; measurements present the kernel runtime in milliseconds (ms) and discard ROS 2 message-passing infrastructure overhead and host-device (GPU or FPGA) data transfer overhead). Speedups:
- Rectify: 7.34x
- Harris: 30.27x
- Histogram of Oriented Gradients: 509.52x
(requires ROBOTCORE® Perception)
2-Node pre-processing perception graph latency (ms): simple graph with 2 Nodes (Rectify-Resize) demonstrating perception pre-processing with the image_pipeline ROS 2 package. AMD's KV260 and NVIDIA's Jetson Nano 2GB boards are used for benchmarking, the former featuring a quad-core Arm Cortex-A53 and the latter a quad-core Arm Cortex-A57. Source code used for the benchmark is available in the perception_2nodes ROS 2 package.
Results for the 2-Node pre-processing perception graph:
- Graph speedup (latency): 3.6x
- Performance-per-watt improvement (Hz/W): 6x
3-Node pre-processing and region-of-interest detector perception graph latency (ms): 3-Node graph (Rectify-Resize-Harris) demonstrating perception pre-processing and region-of-interest detection with the image_pipeline ROS 2 package. AMD's KV260, featuring a quad-core Arm Cortex-A53, is used for benchmarking. Source code used for the benchmark is available in the perception_3nodes ROS 2 package.
- Graph speedup: 4.5x
(requires ROBOTCORE® Transform)
tf tree subscription latency (µs), 2 subscribers: measured the worst-case subscription latency in a graph with 2 tf tree subscribers, using AMD's KV260 board, NVIDIA's Jetson Nano 2GB, and Microchip's PolarFire Icicle Kit. AMD's KV260 board has been used for benchmarking the CPU default tf2 baseline.
tf tree subscription latency (µs), 20-100 subscribers: measured the worst-case subscription latency in a graph with multiple tf tree subscribers. AMD's KV260 board has been used for benchmarking all results.
(requires ROBOTCORE® Cloud)
ORB-SLAM2 Simultaneous Localization and Mapping (SLAM) Node runtime (s): measured the mean per-frame runtime obtained from the ORB-SLAM2 Node while running in two scenarios: 1) default ROS 2 running on the edge on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz with 2 cores enabled and a 10 Mbps network connection, and 2) ROBOTCORE® Cloud running in the cloud with a 36-core cloud computer provisioned.
- Node speedup (ORB-SLAM2 SLAM Node runtime): 4x
(requires ROBOTCORE® Cloud)
Grasp Planning with Dex-Net compute runtime (s): measured the mean compute runtime over 10 trials while using a Dex-Net Grasp Quality Convolutional Neural Network to compute grasps from raw RGBD image observations. Two scenarios are considered: 1) Edge - running on the edge on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz with 2 cores enabled and a 10 Mbps network connection, and 2) ROBOTCORE® Cloud - the same edge machine collects raw image observations and sends them to a cloud computer equipped with an NVIDIA Tesla T4 GPU.
- Grasp Planning speedup (Dex-Net computation total runtime, including network): 11.7x
Motion Planning Templates (MPT) compute runtime (s): measured the mean compute runtime while running multi-core motion planners from the Motion Planning Templates (MPT) on reference planning problems from the Open Motion Planning Library (OMPL). Two scenarios are considered: 1) Edge - running on the edge on an Intel NUC with an Intel® Pentium® Silver J5005 CPU @ 1.50 GHz with 2 cores enabled and a 10 Mbps network connection, and 2) ROBOTCORE® Cloud - the same edge offloads computations to a 96-core cloud computer.
- Motion planning speedup (MPT compute runtime, including network): 28.9x