Concepts

Action Server

Just as in ROS, action servers are a common way to control long running tasks like navigation. This stack makes more extensive use of actions, and in some cases, without an easy topic interface. It is more important to understand action servers as a developer in ROS 2. Some simple CLI examples can be found in the ROS 2 documentation.

Action servers are similar to a canonical service server. A client will request some task to be completed, except, this task may take a long time. An example would be moving the shovel up from a bulldozer or ask a robot to travel 10 meters to the right.

In this situation, action servers and clients allow us to call a long-running task in another process or thread and return a future to its result. It is permissible at this point to block until the action is complete, however, you may want to occasionally check if the action is complete and continue to process work in the client thread. Since it is long-running, action servers will also provide feedback to their clients. This feedback can be anything and is defined in the ROS .action along with the request and result types. In the bulldozer example, a request may be an angle, a feedback may be the angle remaining to be moved, and the result is a success or fail boolean with the end angle. In the navigation example, a request may be a position, a feedback may be the time its been navigating for and the distance to the goal, and the result a boolean for success.

Feedback and results can be gathered synchronously by registering callbacks with the action client. They may also be gathered by asynchronously requesting information from the shared future objects. Both require spinning the client node to process callback groups.

Action servers are used in this stack to communicate with the highest level Behavior Tree (BT) navigator through a NavigateToPose action message. They are also used for the BT navigator to communicate with the subsequent smaller action servers to compute plans, control efforts, and recoveries. Each will have their own unique .action type in nav2_msgs for interacting with the servers.

Lifecycle Nodes and Bond

Lifecycle (or Managed, more correctly) nodes are unique to ROS 2. More information can be found here. They are nodes that contain state machine transitions for bringup and teardown of ROS 2 servers. This helps in deterministic behavior of ROS systems in startup and shutdown. It also helps users structure their programs in reasonable ways for commercial uses and debugging.

When a node is started, it is in the unconfigured state, only processing the node’s constructor which should not contain any ROS networking setup or parameter reading. By the launch system, or the supplied lifecycle manager, the nodes need to be transitioned to inactive by configuring. After, it is possible to activate the node by transitioning through the activating stage.

This state will allow the node to process information and be fully setup to run. The configuration stage, triggering the on_configure() method, will setup all parameters, ROS networking interfaces, and for safety systems, all dynamically allocated memory. The activation stage, triggering the on_activate() method, will active the ROS networking interfaces and set any states in the program to start processing information.

To shutdown, we transition into deactivating, cleaning up, shutting down and end in the finalized state. The networking interfaces are deactivated and stop processing, deallocate memory, exit cleanly, in those stages, respectively.

The lifecycle node framework is used extensively through out this project and all servers utilize it. It is best convention for all ROS systems to use lifecycle nodes if it is possible.

Within Nav2, we use a wrapper of LifecycleNodes, nav2_util LifecycleNode. This wrapper wraps much of the complexities of LifecycleNodes for typical applications. It also includes a bond connection for the lifecycle manager to ensure that after a server transitions up, it also remains active. If a server crashes, it lets the lifecycle manager know and transition down the system to prevent a critical failure. See Eloquent to Foxy for details.

Behavior Trees

Behavior trees (BT) are becoming increasingly common in complex robotics tasks. They are a tree structure of tasks to be completed. It creates a more scalable and human-understandable framework for defining multi-step or many state applications. This is opposed to a finite state machine (FSM) which may have dozens of states and hundreds of transitions. An example would be a soccer-playing robot. Embedding the logic of soccer game play into a FSM would be challenging and error prone with many possible states and rules. Additionally, modeling choices like to shoot at the goal from the left, right, or center, is particularly unclear. With a BT, basic primitives, like “kick”, “walk”, “go to ball”, can be created and reused for many behaviors. More information can be found in this book. I strongly recommend reading chapters 1-3 to get a good understanding of the nomenclature and workflow. It should only take about 30 minutes.

Behavior Trees provide a formal structure for navigation logic which can be both used to create complex systems but also be verifiable and validated as provenly correct using advanced tools. Having the application logic centralized in the behavior tree and with independent task servers (which only communicate data over the tree) allows for formal analysis.

For this project, we use BehaviorTree CPP V3 as the behavior tree library. We create node plugins which can be constructed into a tree, inside the BT Navigator. The node plugins are loaded into the BT and when the XML file of the tree is parsed, the registered names are associated. At this point, we can march through the behavior tree to navigate.

One reason this library is used is its ability to load subtrees. This means that the Nav2 behavior tree can be loaded into another higher-level BT to use this project as node plugin. An example would be in soccer play, using the Nav2 behavior tree as the “go to ball” node with a ball detection as part of a larger task. Additionally, we supply a NavigateToPoseAction plugin (among others) for BT so the Nav2 stack can be called from a client application through the usual action interface.

Other systems could be used to design complex autonomous behavior, namely Hierarchical FSMs (HFSM). Behavior Trees were selected due to popularity across the robotics and related industries and by largely user demand. However, due to the independent task server nature of Nav2, it is not difficult to offer a nav2_hfsm_navigator package in the future, pending interest and contribution.

State Estimation

Within the navigation project, there are 2 major transformations that need to be provided, according to community standards. The map to odom transform is provided by a positioning system (localization, mapping, SLAM) and odom to base_link by an odometry system.

Standards

REP 105 defines the frames and conventions required for navigation and the larger ROS ecosystem. These conventions should be followed at all times to make use of the rich positioning, odometry, and SLAM projects available in the community.

In a nutshell, REP-105 says that you must, at minimum, build a TF tree that contains a full map -> odom -> base_link -> [sensor frames] for your robot. TF2 is the time-variant transformation library in ROS 2 we use to represent and obtain time synchronized transformations. It is the job of the global positioning system (GPS, SLAM, Motion Capture) to, at minimum, provide the map -> odom transformation. It is then the role of the odometry system to provide the odom -> base_link transformation. The remainder of the transformations relative to base_link should be static and defined in your URDF.

Global Positioning: Localization and SLAM

It is the job of the global positioning system (GPS, SLAM, Motion Capture) to, at minimum, provide the map -> odom transformation. We provide amcl which is an Adaptive Monte-Carlo Localization technique based on a particle filter for localization in a static map. We also provide SLAM Toolbox as the default SLAM algorithm for use to position and generate a static map.

These methods may also produce other output including position topics, maps, or other metadata, but they must provide that transformation to be valid. Multiple positioning methods can be fused together using robot localization, discussed more below.

Odometry

It is the role of the odometry system to provide the odom -> base_link transformation. Odometry can come from many sources including LIDAR, RADAR, wheel encoders, VIO, and IMUs. The goal of the odometry is to provide a smooth and continuous local frame based on robot motion. The global positioning system will update the transformation relative to the global frame to account for the odometric drift.

Robot Localization is typically used for this fusion. It will take in N sensors of various types and provide a continuous and smooth odometry to TF and to a topic. A typical mobile robotics setup may have odometry from wheel encoders, IMUs, and vision fused in this manner.

The smooth output can be used then for dead-reckoning for precise motion and updating the position of the robot accurately between global position updates.

Environmental Representation

The environmental representation is the way the robot perceives its environment. It also acts as the central localization for various algorithms and data sources to combine their information into a single space. This space is then used by the controllers, planners, and recoveries to compute their tasks safely and efficiently.

Costmaps and Layers

The current environmental representation is a costmap. A costmap is a regular 2D grid of cells containing a cost from unknown, free, occupied, or inflated cost. This costmap is then searched to compute a global plan or sampled to compute local control efforts.

Various costmap layers are implemented as pluginlib plugins to buffer information into the costmap. This includes information from LIDAR, RADAR, sonar, depth images, etc. It may be wise to process sensor data before inputting it into the costmap layer, but that is up to the developer.

Costmap layers can be created to detect and track obstacles in the scene for collision avoidance using camera or depth sensors. Additionally, layers can be created to algorithmically change the underlying costmap based on some rule or heuristic. Finally, they may be used to buffer live data into the 2D or 3D world for binary obstacle marking.

Costmap Filters

Imagine, you’re annotating a map file (or any image file) in order to have a specific action occur based on the location in the annotated map. Examples of marking/annotating might be keep out zones to avoid planning inside, or have pixels belong to maximum speeds in marked areas. This annotated map is called “filter mask”. Just like a mask overlaid on a surface, it can or cannot be same size, pose and scale as a main map. The main goal of filter mask - is to provide the ability of marking areas on maps with some additional features or behavioral changes.

Costmap filters are a costmap layer-based approach of applying spatial-dependent behavioral changes, annotated in filter masks, into the Nav2 stack. Costmap filters are implemented as costmap plugins. These plugins are called “filters” as they are filtering a costmap by spatial annotations marked on filter masks. In order to make a filtered costmap and change a robot’s behavior in annotated areas, the filter plugin reads the data coming from the filter mask. This data is being linearly transformed into a feature map in a filter space. Having this transformed feature map along with a map/costmap, any sensor data and current robot coordinate filters can update the underlying costmap and change the behavior of the robot depending on where it is. For example, the following functionality could be made by use of costmap filters:

  • Keep-out/safety zones where robots will never enter.

  • Speed restriction areas. Maximum speed of robots going inside those areas will be limited.

  • Preferred lanes for robots moving in industrial environments and warehouses.

Other Forms

Various other forms of environmental representations exist. These include:

  • gradient maps, which are similar to costmaps but represent surface gradients to check traversibility over

  • 3D costmaps, which represent the space in 3D, but then also requires 3D planning and collision checking

  • Mesh maps, which are similar to gradient maps but with surface meshes at many angles

  • “Vector space”, taking in sensor information and using machine learning to detect individual items and locations to track rather than buffering discrete points.

Last updated