In this chapter, we’ll discuss some of our dream requirements for a behavior architecture, in order to set up our goals and the desired properties of a particular implementation. In this thesis we focus on behavior systems that require human expertise to dream up, create, adapt, and modify. It is conceivable that a generally intelligent AI could take over this role in the future, but we do not consider that here. This list of characteristics therefore applies to Operator-Robot teams made up of any number of humans and robots. We leave the Operator-Robot ratio open-ended: many humans could be managing one robot, or one human could be managing a fleet of robots.

Capability

The goal of a behavior system should be to support as many tasks as possible so that the robot achieves maximum utility. The whole point of a robot behavior system, as we are concerned with it in this thesis, is to fill in for the dull, dirty, and dangerous work humans do. We define capability as how many different tasks and their variations can be performed successfully. For example, a system that only supports door traversals is not as capable as one that supports exploring buildings.

Feasibility

Any implementation of a behavior system must be feasible given real-world constraints. It is desirable to not require overly expensive computers or ones that are not readily available. If the robot needs to function autonomously in comms-degraded scenarios, the behavior should not rely on external comms or compute to operate. The behaviors cannot require robot actuation hardware that is not reliable, readily available, or that does not exist. There should be no jetpack flying requirement for behaviors if robots with jetpacks and controllers for them do not exist or only exist as prototypes. The behaviors should not rely on control software that does not exist. For example, modern whole-body controllers do not do much in the way of planning out how to achieve complicated bracing positions and techniques to avoid falling in dynamic scenarios, so the behavior system should support getting the robot into positions and authoring at a primitive action level that allows the operator to reason about what the controller can handle while authoring the behavior actions.

Speed

Behaviors should be watchable at 1x speed. Computational components of the system need to run within their allotted time boundaries, not causing any pauses. The robot hardware and the whole-body controller that the behavior system relies on should be capable of decently fast motions. We are not talking about super fast speeds here, just approaching casual human speed in performing day-to-day chore-like tasks. We want robots to be a drop-in for human work without an immediately huge tradeoff or question mark on speed.

Parallelizability

The system should support moving multiple parts of the body at once, including while walking. This is particularly useful when doing manipulation. Before performing many manipulations, the robot will need to prepare the grasping arm in a pre-grasp-ready pose while putting the other arm in a collision-avoidance pose. Having to get into these ready poses sequentially would cause an unnecessary delay. Parallel motion contributes to speed, but it is also important for capability. For example, to traverse a spring-loaded door, the robot must walk while holding its arm out in the correct locations to prevent the door from closing on it.

Reliability

You should be able to execute tasks repeatably without failures. More formally, for a given task, given similar environmental conditions, the robot should consistently perform the task. If it succeeds, it should consistently succeed (and if it fails, it should consistently fail). In other words, you should be able to count on the robot to perform a task repeatedly without random failures that have little to do with environmental variance.

Robustness

The robot and behavior system should be robust to environmental disturbances. These can be both physical and visual. For example, slight pushes to the robot, unmodeled friction and inertias in manipulated items, and changes in lighting due to time of day should not cause task failures. Robustness mechanisms should be present via the whole-body controller or in the behavior system to address these. Any vision models should be trained using data from varied times of day and lighting conditions to prepare them for round-the-clock work.

Resilience

The behavior system should support resilience to changing task conditions and attempt to recover from or work around gaps in reliability and robustness. We define resilience to mean being responsive and creative when facing task failures. For example, when the robot is trying to turn a door handle, perhaps a human is present and trying to test the reactivity of the system. In this case, the behavior system should be able to identify that the task is not proceeding nominally and enact some retry strategy. Retry strategies can include simply retrying the action sequence, mutating the pose-grasp sequence, or aborting the mission entirely and doing something else. Resilience ultimately means surviving day-to-day unexpected events and failures. Attaining resilience is a long-tail robotics problem, but the prior examples are good places to start.
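The escalation described above (retry, then mutate, then abort) can be sketched as a small loop. This is a minimal illustration, not the implementation used in this thesis: `run_with_retries`, `Outcome`, and the `attempt` callback are hypothetical names, and a "variant" is assumed to be an integer index selecting a mutated version of the action sequence.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Callable, Sequence


class Outcome(Enum):
    SUCCEEDED = auto()
    ABORTED = auto()


def run_with_retries(
    attempt: Callable[[int], bool],
    variants: Sequence[int],
    retries_per_variant: int = 2,
) -> tuple[Outcome, list[int]]:
    """Try each variant up to `retries_per_variant` times, then abort.

    `attempt(variant)` is a hypothetical callback that executes the action
    sequence for the given variant and returns True on success.
    """
    history: list[int] = []  # which variant was tried on each attempt
    for variant in variants:  # move to a mutated sequence after repeated failure
        for _ in range(retries_per_variant):  # plain retry of the same sequence
            history.append(variant)
            if attempt(variant):
                return Outcome.SUCCEEDED, history
    # Every variant failed: report an abort so the caller can do something else.
    return Outcome.ABORTED, history


# Usage: variant 0 always fails, variant 1 succeeds on its first try.
outcome, history = run_with_retries(lambda variant: variant == 1, variants=[0, 1])
```

Returning the attempt history alongside the outcome keeps the recovery path observable to the operator, which ties resilience back to the observability characteristic discussed later.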

Independence from External Systems

Ideally, the robot can execute behaviors without any connection to the outside world, given the behaviors have been authored and set up ahead of time. This mirrors animals in nature and supports a level of robustness by removing an unintuitive dependence on network communication. It also allows the robot to operate in more environments, including inside buildings with thick concrete walls and rural areas.

Some behavior systems rely on external perception. It is desirable to perceive the world only via the robot and not be dependent on motion capture systems or fiducial markers, which are implements of a laboratory setup. We want the robots using our behavior system to thrive beyond the lab environment and provide useful service in the real world. It is also desirable to have humanoid robots be a drop-in replacement for human workers without having to make robot-specific adjustments to the environment, such as placing fiducial markers. It would be better if robots were to read the same signage and maps as humans.

Dependence on Only Passive Color Vision

By using only passive color vision, the robot mirrors human nature and is more robust to varied lighting conditions. For example, structured light projection sensors can have degraded performance outdoors and in the presence of certain frequencies of light. Additionally, it can be more intuitive and understandable when the robot’s vision modality is similar to human vision. Since we are building humanoid robots to fill in for humans, it could be a more surefire drop-in replacement by matching the mode of vision.

Adaptability of the Operator-Robot Team

Given the near infinite world of tasks robots could help us with, we want a behavior system that can support creating new and adapting existing behaviors to tackle them. Adaptability means being able to survive in a changing environment and, for robots, this means you must be able to readily adapt robot behaviors to changing needs. One of the selling points of humanoid robots is their generality and similarity to humans, which means one application of them is to fill in for human work. Given the adaptiveness of humans and the existence and competitiveness of purpose-built machinery, the value-add for robots must exist in the realm of being an adaptable generalized form. Therefore, it is desirable to be able to create and modify behaviors to tackle various dull, dirty, and dangerous work tasks in a quick time frame.

The following three characteristics, as defined by Coactive Design, are required to build adaptability into the system.

Observability

Observability means knowing the current state of the system in order to understand what is going on. For a robot, there is a lot of information to take in at any particular moment and there are different levels of granularity in doing so. Knowing the current state is required for a human operator to reason about the behavior in order to modify or adapt it. It is also required to monitor what the robot is accomplishing and determine if it needs help. Some of the most important kinds of state to observe are:

  1. Seeing the current configuration of the robot’s body and hands, visual elements that indicate current forces on the environment, robot hardware status, motor temperatures, and joint faults.

  2. Seeing what the robot sees such as the current robot view video stream(s) and current semantic object identification.

  3. Getting a feel for the robot’s immediate environmental surroundings and how the robot is situated in it. For example, this can be done via a colored depth point cloud in the 3D view with the robot configuration. If the robot doesn’t have 360-degree vision, mapping may be required or the robot could move the head around to rescan.

  4. Knowing the current state of the behavior system and whole-body controller. For example, what state is the behavior in? Is anything currently executing? Has anything failed? What have we done in the recent past and was it successful?

  5. Knowing the robot’s current model of the environment. Which objects does it know about? Where does it think they are in 3D? Does the robot know where it is on a map? Is it aware of the major obstacles nearby?

Predictability

Predictability means having a sense of what is going to happen next, both with the robot and the environment. This is required for a human operator to create, adapt, and diagnose behaviors. For example, when authoring the next action(s), it is desirable to see a preview of the motion of the robot and environment as a way of verifying that action. The preview can be inspected for collisions or bad inverse kinematics solutions to avoid failures before executing it on the real robot. Predictability also goes hand-in-hand with authoring at runtime and in mission-critical field scenarios. For example, in the DARPA Robotics Challenge, many tasks, such as getting out of the car, were a step-by-step sequence of predefined robot motions. The robot was being teleoperated live and if the robot fell, the competition would be lost. When doing this task, a preview of the next motions allowed the team to inspect the plan before executing it, increasing confidence and the reliability of the operator-robot team.

On a technical level, it is possible to provide predictability of whole-body motions by playing back a planned animation of future motion, as a transparent colored robot. Primitive graphics like footsteps can be shown to convey where the robot will step next. Color can be used to convey feasibility. For example, a blue transparent graphic of an arm can represent a feasible solution and it can turn red to notify the operator of an infeasible or hard-to-reach configuration. On a longer horizon, a browsable list of actions could be shown which lists all future actions and sub-sequences in the behavior.
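As a rough sketch of the color convention above, the preview graphic's color can be chosen from a feasibility measure. The measure used here, the residual position error of an inverse kinematics solution compared against a tolerance, is an assumption made for illustration; `preview_color` and the constants are hypothetical names.

```python
from __future__ import annotations

# RGBA colors for the transparent preview graphic (alpha 0.5 = half transparent).
BLUE = (0.0, 0.0, 1.0, 0.5)
RED = (1.0, 0.0, 0.0, 0.5)


def preview_color(
    position_error_m: float, tolerance_m: float = 0.01
) -> tuple[float, float, float, float]:
    """Return blue for a feasible preview pose and red for an infeasible one.

    Assumption: a pose counts as feasible when the inverse kinematics
    solution reaches the goal within `tolerance_m` meters.
    """
    return BLUE if position_error_m <= tolerance_m else RED


feasible = preview_color(0.005)   # within tolerance, so blue
infeasible = preview_color(0.05)  # outside tolerance, so red
```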

Directability

The last of Johnson’s three characteristics in Coactive Design is directability. It is a measure of how expressive the operator can be in commanding the robot to do things. For a humanoid robot, at the basic level, it means being able to command the robot to take steps, walk, move its hands, look around with its head, and generally pose the whole body. At a higher level, the availability of planners increases expressiveness. Good examples of high expressiveness would be the ability to ask the robot to clean a room or to fetch a particular object. We also extend the scope of directability to include non-direct ways of commanding the robot, such as tuning parameters of primitive actions, behavior logic, perception, or scene management. In this way, we want to not only directly command the robot’s physical actions, but also its cognitive model of the world and its plan.

Learnability (Operator Learning Curve)

The operator interface should be designed in a way that helps a novice operator learn how to use it. The behavior operator interface should be interactive and guide the user with cues to point them in the right direction and give them confidence that what they did is what they wanted to do. Nielsen’s 10 Usability Heuristics for User Interface Design is a good reference point for designing a user interface. The 10 heuristics are: visibility of system status; match between the system and the real world; user control and freedom; consistency and standards; error prevention; recognition rather than recall; flexibility and efficiency of use; aesthetic and minimalist design; help users recognize, diagnose, and recover from errors; and help and documentation.

Understandability (of the Implementation)

It would be nice if it were easy to learn how the behavior system works by reviewing the code and observing behaviors in operation. Viewing a behavior’s composition in the user interface should give a good idea of what the behavior does by being organized and supporting abstractions. The use of hierarchical abstractions, for example, can allow the reader to understand the high level at a glance and dive deeper where they want to learn more.

Usability

The user interface should be easy to use, even for an expert operator. Functionality should be organized in meaningful ways, such as grouping like functionalities, scene objects by category, and organizing primitive actions by part of the body. When behaviors get large, there should be mechanisms to abstract their contents into high-level parts. One way to do this, for example, is to structure behaviors hierarchically, such that the higher level layers are more generic, like “navigate to room C”, and lower level layers are more specific like “move hand forward 5 centimeters”. Functionalities should be organized into menus. Buttons and checkboxes should be easy to click. Text and widgets should be easy to read and size-adjustable.
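One way to picture the hierarchical structure described above is a tree of named nodes that the interface can show collapsed to a chosen depth. The sketch below is illustrative only: `BehaviorNode` and `outline` are hypothetical names, and the intermediate "traverse door" layer is an assumed example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BehaviorNode:
    """A named step in a behavior; children are the more specific steps."""
    name: str
    children: list[BehaviorNode] = field(default_factory=list)


def outline(node: BehaviorNode, max_depth: int, depth: int = 0) -> list[str]:
    """List node names as indented lines, hiding layers below `max_depth`."""
    lines = ["  " * depth + node.name]
    if depth < max_depth:
        for child in node.children:
            lines.extend(outline(child, max_depth, depth + 1))
    return lines


behavior = BehaviorNode(
    "navigate to room C",
    [BehaviorNode("traverse door", [BehaviorNode("move hand forward 5 centimeters")])],
)

at_a_glance = outline(behavior, max_depth=0)  # only the generic top layer
full_detail = outline(behavior, max_depth=2)  # down to the primitive actions
```

Collapsing by depth lets an operator read a large behavior at a glance and expand only the part they need to edit.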

Ability to Analyze in Post

A lot can happen in the course of a robot run and sometimes it can happen very fast. When there are failures or potential improvements, it is often useful to do a post-mortem analysis of what happened. This characteristic is desirable especially because running and supervising robots is stressful and requires attention. We want the ability to log all the data for a robot run and dive into that data later, without the cognitive overhead of the live run. This also gives the operator or behavior engineer the opportunity to view the system in non-live ways. They do not necessarily need the same observability data; they can choose to deep dive into control or logic data, using screen real estate for that instead of live monitoring. The logged data should include (ideally lossless) recordings of the robot’s sensors, behavior state and parameter data over time, controller variables over time, robot configuration state over time, and more. This also implies the availability of post-mortem analysis software, which would allow the logged data to be explored in a rich and interactive way. Examples of this include a slider to scrub data over time, a 3D reconstruction of robot configuration state and 3D depth data, and time plots of controller and behavior logic variables.

Debuggability

The system should provide outputs that assist in debugging when things go wrong or while bringing up new functionalities. Examples of this include good print statements in the robot processes and logging them, sending log messages from the robot to the user interface at runtime, and coloring the log messages by severity and importance. Dynamic user interface elements can also be helpful, for example, when an action fails, making it blink red to draw the operator’s attention. Another way to support debugging is to carefully select a representative set of state variables to log in a time-dependent buffer. These buffers can be streamed live or stored to disk and viewed as scrubbable plots.
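The time-dependent buffer mentioned above could look something like the following bounded buffer, built on `collections.deque` from the Python standard library. The class name `StateBuffer`, its methods, and the variable name `knee_torque` are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from collections import deque
from typing import Optional


class StateBuffer:
    """Keeps the most recent `capacity` timestamped samples of state variables."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest sample when full.
        self._samples: deque[tuple[float, dict[str, float]]] = deque(maxlen=capacity)

    def record(self, time_s: float, values: dict[str, float]) -> None:
        """Store a copy of the given variables at the given time."""
        self._samples.append((time_s, dict(values)))

    def series(self, name: str) -> list[tuple[float, float]]:
        """Return (time, value) pairs for one variable, ready for plotting."""
        return [(t, v[name]) for t, v in self._samples if name in v]

    def at(self, time_s: float) -> Optional[dict[str, float]]:
        """Return the latest sample at or before `time_s`, as when scrubbing."""
        latest = None
        for t, values in self._samples:
            if t <= time_s:
                latest = values
        return latest


buffer = StateBuffer(capacity=3)
for t in (0.0, 1.0, 2.0, 3.0):
    buffer.record(t, {"knee_torque": 10.0 * t})

torque_plot = buffer.series("knee_torque")  # the t=0.0 sample has been dropped
scrubbed = buffer.at(2.5)                   # the sample recorded at t=2.0
```

The same buffer contents could be streamed live or written to disk; the fixed capacity is what keeps the live case bounded in memory.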

Testability

It is desirable to be able to test the system in an automated way. This could be with the real robot, virtually with real data, or using fully simulated data. For example, having test fixtures available for continuous integration tools to perform simulated behaviors and inspect the results for success and performance characteristics would help ensure quality and prevent regressions. Testing often requires significant resources, as in the case of automated testing on the real robot. To support these cases, the behavior system should be operable in an automated way and not just by a human operator.

Another case that requires significant resources is fully simulated testing. It can be very difficult to reduce the sim-to-real gap for loco-manipulation behaviors that need realistic vision and physics. Tasks often need to be rigged as articulated simulation assets as in the case of doors, which is a manual process that requires expertise. However, there is also a middle ground in which tradeoffs can be made and components replaced with dummies. For example, poses of objects in the scene could be given via ground truth knowledge, bypassing the vision system entirely or partially.
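The middle ground above, where the vision system is replaced by a dummy that reports ground-truth poses, can be sketched with a small interface. All names here (`PoseSource`, `GroundTruthPoseSource`, `reach_target`) are hypothetical, and a pose is reduced to an (x, y, z) position for brevity.

```python
from __future__ import annotations

from typing import Optional, Protocol

Pose = tuple[float, float, float]  # x, y, z in meters; orientation omitted


class PoseSource(Protocol):
    """Anything that can report where a named object is, or None if unknown."""

    def object_pose(self, name: str) -> Optional[Pose]: ...


class GroundTruthPoseSource:
    """Test dummy: returns known poses instead of running the vision system."""

    def __init__(self, poses: dict[str, Pose]) -> None:
        self._poses = dict(poses)

    def object_pose(self, name: str) -> Optional[Pose]:
        return self._poses.get(name)


def reach_target(source: PoseSource, name: str, standoff_m: float) -> Optional[Pose]:
    """Behavior logic under test: a hand goal `standoff_m` short of the object."""
    pose = source.object_pose(name)
    if pose is None:
        return None  # object not known, nothing to reach for
    x, y, z = pose
    return (x - standoff_m, y, z)


simulated_scene = GroundTruthPoseSource({"door_handle": (1.0, 0.2, 0.9)})
goal = reach_target(simulated_scene, "door_handle", standoff_m=0.25)
```

Because the behavior logic depends only on the interface, the same `reach_target` could later be exercised with a real perception-backed source without changing the test.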

Extendability

We’re still firmly in the early stages of humanoid robots starting to work well. Any system for running behaviors on humanoid robots should be easy to extend, functionality-wise, to keep pace with the state of the art and maintain competitive usefulness. For example, given the availability of a new footstep planner, it should be a straightforward process to include it as an option. Likewise, if a new comms protocol is adopted, it should not require a complete redesign of the architecture to switch over. Some ways that could help in achieving extendability are keeping the code well tested and maintaining separation of concerns in the design and implementation.
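As one illustration of separation of concerns, planners can be registered behind a common call signature so that adding a new one does not touch the code that uses them. This is a sketch under assumptions: the registry, the decorator, and both planners are hypothetical, and a footstep is simplified to an (x, y) position in meters.

```python
from __future__ import annotations

from typing import Callable

Footstep = tuple[float, float]  # x forward, y lateral, in meters
Planner = Callable[[float], list[Footstep]]

PLANNERS: dict[str, Planner] = {}


def register_planner(name: str) -> Callable[[Planner], Planner]:
    """Decorator that makes a planner selectable by name."""
    def decorator(planner: Planner) -> Planner:
        PLANNERS[name] = planner
        return planner
    return decorator


@register_planner("straight_line")
def straight_line(distance_m: float) -> list[Footstep]:
    """Alternate left and right steps of 0.5 m straight ahead."""
    step_m = 0.5
    count = int(distance_m / step_m)
    return [(step_m * (i + 1), 0.1 if i % 2 == 0 else -0.1) for i in range(count)]


def plan(name: str, distance_m: float) -> list[Footstep]:
    """The behavior system only needs the planner's name, not its internals."""
    return PLANNERS[name](distance_m)


# Adding a new planner later is one new function; `plan` is unchanged.
@register_planner("single_step")
def single_step(distance_m: float) -> list[Footstep]:
    return [(distance_m, 0.0)]


steps = plan("straight_line", 1.0)
```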

There are many different ways to achieve these characteristics, and there are tradeoffs depending on your specific requirements. The tradeoffs could be in engineering time or they could be theoretical. For example, if you know your system will only be used by trained expert operators, you can invest more engineering time in the functional and utilitarian aspects, like reducing the number of clicks or relying more heavily on keyboard shortcuts. However, if your system needs to be usable by a more general audience, more engineering time needs to be spent on Nielsen’s 10 heuristics and you may even want to conduct a user study.

An example of a more theoretical tradeoff would be what you show to the operator at any given time. There is only so much screen real estate and operator attention that can go around, so hard decisions need to be made about the value of information. We think this will vary from system to system and ultimately is based on the confidence levels of the particular subsystems. For example, if you have high trust in your controller’s ability to walk and balance, you may not show balance information to the operator. Conversely, if you depend on a semantic object detection subsystem that is always failing to detect objects, you will likely want that visible in high detail at all times so you can monitor and learn how to either exploit its properties or make informed improvements.

Now that we have defined some desirable characteristics of a good behavior architecture, we’ll tell the story of our journey in navigating the tradeoffs and building one from near-scratch that met our requirements.