Shared Autonomy in Unprepared Environments

Pilots perform a teleoperated manipulation task using the Jason remotely operated vehicle (ROV) (inset).

Existing robot systems designed for unprepared environments generally provide one of two operating modes: full teleoperation (primarily in the field) or full autonomy (primarily in the lab). Teleoperation places significant cognitive load on the user, who must reason over both high- and low-level objectives and control the robot’s low-level degrees-of-freedom through often unintuitive interfaces, all while interpreting the robot’s various sensor streams. This has two consequences. First, the continuous demand on the user’s visuospatial attention typically leaves them with severely deficient situational awareness of the robot’s operating environment. Second, the difficulty of controlling each of the mobile manipulator’s individual degrees-of-freedom precludes motions that require complex coordination of the joints, thereby limiting dexterity and efficiency. Full autonomy, meanwhile, is inhibited by a robot’s limited proficiency at intervention (grasping and manipulation), long-term planning, and adaptation to dynamic, cluttered environments. The ability to operate in the continuum between full teleoperation and full autonomy would enable operations that couple the complementary capabilities of humans and robots, improving the efficiency and effectiveness of human-robot collaboration.

Shared autonomy provides a framework for human-robot collaboration that takes advantage of the complementary strengths of humans and robots to achieve common goals. It may take the form of shared control, whereby the user and agent both control the same physical platform, or human-robot teaming, whereby humans and robots operate independently towards a shared goal.

Shared Autonomy for Remote Underwater Manipulation

(A) Illustration of the sampling process with SHARC. Remote scientists (green) using SHARC-VR (headset icon) and SHARC-desktop (monitor icon) collaborated with the onboard crew (blue) to take an XRF measurement and push core sample of a microbial mat within the San Pedro Basin. (B) XRF spectra indicate elevated iron concentrations in the microbial sample (red) above ambient (blue). (C) Snapshot of a representative video frame broadcast with SHARC during measurement. (D) From the SHARC-VR interface, users can specify task-level objectives with hand gestures and natural language. SHARC automates the low-level planning and control, enabling safe and robust manipulation over low-bandwidth connections.

Scientific exploration of the deep ocean is vital for understanding natural Earth processes, but remains inaccessible to most. Dexterous sampling operations at depth are typically conducted by robotic manipulator arms onboard remotely operated vehicles (ROVs), which are directly teleoperated by pilots aboard surface support vessels. This presents barriers to access due to the infrastructure, training, and physical ability requirements for at-sea oceanographic research. Enabling shore-based participants to observe and control robotic sampling processes can reduce these barriers; however, the conventional direct-teleoperation approach is infeasible for remote operators due to the considerable bandwidth limitations and latency inherent in satellite communication. Thus, some degree of ROV autonomy is required to support remote operations.

To address this need, we have been working with collaborators at the Woods Hole Oceanographic Institution (WHOI) to develop the SHared Autonomy for Remote Collaboration (SHARC) framework, which enables remote participants to conduct shipboard operations and control robotic manipulators using only a basic Internet connection and consumer-grade hardware, regardless of their prior piloting experience. SHARC extends current supervisory control methods (Billings et al., 2021) by enabling real-time collaboration between multiple remote operators, who can issue goal-directed commands through speech and hand gestures. SHARC couples these natural input modalities with an intuitive 3D workspace representation that segments the workspace and actions into a compact representation of known features, states, and policies. The flexibility of language enables users to succinctly issue complex commands that would otherwise be difficult to execute with conventional controllers. In addition to reducing cognitive load, the intuitive nature of speech and gestures minimizes the training required for operation and makes SHARC accessible to a diverse population of users. These natural input modalities also have the benefit of remaining functional under intermittent, low-bandwidth, and high-latency communications.
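To make the bandwidth argument concrete, the sketch below shows roughly what a goal-directed command might look like once speech and gestures have been parsed: a handful of named fields rather than a continuous stream of joystick inputs and video. The message fields, values, and transport are purely illustrative assumptions, not SHARC's actual interface or wire format.

```python
import json

# Hypothetical task-level command: a remote scientist says
# "take an XRF measurement of the microbial mat" and points at a target.
# Only the goal and a gesture-selected 3D location need to cross the
# satellite link; low-level planning and control run onboard the vehicle.
command = {
    "operator": "remote_scientist_01",
    "action": "xrf_measurement",          # task-level objective, not joint commands
    "target_xyz_m": [1.42, -0.37, 0.85],   # selected point in the workspace frame (illustrative)
    "duration_s": 60,
}

payload = json.dumps(command).encode("utf-8")
print(f"command size: {len(payload)} bytes")  # tens of bytes, vs. Mb/s for video or direct teleoperation
```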

The ability to involve remote users during field operations became particularly important during the COVID-19 pandemic, when space onboard research vessels was especially restricted. During an oceanographic expedition in the San Pedro Basin of the Eastern Pacific Ocean, our remote team members operated the Nereid Under Ice (NUI) vehicle from thousands of kilometers away using SHARC’s virtual reality (VR) and desktop interfaces. The team collaboratively collected a physical push core sample and recorded in-situ X-ray fluorescence (XRF) measurements of seafloor microbial mats and sediments at water depths exceeding 1000 m, while being physically located in Chicago, Boston, and Woods Hole.

Model-free Shared Autonomy for Continuous Control

Human pilot trajectories in LunarLander (top) and LunarReacher (bottom) when controlling a spacecraft initialized with a random velocity, with (left) and without (right) our shared autonomy agent. Trajectories rendered in green successfully landed between the flags or reached the target (large red circle); red trajectories crashed or went out of bounds. Circles are spaced evenly in time along each trajectory, with larger separation indicating greater velocity. The same assistant is used for both tasks and, despite having no task-specific knowledge, it greatly improves the success rate of both human and simulated pilots.

Early work in shared autonomy assumes that the user’s goals are known to the agent, which is rarely realized in practice. Recent methods instead infer the user’s goal from their actions and environment observations. These methods often assume a priori knowledge of the environment dynamics and the set of possible goals, and require access to demonstrations or the user’s policy for achieving each goal.

These assumptions can be limiting in practice, preventing the use of shared autonomy beyond simple tasks performed in structured, uncluttered environments. For example, estimating environment dynamics can be harder than learning to solve the task itself. Additionally, the goal space may be large, unknown, or may change over time, making it difficult or impossible to accurately infer the user’s goal or to learn the user’s policy for each goal. At best, goal inference often requires that the user first bring the system close to the goal, which diminishes the advantages of shared autonomy. Inspired by recent work on shared autonomy, we seek to extend shared autonomy to more complicated domains through a framework in which the agent has no knowledge of the environment dynamics, the space of goals, or the user’s intent.

To that end, we developed a model-free deep reinforcement learning (RL) approach to shared autonomy (Schaff & Walter, 2020). Model-free deep RL has achieved great success on many complicated tasks, such as end-to-end sensor-based control and robot manipulation. We avoid assuming knowledge of the human’s reward function or the space of goals, and instead focus on maintaining a set of goal-agnostic constraints. For example, a human driver is expected to follow traffic laws and not collide with other vehicles, pedestrians, or objects, regardless of the destination or task. This idea is naturally captured by having the agent act to satisfy some criteria or set of constraints relevant to multiple tasks within the environment. Without knowing the task at hand, the robot should attempt to intervene as little as possible while maintaining these constraints.
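As a minimal sketch of what such goal-agnostic training feedback could look like, the reward below carries no task-specific signal: it only penalizes constraint violations (here, crashing or leaving bounds) and the magnitude of the agent's intervention. The constants, flags, and weighting are illustrative assumptions rather than the values used in our experiments.

```python
import numpy as np

def copilot_reward(crashed: bool, out_of_bounds: bool,
                   intervention: np.ndarray,
                   crash_penalty: float = 100.0,
                   intervention_weight: float = 0.1) -> float:
    """Goal-agnostic reward: no task reward, only constraint violations and minimal intervention."""
    reward = 0.0
    if crashed or out_of_bounds:
        reward -= crash_penalty  # penalize violating the safety constraint
    # Penalize the size of the agent's corrective action so it defers to the pilot when possible.
    reward -= intervention_weight * float(np.linalg.norm(intervention))
    return reward
```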

Shared autonomy methods differ in the manner in which the agent augments the control of the user, which requires balancing the advantages of increased levels of agent autonomy with a human’s desire to maintain control authority. This complicates the use of standard deep reinforcement learning approaches, which traditionally assume full autonomy. In an effort to satisfy the user’s control preference, we approach shared autonomy from the perspective of residual policy learning, which learns residual (corrective) actions that adapt a nominal “hard-coded” policy. In our case, the (unknown) human policy plays the role of the nominal policy that the residual shared autonomy agent corrects to improve performance. We find this to be a natural way to combine continuous control inputs from a human and an agent.
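At execution time, this residual formulation reduces to a simple composition: the copilot observes the state together with the pilot's command and adds a corrective action before the result is sent to the robot. A minimal sketch, assuming a continuous action space shared by pilot and copilot and a hypothetical `copilot_policy` interface:

```python
import numpy as np

def assisted_action(state: np.ndarray,
                    pilot_action: np.ndarray,
                    copilot_policy,
                    action_low: float = -1.0,
                    action_high: float = 1.0) -> np.ndarray:
    """Combine the human pilot's action with the copilot's residual correction."""
    # The copilot conditions on both the state and the pilot's proposed action.
    residual = copilot_policy(np.concatenate([state, pilot_action]))
    # The executed command is the pilot's action plus the learned correction,
    # clipped to the environment's action bounds.
    return np.clip(pilot_action + residual, action_low, action_high)
```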

Using this method, we are able to create robotic assistants that improve task performance across a variety of human and simulated actors, while also maintaining a set of safety constraints. Specifically, we apply our method in two assistive control environments: Lunar Lander and a 6-DOF quadrotor reaching task. For each task, we conduct experiments with human operators as well as with surrogate pilots that are handicapped in ways representative of human control. Trained only to satisfy a constraint on not crashing, we find that our method drastically improves task performance and reduces the number of catastrophic failures.

Learning Assistive Policies Without Reward

The role of the agent is to complement the control authority of the user, whether to improve the robot’s performance on the current task or to encourage or ensure safe behavior. An important consideration when providing assistance via shared autonomy is the degree to which the agent balances the user’s preference for maintaining control authority (i.e., the fidelity of the assisted behavior relative to the user’s actions) against the potential benefits of endowing more control to the agent (i.e., the conformity of the assisted behavior to that of an autonomous agent). In our work described above, we model this trade-off by using the agent to provide residual actions that correct those of the user, with the inherent assumption that the action spaces of the user and the agent are the same. By minimizing the norm of this residual correction, we encourage the agent to preserve the user’s control authority.

Like other recent work on shared autonomy, our method treats the user as a part of the environment, using an augmented state that includes the user’s action. Framing the problem in this way has a clear and significant advantage—it allows us to utilize the modern suite of tools for deep RL. However, these methods (including our own) have two notable limitations. First, they nominally require human-in-the-loop interaction during training in order to generate user actions while learning the assistant’s policy. Since the sample complexity of deep RL makes this interaction intractable, these methods replace the human with a surrogate policy. If this surrogate is misspecified or invalid, this approach can lead to copilots that are incompatible with actual human pilots, as we found in our previous work (Schaff & Walter, 2020). Second, these methods require access to task-specific reward during training, which may be difficult to obtain in practice.
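One common way to realize this "user as part of the environment" view is an environment wrapper: at each step a pilot (a surrogate policy during training, a human at deployment) proposes an action, the copilot sees that action appended to the observation, and the wrapper executes the combined command. The Gymnasium-style sketch below makes illustrative assumptions about the pilot and copilot interfaces and omits details such as updating the observation space.

```python
import numpy as np
import gymnasium as gym

class PilotCopilotEnv(gym.Wrapper):
    """Wraps an environment so the copilot observes the pilot's action as part of the state."""

    def __init__(self, env, pilot_policy):
        super().__init__(env)
        self.pilot_policy = pilot_policy  # surrogate policy during training, human at deployment

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._pilot_action = self.pilot_policy(obs)
        # The copilot's observation is the environment state augmented with the pilot's action.
        return np.concatenate([obs, self._pilot_action]), info

    def step(self, residual_action):
        # Execute the pilot's action plus the copilot's residual correction.
        combined = np.clip(self._pilot_action + residual_action,
                           self.env.action_space.low, self.env.action_space.high)
        obs, reward, terminated, truncated, info = self.env.step(combined)
        self._pilot_action = self.pilot_policy(obs)
        return np.concatenate([obs, self._pilot_action]), reward, terminated, truncated, info
```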

Our framework utilizes a diffusion model to adapt a user's action (red) to those from a demonstration distribution (green) in a manner (blue) that balances a user's desire to maintain control authority with the benefits (e.g., safety) of conforming to the desired (demonstration) distribution. Without knowledge of the user's specific goal (e.g., the landing location), the demonstration distribution reflects different goals that the demonstration trajectories previously reached.

In light of these limitations, we developed a model-free approach to shared autonomy (Yoneda et al., 2023) that interpolates between the user’s action and an action sampled from a generative model that provides a distribution over desired behavior. Our approach has the distinct advantage that it does not require knowledge of or access to the user’s policy or any reward feedback during training, thus eliminating the need for reward engineering. Instead, our training process, which involves learning the generative model, only requires access to trajectories that are representative of desired behavior.

The generative model that underlies our approach is a diffusion model, which has proven highly effective for complex generation tasks including image synthesis. Diffusion models consist of two key processes: the forward process and the reverse process. The forward process iteratively adds Gaussian noise to the input with an increasing noise scale, while the reverse process is trained to iteratively denoise a noisy input in order to arrive at the target distribution. As part of this denoising process, the model produces the gradient direction in which the likelihood of its input increases under the target distribution. Once the model is trained, generating a sample from the (unknown) target distribution involves running the reverse process on a sample drawn from a zero-mean isotropic Gaussian distribution.
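The sketch below illustrates the two processes in standard DDPM-style notation; the noise schedule is arbitrary and the `denoiser` network is a placeholder, so this is a generic illustration of diffusion models rather than the specific model we trained.

```python
import torch

T = 50
betas = torch.linspace(1e-4, 0.02, T)   # noise schedule (illustrative)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def forward_diffuse(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Forward process: corrupt a clean sample x0 with Gaussian noise at step t."""
    noise = torch.randn_like(x0)
    return torch.sqrt(alpha_bars[t]) * x0 + torch.sqrt(1.0 - alpha_bars[t]) * noise

@torch.no_grad()
def reverse_step(denoiser, x_t: torch.Tensor, t: int) -> torch.Tensor:
    """Reverse process: one denoising step toward the target distribution."""
    eps_hat = denoiser(x_t, t)  # predicted noise; points along the gradient of the data log-likelihood
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps_hat) / torch.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)
```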

As we have found empirically, directly applying diffusion models to shared autonomy generates actions that ignore the user’s intent (i.e., they have low fidelity), even when those actions are consistent with the desired behaviors (i.e., high conformity). To address this, we proposed a new algorithm that controls the extent of the forward and reverse processes through a forward diffusion ratio that regulates the balance between the fidelity and the conformity of the generated actions. The forward diffusion ratio provides a formal bound on the extent to which the copilot’s action deviates from that of the user.
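Concretely, instead of denoising from pure noise, the copilot partially diffuses the user's action and then denoises it back: a forward diffusion ratio in [0, 1] determines how many of the forward and reverse steps are applied, trading fidelity (small ratios leave the user's action nearly untouched) against conformity (ratios near 1 effectively resample from the demonstration distribution). The sketch below reuses the definitions from the previous snippet; the ratio value is illustrative.

```python
import torch  # reuses T, forward_diffuse, and reverse_step from the sketch above

def to_the_noise_and_back(denoiser, user_action: torch.Tensor,
                          gamma: float = 0.4) -> torch.Tensor:
    """Partially noise the user's action, then denoise it toward the demonstration distribution."""
    k = int(gamma * T)          # forward diffusion ratio controls fidelity vs. conformity
    if k == 0:
        return user_action      # gamma = 0: pure teleoperation, no assistance
    x = forward_diffuse(user_action, k - 1)   # run the forward process up to step k
    for t in reversed(range(k)):               # run the reverse process back down
        x = reverse_step(denoiser, x, t)
    return x
```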

References

  1. To the Noise and Back: Diffusion for Shared Autonomy
    Takuma Yoneda, Luzhe Sun, Ge Yang, Bradly Stadie, and Matthew Walter
    arXiv:2302.12244 Feb 2023
  2. Towards Automated Sample Collection and Return in Extreme Underwater Environments
    Gideon Billings, Matthew R. Walter, Oscar Pizarro, Matthew Johnson-Roberson, and Richard Camilli
    Field Robotics 2021 (Accepted, to appear)
  3. Residual Policy Learning for Shared Autonomy
    Charles Schaff and Matthew R. Walter
    In Proceedings of Robotics: Science and Systems (RSS) Jul 2020