Mar Gonzalez-Franco and Andrea Colaco, Google

Most of our interactions with digital content currently occur inside 2D screens. Moving from that format to immersive setups, however, brings a paradigm shift: from content inside the screen to users inside the content. This change requires us to revisit how we blend the analog and the digital and how we transfer content between the two modes; perhaps it even calls for new guidelines. While different solutions appear in the space, the gulf between the two worlds only seems to widen. We can start to see what works, and what does not work so well, through an empirical or ethnographic approach that goes beyond laboratory studies. But if we want to accelerate adoption, we need to better understand how current tasks can be improved and how this new form of interaction can increase productivity. In this article, we analyze and bring together what we think works, and envision how this new set of immersive devices and interactions can enable productivity beyond already existing tools.

BACKGROUND

We'll start with some lessons from previous inflection points in computing using a simplified history of events. First, we had computers. They became smaller and cheaper so they could be used at home. Then, the Internet connected everything. Tech then got even smaller, so much so that we could move to mobile computing, which eventually jumped to our phones. And that is where we are now.

In retrospect, we can see how all these evolutionary leaps have coexisted. Despite all sorts of predictions, phones didn't kill the PC, and probably neither will immersive tech. They are likely to coexist for the foreseeable future. This is one of the early points we want to make: the importance of interoperability.

Let's consider a world where digital content finally moves out of 2D screens into our 3D worlds (real worlds). As opposed to having two realities, a metaverse (virtual) and a real world, extended reality (XR) aims to blend the two. In this framework, there are some things that will continue being 2D planes even inside 3D; for example, documents. But these panels might resynthesize in our surroundings in more-affordable ways than on an actual screen: on our walls, on top of tables, attached to other objects that make them tangible, or taking into account the user for optimal ergonomic size and position. And yes, that means sometimes content might be just shown as screens inside XR. But there will always be a component of spatially arranging things, whether they are 2D or 3D.

Let's take a deeper look at the spatial and ergonomic components together. We will focus on wearable sets of glasses (even in the primitive form of an AR phone; since it is held in the hand, it could be temporarily considered a wearable). This type of immersive tech is an interface between the user and their environment, inviting embodied interaction. The dichotomy is then between body-locked content and interactions versus world-locked ones. Interestingly, in XR the boundaries are more dynamic; our input systems as well as our content will have many more options, which we discuss later.

Mundane productivity topics, like interoperability or input, are the scaffolding for collaboration, multitasking, transitions, and interruptions. There are many other things users will need to do, but these are the ones we focus on here and that we believe are core for productivity.

If these topics are not addressed well, VR will not be widely adopted for work scenarios, especially for information workers, who currently spend most of their time using PCs. "What can VR do better?" is a good research question. For now, though, let's make sure VR isn't worse at these topics than current PCs.

INFORMATION WORKERS

Information workers, also known as knowledge workers, are those who spend most of their productive time enabled by computers. Other workers, such as frontline workers in factories, farms, or other real-world settings, will also experience a big improvement in productivity when they are able to augment their realities. In fact, the impact on their productivity will perhaps be even greater than that of information workers (Figure 1), as many real-world tasks still lack advanced assisted computation, whereas the improvement in productivity with VR for information workers might be marginal.

In this article we focus on how immersive tech will revolutionize productivity for information workers. This narrower initial focus is for three reasons:

Information workers are already intensively using devices and adopting new software and gadgets on a regular basis. They can be considered early adopters, the power users of digital content.

Safety issues will be reduced. Working at a desk is a much safer, more controlled space. Locomotion, the need to move around, is reduced and scenes are more constrained, with more-limited sets of objects, reducing the dependence on scene-understanding algorithms. It is an ideal petri dish for early XR, where the real world can start to blend with VR, with pass-through views and so on.

Ergonomic issues will be reduced. Head-mounted displays (HMDs) in a multidevice scenario might be used only temporarily, which means that tethered cables, as well as the size or weight of the device, might be less critical.

It's clear that HMDs, computer vision, and [...] will improve over time, enabling many other forms of XR. This follows the trend we have already seen with other specs that have improved in the past decade. We can comfortably read text in most HMDs, with [...] displays complemented by advances in optics (pancake lenses or three-element lenses), finally making these devices viable for information workers.

WHY ADOPT VR FOR PRODUCTIVITY?

If people are going to adopt VR for work, it is because some tasks become easier or faster to perform in this medium. Researchers have been trying to find which specific tasks benefit. We can, for example, augment experiences by visualizing more information in context, augment presentations inside VR, and improve meeting experiences.

Indeed, meeting with other humans is an experience complex enough that we haven't managed to re-create it with video conferences in 2D; maybe VR can help with its spatial audio and vision. VR can enable more ecological validity and unlock evolutionary wonders such as directed attention, peripersonal spaces, and concurrent taking, as well as unlock the ability to use our body for interaction (pointing, gazing, and enabling spatial formations). Perhaps it will even enable the use of whiteboards in direct ways by multiple users, and ultimately support a form of collaborative spatial work unavailable with traditional 2D screens.

In general, there is agreement that experiential tasks are good candidates to be improved with VR, even if they don't require colocated participation. That means going beyond meetings, to affect learning with improved recall and hippocampal activity. These experiential tasks are particular use cases for productivity, however, and might not justify full adoption of VR.

A different look into VR productivity opportunities can focus on the uniqueness of the medium, instead of on specific tasks. VR has traditionally been labeled as an isolating medium. But despite the fact that this isolating effect can go away with current and future video pass-through technology, the ability to transform HMDs into a monastery on demand has big potential for helping with productivity. That could mean XR offers an increased capacity to upstream focus, creating fewer distractions so we can channel larger chunks of attention to work. The monastery example is perhaps very extreme, but it is illustrative of the "private space on the go" potential of this technology.

Even if it sounds like the antithesis of focus, the other superpower of VR could be its scaffolds for multitasking. These scaffolds would be enabled by its large horizontal field of view (FOV), which would offer the largest display real estate of any available device, giving users access to an optimal set of concurrent screens, applications, and fast layouts at the same time. These two properties transcend the type of task and highlight particularities of the medium. If you want to focus, go to VR. If you want to multitask, go to VR. These will need to be enabled by both the hardware and software, however, and with good practices, which we highlight below.

PASS-THROUGH VERSUS SEE-THROUGH

In pass-through video HMDs (Figure 2), a complete occlusion of the real world is possible by turning off the video feed. A set of cameras records the real world and then streams the recording to the (opaque) displays inside the headset. When the camera feed is off, a user feels they are in another location entirely (full VR), and their presence in that location can be so strong that they forget about their real-world surroundings [1].

In optical see-through HMDs, the user wears transparent glasses and can overlay projections of synthetic content on top of the real world (Figure 2). In see-through devices, totally occluding the world would probably require a display the size of the human FOV, and the technology isn't there yet.

But the uninterrupted work-focus scenario (regardless of whether one is writing a document on a screen, answering emails, or preparing a set of slides) also needs to be compatible with the times when users will want to multitask and be connected to other devices and their real world. Most people don't want to live in isolation for their entire work life. They need to be able to transition in and out of such an immersive device, with reduced mental-switch burden.

Figure 1. Vignettes showing use cases of immersive technology for productivity. Left: Complementing the information worker experience. Center: Augmenting the real world for frontline workers, with in-context access to information (e.g., with instructions on fixing a broken device). Right: Enabling a factory worker to better operate, design, and control a process through the use of augmented reality tools.

Figure 2. Two main ways of blending digital and real content, with either optical see-through or video pass-through (sometimes referred to as video see-through). In optical see-through, the digital world is projected on a surface that has a level of transparency. In video pass-through, the eyes are completely occluded from the world with opaque displays.

Figure 3. Sketches of real desktops with multiple displays and devices.

GUIDELINES

An HMD can become a complete interface between the user (their body and brain) and their environment. As a wearable, it is continuously adapting in first-person perspective (looking out, projecting in), affording an opportunity for reimagining computing. But we also need to make sure the basics are covered.

At the bare minimum an HMD should do very well what the other devices already do: interacting with digital content. So, even at the risk of sounding trite, here we highlight the need to make sure this technology is interoperable, has good input systems, mediates interruptions and transitions, is accessible, and allows for multitasking. It also needs to do all this while reducing the mental cost of switching in and out. Perhaps, then, the "killer app" for HMDs is just a very good interaction paradigm that simplifies the use of this technology on a daily basis, even if for very short periods of time.

Figure 4. Architectural layout of two rooms with the overlay of common free space (red stripes: unavailable; green: free). In most cases, and as the number of users increases, areas of available overlay on dissimilar spaces will tend toward zero.

Interoperability. Introducing any new device to an ecosystem comes at a cost. Immersive technologies for information workers arrive in a space that is already heavily populated with other devices. Workers use a large set of layouts with multiple devices and display configurations in their offices (Figure 3). They might be in a semipermanent setting or on the go, on mobile devices, tablets, or laptops. Workers expect their multiple devices to transfer content seamlessly and to be able to use the same set of apps with corresponding actions, perhaps even with the same inputs. This is a key requirement for VR devices that need to be designed within this context: to account for solo users who transition quickly between devices.

Collaboration. Interoperability is not just a single-user issue. People work together. VR users cannot expect everyone to be wearing an HMD. This will be especially true for early adopters, who will face hybrid interaction paradigms when other users don't have VR headsets. This puts emphasis on the importance of figuring out ways both for traditional users to engage in VR collaborative spaces and for VR users to appear on their collaborators' 2D tools. For example, using avatars to represent VR users might make more sense inside a regular videoconferencing tool [2] than inside collaborative VR, where the focus should instead be on how to tile spatially correct 2D participants in coherent spots of the VR environment.

Even as adoption grows, when more people have HMDs, users will not be able to assume their current spaces are similar enough to have totally free collaborative environments (Figure 4). It will be hard to share immersive spaces between people, and interaction might need abstractions of semantics from motions, meaning that if you are, for example, pointing at one object in your environment but that same object is positioned in a different location in the other person's environment, we will have to adjust that interaction. There will be some artificial repositioning of users and content in space according to the scene understanding in each specific case, trying to maintain certain interaction consistencies that enable both communication and collaboration.
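One way to read the idea of abstracting semantics from motions is as a two-step remapping: first resolve the local pointing ray to an object identifier, then re-aim the remote representation at wherever that same object sits in the other person's room. The following is only a minimal sketch under that reading. The function names, the object dictionaries, and the 10-degree tolerance are all hypothetical and are not taken from the article.

```python
import math

def normalize(v):
    """Return vector v scaled to unit length (v must be non-zero)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def pointed_object(origin, direction, objects, max_angle_deg=10.0):
    """Return the id of the object closest to the pointing ray, or None.

    objects maps an object id to its (x, y, z) position in the local room.
    max_angle_deg is an assumed tolerance, not a value from the article.
    """
    direction = normalize(direction)
    best_id, best_angle = None, max_angle_deg
    for obj_id, pos in objects.items():
        to_obj = normalize(tuple(p - o for p, o in zip(pos, origin)))
        cos_a = sum(a * b for a, b in zip(direction, to_obj))
        cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding drift
        angle = math.degrees(math.acos(cos_a))
        if angle <= best_angle:
            best_id, best_angle = obj_id, angle
    return best_id

def remap_pointing(obj_id, remote_origin, remote_objects):
    """Direction the remote avatar should point to indicate the same object."""
    target = remote_objects[obj_id]
    return normalize(tuple(t - o for t, o in zip(target, remote_origin)))
```

In this sketch the pointing motion itself is never transmitted; only the meaning ("the whiteboard") crosses between rooms, which is what lets the two layouts differ.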

Interruptions and transitions. Interruptions and transitions have a significant impact on productivity and workflow. Coworkers, kids, pets, app notifications, other devices, calls, emails, messages: they can all disrupt focus, break momentum, and lead to inefficiencies. While transitioning between contexts to respond to interruptions isn't just a problem of immersive setups, it can be amplified in VR, where the HMD creates a visual barrier to the external world that can be overcome only with a good system for detection and mediation of interruptions.

The truth is that inside VR even ordinary activities like drinking coffee could be considered interruptions. Being able to access the real world and bring parts of it to the VR environment in a blended manner will be key. This transformation of an isolated VR experience into an extended reality (XR) that is connected to its surroundings is essential for effective productivity.

Once interruptions are presented inside the HMD, users will expect to seamlessly transition from one activity to another, perhaps even in and out of their devices.

One effective strategy is to handle as many external interruptions as possible inside VR, minimizing the need to take off the headset. That means VR systems should ensure other devices are tracked, visible, and interactable, and that their screens are mirrored inside VR. Smart pass-through allows users to view the real world without taking off the headset. But full pass-through might not always be necessary. There are many different forms of adaptive pass-through that can enable this XR experience, ranging from full pass-through to segmented objects to digital versions of the real world reconstructed via Gaussian splats or neural radiance fields (NeRFs). Dynamic chaperones, for example, can display relevant information through edge-detection filters that activate when movement is detected near the boundaries, serving as an intermediate step before enabling full pass-through. Partial pass-through and segmentation techniques can also be employed to preserve a sense of connection with real-world landmarks, such as a sofa, window, or bed, while immersed in VR, maintaining a sense of presence in the real world.
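The escalation from edge outlines to full pass-through can be pictured as a small policy that picks the least intrusive level able to handle the current event. The sketch below is illustrative only. The level names, the three boolean signals, and the rule that a person addressing the user triggers full pass-through are assumptions made for the example, not behavior described in the article or implemented by any specific headset.

```python
from enum import IntEnum

class PassThrough(IntEnum):
    """Hypothetical escalation levels, from fully virtual to fully real."""
    NONE = 0          # full VR, camera feed off
    EDGE_OUTLINE = 1  # dynamic chaperone: edge-detected outlines only
    SEGMENTED = 2     # selected real objects (cup, phone, keyboard) shown
    FULL = 3          # complete video pass-through

def choose_pass_through(motion_near_boundary, person_addressing_user,
                        reaching_for_object):
    """Pick the least intrusive level that still handles the current event."""
    if person_addressing_user:
        return PassThrough.FULL          # assumed rule: people get full view
    if reaching_for_object:
        return PassThrough.SEGMENTED     # show only what the hand is going for
    if motion_near_boundary:
        return PassThrough.EDGE_OUTLINE  # intermediate step before full view
    return PassThrough.NONE
```

Ordering the levels as integers makes it easy to compare them, for example to avoid dropping to a lower level while a higher-priority event is still active.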

Indeed, presence is a unique feature of VR technologies that can create the illusion of being elsewhere, in another place; for users, this often means losing track of their real space [1]. In terms of productivity, however, it may be desirable to enable users to feel present simultaneously in both the virtual and real spaces.

One solution to be in two places, the virtual and the real, at the same time is to bring users to an intermediate space that closely resembles their current physical environment but in an improved format: a clutter-free, clean, and productive setting. By curating objects through blended reality, VR can effectively "clean your room," eliminating distractions and aiding concentration. Leveraging generative AI tools, XR can go beyond simply removing elements from the scene and generate an entirely new space where users feel present in both the physical and virtual realms (Figure 5).

If this system for mediating interruptions is well managed, it can reduce the likelihood of unnecessary interruptions, facilitate transitions, and help achieve focus.

Multitasking. Multitasking is an essential skill that greatly affects productivity, and it is closely tied to seamless transitions. Ultimately, multitasking involves handling activities simultaneously while swiftly switching between multiple tasks. It can be considered a by-product of an effective interruption management system. While some argue that multitasking can decrease productivity, when used appropriately it can increase efficiency, enabling individuals to make progress on multiple tasks concurrently. Multitasking allows people to optimize their time by avoiding idle periods; for example, when downloading files or waiting for a computational response, one can work on another task, thereby maximizing productivity throughout the day. Additionally, multitasking can help prevent monotony and provide mental stimulation. Though it might seem counterintuitive, switching between tasks helps individuals maintain focus and interest, effectively combating boredom and fatigue. This, in turn, can promote higher levels of engagement and motivation.

In XR, multitasking can be further enhanced by the wide horizontal FOV and head tracking, which in essence create an "infinite 360°" real estate around the user that allows for multiple display arrangements. The ability to handle multiple tasks simultaneously and change layouts also makes it inviting to render other devices inside the HMD, so the user can have everything accessible in one place.
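As a rough illustration of what a display arrangement around the user could look like, the sketch below places a number of virtual panels on a horizontal arc centred on the user's forward direction. The function name, the 1-metre radius, and the 40-degree spacing are assumptions chosen for the example and do not come from the article.

```python
import math

def arc_layout(n_panels, radius_m=1.0, spacing_deg=40.0):
    """Place n panels on a horizontal arc centred on the forward direction.

    Returns a list of (x, z, yaw_deg) tuples in a user-centred frame where
    +z is forward and +x is to the right; yaw_deg is the panel's angle from
    forward. radius_m and spacing_deg are assumed defaults.
    """
    if n_panels * spacing_deg > 360.0:
        raise ValueError("panels would overlap around the full circle")
    start = -spacing_deg * (n_panels - 1) / 2.0
    layout = []
    for i in range(n_panels):
        yaw = start + i * spacing_deg
        x = radius_m * math.sin(math.radians(yaw))
        z = radius_m * math.cos(math.radians(yaw))
        layout.append((x, z, yaw))
    return layout
```

With three panels the middle one sits straight ahead and the other two are mirrored to either side; nine panels at this spacing would wrap the full circle, which is where head tracking rather than a fixed screen becomes the limiting factor.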

Input system and content. Input is perhaps the hardest issue for VR, with numerous new interaction paradigms and input combinations to explore and an ever-divergent vocabulary of actions and interactions that continues to expand (Figures 6 and 7). HMDs offer new ways to interpret intent, attention, and action through enhanced sensing capabilities and wearable formats. But these newfound capabilities also come with challenges in terms of expressiveness, and often input methods that capture the intent or the action might be lacking in other ways and not scale well to all activities.

Figure 5. Extended reality (XR) can also be used to declutter spaces or even transform spaces using generative AI to become less cognitively demanding (GenXR), where it's easier to focus. Some basic aspects remain, however, like the layout or the interactable devices to facilitate context switching. In this figure, we showcase how the decluttering can be enabled in pass-through situations, by inpainting (top) or by using generative AI that renders a whole new space with similar structural constraints (bottom).

Reachability versus vision ergonomy. Embodied interaction is finally a possibility, and many have been enamored of Minority Report paradigms, using hands to reach virtual objects and grab them (Figure 6). The problem is twofold: not only can we not assume everything will be within reach, especially for individuals with accessibility needs, but also bringing content too close can be visually uncomfortable, as it strains vergence and accommodation (it is un-ergonomic, for example, to look at things very close to the eyes) (Figure 6). The area within 30 centimeters from the eyes can be considered a no-zone for those reasons [3]. At the same time, users' arm length is also limited, working best at two-thirds of its extension. Therefore, there is a small space, roughly between 30 and 50 centimeters, where interfaces will be comfortable for both reach and vision. This highlights the reality that trade-offs are necessary. Objects beyond the reachable space, typically considered to be more than 70 centimeters, require alternative forms of interaction, such as remapped motions, pointing, and ray casting from the head, hands, or eyes, with additional tracking, dwelling, gestures, or combinations thereof.

Figure 6. A) Ergonomic spaces around a person inside VR are primarily determined by the comfortable visualization range. B) Within the intersection of reachable space and comfortable visualization, close-range interactions with the body and direct manipulations of the content are possible. C) Remapping of motions to a cursor could be used for interacting with content that is beyond reachable space and/or to reduce fatigue. D,E) Ray casting from the hands, eyes, and/or head can serve as input modalities for far-off content.

Figure 7. Classification of types of input by intent. The additional complexity is that for some cases the same input can be explicit or implicit depending on whether it has been done consciously or not (e.g., gestures) [6,7]. Different user actions can be aggregated from a series of micro-operations conforming to particular interactions. In the loop, we show a simplified schematic of the human model of motor control [4] that allows for learning of the interaction paradigm, optimized by the transfer function.
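The distance bands in this passage can be summarized as a simple classifier. This is a minimal sketch, assuming a single distance measured from the eyes is used for both vision and reach (a simplification, since reach is really measured from the shoulder) and assuming a default arm length of 0.75 m. The function and zone names are hypothetical.

```python
NO_ZONE_M = 0.30      # closer than this strains vergence and accommodation
REACH_LIMIT_M = 0.70  # beyond this, content is treated as out of reach

def interaction_zone(distance_m, arm_length_m=0.75):
    """Classify content by its distance from the user, in metres.

    arm_length_m is an assumed default; comfortable reach is taken as
    two-thirds of the arm's extension, following the text.
    """
    comfortable_reach = (2.0 / 3.0) * arm_length_m
    if distance_m < NO_ZONE_M:
        return "too_close"              # push content farther away
    if distance_m <= comfortable_reach:
        return "direct"                 # grab and manipulate by hand
    if distance_m <= REACH_LIMIT_M:
        return "reachable_with_effort"  # possible, but tiring over time
    return "indirect"                   # ray casting, remapping, gaze
```

With the assumed arm length, two-thirds extension works out to about 0.5 m, which matches the 30 to 50 centimeter comfort band described above.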

Moreover, users can manipulate the virtual world to reach the unreachable, enabling direct interaction again. Clutching is a form of temporarily attaching the content to the user's body. However, relying heavily on clutch mechanisms to transition between reach and vision comfort would significantly increase the time required to perform any task, and introduce yet another item on the vocabulary of interactions that users will need to learn.

There are other ways to bridge the reachability gap without resorting to clutching. Interaction at a distance can be achieved through ray casting or remapping by abstraction from a gesture, gaze, or posture. These abstractions from implicit inputs, however, can make them prone to unintended interactions [4].

Body locked versus world locked. When rendering content, a key question arises: Is the content attached to the person? Generally, the ability to track and adjust the content position relative to the user opens the possibility of having both body-locked and world-locked content. Transitions between modes, such as clutching to extend reach, become possible.

A starting recommendation would be to match existing affordances of the real world: Assume content will be world locked and input interaction body locked. In the real world, we interact with surrounding objects, such as grabbing a cup of coffee, and at that moment the object becomes body locked; it has been "clutched" (Figure 7).

This approach provides better visualization ergonomics and supports better mental-mapping persistence, allowing users to remember where they placed something, like a spatial anchor.
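The world-locked default with a temporary body-locked "clutch" can be sketched as a small state holder. This is an illustrative simplification that tracks position only (no rotation) and uses a hypothetical `Content` class; it is not taken from any particular XR toolkit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

@dataclass
class Content:
    world_pos: Vec3                     # used while world locked
    body_offset: Optional[Vec3] = None  # set only while clutched

    def clutch(self, user_pos: Vec3) -> None:
        """Grab: remember the content's offset from the user."""
        self.body_offset = sub(self.position(user_pos), user_pos)

    def release(self, user_pos: Vec3) -> None:
        """Let go: the content becomes world locked where it currently is."""
        self.world_pos = self.position(user_pos)
        self.body_offset = None

    def position(self, user_pos: Vec3) -> Vec3:
        """Where to render the content this frame."""
        if self.body_offset is not None:
            return add(user_pos, self.body_offset)  # follows the user
        return self.world_pos                       # stays put in the room
```

While released, the content ignores user movement, which is what preserves the spatial memory of where it was left; while clutched, it travels with the user until `release` anchors it again.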

There are perhaps some exceptions, such as notifications or menus, or cases involving extreme distances that benefit from proximity to the user. In general, users can either have inputs that work from a reasonable distance, bring the content closer for direct manipulation, or employ locomotion to interact with the content.

Interaction paradigms. The input process can be more precisely explained when abstracted into actions and interactions. Actions can be further divided into micro-operations: selection, confirmation, and feedback [5], while the interactions provide the events with meaning retrospectively, such as manipulation, pick, and release, and will create a vocabulary of user actions together with the context (Figure 7). Actions are more basic primitives that lack the semantics of the larger task.
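One way to picture the split between actions and interactions is a lookup that gives a primitive action its meaning from context. The sketch below is illustrative only: the `Action` enum, the `interpret` function, and the specific mapping are assumptions made for the example, not a vocabulary defined in this article or in [5].

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SELECT = "select"    # e.g., pointing or gazing at a target
    CONFIRM = "confirm"  # e.g., a click or a pinch


def interpret(action: Action,
              hovered: Optional[str],
              held: Optional[str]) -> Optional[str]:
    """Map an action plus its context to an interaction name.

    Returns None when the action carries no meaning in this context.
    The same inputs always yield the same result, so the mapping is
    deterministic and can be learned by the user.
    """
    if action is Action.CONFIRM:
        if held is not None:
            return "release"      # confirming while holding lets go
        if hovered is not None:
            return "pick"         # confirming over a target grabs it
        return None
    if action is Action.SELECT and held is not None:
        return "manipulate"       # moving the selection while holding
    return None


print(interpret(Action.CONFIRM, hovered="cup", held=None))  # pick
print(interpret(Action.CONFIRM, hovered=None, held="cup"))  # release
```

The returned interaction name is also a natural place to hook feedback, the third micro-operation, so that every recognized interaction is acknowledged to the user.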

Mastering any input system means creating a good internal model of this vocabulary. Without such a model, users must continuously rely on costly cognitive processing of sensorimotor feedback. Feedback-monitoring loops for driven actions are rather slow (400 milliseconds) compared with the brain's internal models of motor control (100 milliseconds) [4], which lengthens reaction times and increases errors.

In practice, this means that for humans to be able to learn, the process needs to be deterministic. Additionally, a good interaction paradigm should aim to create an easy vocabulary of actions and context combinations that can be internalized to achieve expertise.

One key aspect of creating a deterministic system is minimizing false positives. To that end, confirmation actions can be linked to more reliable, intentional, explicit inputs (a click, a pinch, a particular voice command), while selection can blend implicit and explicit inputs such as pointing and eye gaze. If the same implicit input needs to be used for both selection and confirmation, then the best option generally is to use a dwell timer as confirmation. With good awareness of the environment, context can also be used to improve intent detection.
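A dwell timer of this kind can be sketched as follows. The `DwellConfirmer` class and the 0.8-second default are hypothetical choices for illustration; an appropriate dwell time would depend on the tracker and the task.

```python
from typing import Optional


class DwellConfirmer:
    """Confirm a target once an implicit input (e.g., gaze) rests on it."""

    def __init__(self, dwell_s: float = 0.8):  # threshold is an assumption
        self.dwell_s = dwell_s
        self._target: Optional[str] = None
        self._since = 0.0
        self._fired = False

    def update(self, target: Optional[str], now: float) -> Optional[str]:
        """Feed the currently selected target and a timestamp in seconds.

        Returns the target exactly once, when it has been selected
        continuously for at least dwell_s; otherwise returns None.
        """
        if target != self._target:
            # Selection moved: restart the timer for the new target.
            self._target, self._since, self._fired = target, now, False
            return None
        if (target is not None and not self._fired
                and now - self._since >= self.dwell_s):
            self._fired = True  # avoid confirming again while still dwelling
            return target
        return None
```

Resetting the timer whenever the selection changes is what keeps brief glances from triggering a confirmation, at the price of a fixed delay on every intentional one.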

Let's consider one example of suboptimal and optimal interaction paradigms: hand interactions. If selection is unreliable due to occlusions, jitter, or a non-negligible error rate in the tracking system, and if gesture recognition for confirmation also introduces additional error rates, the result is a suboptimal interaction system.

Now let's introduce this system as an alternative to someone who regularly uses a mouse for eight hours a day, a high-precision tool that has been optimized and has remained stable for almost 30 years, offering just two primary clicks, right and left, to access a whole vocabulary.
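To see why errors at both stages hurt in the hand-interaction example, consider a rough model in which selection and confirmation fail independently, with error rates $e_{\text{sel}}$ and $e_{\text{conf}}$. Both the independence assumption and the 5% figures below are illustrative, not measurements from any system.

```latex
P(\text{success}) = (1 - e_{\text{sel}})\,(1 - e_{\text{conf}})

% Illustrative values: e_sel = e_conf = 0.05
P(\text{success}) = 0.95 \times 0.95 = 0.9025
```

Under these assumptions, nearly one attempt in ten fails even though each stage alone looks acceptable, a poor trade for someone used to the near-zero error rate of a mouse click.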

Input guidelines. For XR input in productivity scenarios, we suggest the following guidelines, always bearing in mind that users will need to learn and internalize the vocabulary of interactions, so it must feel deterministic and easy (Figure 7):

Backward compatibility. Traditional peripherals offer well-known and precise, explicit input techniques. A mouse with depth [8], an augmented physical keyboard, and other trackable devices like phones and tablets can become input tools for the information worker. They are readily available, offer high precision, and are already familiar to users.
Embodied interactions. Midair gestures can be physically tiring and have lower precision. Reliable hand tracking is essential for successful hand gesture input. If the tracking is not reliable enough, hand tracking will still serve as a valuable tool for communication, and as a backup input when a user doesn't have access to others.
Combined techniques. Different combinations of input modalities can work well for specific users and applications. Generally, explicit input methods (such as mouse or voice) can be amplified and complemented by implicit interaction signals like eye gaze or head gaze. While one might be used for selection, the other might serve as confirmation.

CONCLUSION

This article explores the potential of virtual reality for enhancing productivity, particularly for frontline workers, while also addressing the challenges that must be overcome. With the proposed set of guidelines, we aim to reduce the friction of transitioning between devices, that is, coming in and out of the VR headset when working in combination with a laptop or desktop PC. Additionally, we aim to simplify and streamline interactions within the XR environment, making them straightforward and predictable, while harnessing the enhanced capacity for focus and multitasking offered by VR headsets.

ENDNOTES

1. Sanchez-Vives, M.V. and Slater, M. From presence to consciousness through virtual reality. Nature Reviews Neuroscience 6, 4 (2005), 332-339.
2. Panda, P. et al. AllTogether: Effect of avatars in mixed-modality conferencing environments. Proc. of 2022 Symposium on Human-Computer Interaction for Work. ACM, New York, 2022.
3. Shibata, T., Kim, J., Hoffman, D.M., and Banks, M.S. The zone of comfort: Predicting visual discomfort with stereo displays. Journal of Vision 11, 8 (2011), 11.
4. Padrao, G. et al. Violating body movement semantics: Neural signatures of self-generated and external-generated errors. Neuroimage 124 (2016), 147-156.
5. LaViola, J.J., Jr., Kruijff, E., McMahan, R.P., Bowman, D., and Poupyrev, I.P. 3D User Interfaces: Theory and Practice. Addison-Wesley Professional, 2017.
6. Schmidt, A. Implicit human computer interaction through context. Personal Technologies 4 (2000), 191-199.
7. Argelaguet, F. and Andujar, C. A survey of 3D object selection techniques for virtual environments. Computers & Graphics 37, 3 (2013), 121-136.
8. Zhou, Q., Fitzmaurice, G., and Anderson, F. In-depth mouse: Integrating desktop mouse into virtual reality. Proc. of the 2022 CHI Conference on Human Factors in Computing Systems. ACM, New York, 2022.

Insights

VR headsets can become productivity tools if we enable multitasking and transitions in and out of devices.

Inside VR, people can achieve higher focus and improve remote collaboration.

New combinations of multimodal input will need to enable fast and high-precision work in reachable and unreachable spaces.
Mar Gonzalez-Franco is a neuroscientist and computer scientist at Google. Her work is at the intersection of human perception and computer science. In her research, she fosters new forms of interaction that will revolutionize how humans use technologies. Her interests lie in spatial computing and the use of technology in the wild.

margon@google.com

Andrea Colaco is a software engineer at Google introducing novel applied machine learning techniques for context-based human input and intent understanding into new product categories like AR/VR and connected home devices. With a background in computational techniques and computer vision, she studies how these tools bring real-time systems to the next level.

andreacolaco@google.com