Mar Gonzalez-Franco and Andrea Colaco, Google
Most of our interactions with digital content currently occur inside 2D screens. Moving from that format to immersive setups, however, brings a paradigm shift: from content inside the screen to users inside the content. This change requires us to revisit how we blend the analog and the digital and how we transfer content between the two modes; perhaps it even calls for new guidelines. While different solutions appear in the space, the gulf between the two worlds only seems to widen. We can start to see what works, and what does not work so well, through an empirical or ethnographic approach that goes beyond laboratory studies. But if we want to accelerate adoption, we need to better understand how current tasks can be improved and how this new form of interaction can increase productivity. In this article, we analyze and bring together what we think works, and envision how this new set of immersive devices and interactions can enable productivity beyond existing tools.


We'll start with some lessons from previous inflection points in computing using a simplified history of events. First, we had computers. They became smaller and cheaper so they could be used at home. Then, the Internet connected everything. Tech then got even smaller, so much so that we could move to mobile computing, which eventually jumped to our phones. And that is where we are now.
In retrospect, we can see how all these evolutionary leaps have coexisted. Despite all sorts of predictions, phones didn't kill the PC, and probably neither will immersive tech. They are likely to coexist for the foreseeable future. This is one of the early points we want to make: the importance of interoperability.
Let's consider a world where digital content finally moves out of 2D screens into our 3D worlds (real worlds). As opposed to having two realities, a metaverse (virtual) and a real world, extended reality (XR) aims to blend the two. In this framework, there are some things that will continue being 2D planes even inside 3D; for example, documents. But these panels might resynthesize in our surroundings in more-affordable ways than on an actual screen: on our walls, on top of tables, attached to other objects that make them tangible, or taking into account the user for optimal ergonomic size and position. And yes, that means sometimes content might be just shown as screens inside XR. But there will always be a component of spatially arranging things, whether they are 2D or 3D.
Let's take a deeper look at the spatial and ergonomic components together. We will focus on wearable sets of glasses (even in the primitive form of an AR phone; since it is held in the hand, it could be temporarily considered a wearable). This type of immersive tech is an interface between the user and their environment, inviting embodied interaction. The dichotomy is then between body-locked content and interactions versus world-locked ones. Interestingly, in XR the boundaries are more dynamic; our input systems as well as our content will have many more options, which we discuss later.
Mundane productivity topics, like interoperability or input, are the scaffolding for collaboration, multitasking, transitions, and interruptions. There are many other things users will need to do, but these are the ones we focus on here and that we believe are core for productivity.
If these topics are not addressed well, VR will not be widely adopted for work scenarios, especially for information workers, who currently spend most of their time using PCs. "What can VR do better?" is a good research question. For now, though, let's make sure VR isn't worse at these topics than current PCs.


Information workers, also known as knowledge workers, are those who spend most of their productive time enabled by computers. Other workers, such as frontline workers in factories, farms, or other real-world settings, will also experience a big improvement in productivity when they are able to augment their realities. In fact, the impact on their productivity will perhaps be even greater than that of information workers (Figure 1), as many real-world tasks still lack advanced assisted computation, whereas the improvement in productivity with VR for information workers might be marginal.
In this article we focus on how immersive tech will revolutionize productivity for information workers. This narrower initial focus is for three reasons:
  • Information workers are already intensively using devices and adopting new software and gadgets on a regular basis. They can be considered early adopters, the power users of digital content.
  • Safety issues will be reduced. Working at a desk is a much safer, more controlled space. Locomotion, the need to move around, is reduced and scenes are more constrained, with more-limited sets of objects, reducing the dependence on scene-understanding algorithms. It is an ideal petri dish for early XR, where the real world can start to blend with VR, with pass-through views and so on.
  • Ergonomic issues will be reduced. Head-mounted displays (HMDs) in a multidevice scenario might be used only temporarily, which means that tethered cables, as well as the size or weight of the device, might be less critical.
It's clear that HMDs, computer vision, and will improve over time, enabling many other forms of XR. This follows the trend we have already seen with other specs that have improved in the past decade. We can comfortably read text in most HMDs, with displays complemented by advances in optics (pancake lenses or three-element lenses), finally making these devices viable for information workers.


If people are going to adopt VR for work, it is because some tasks become easier or faster to perform in this medium. Researchers have been trying to find which specific tasks benefit. We can, for example, augment experiences by visualizing more information in context, augment presentations inside VR, and improve meeting experiences.
Indeed, meeting with other humans is an experience complex enough that we haven't managed to re-create it with video conferences in 2D; maybe VR can help with its spatial audio and vision. VR can enable more ecological validity and unlock evolutionary wonders such as directed attention, peripersonal spaces, and concurrent taking, as well as unlock the ability to use our body for interaction (pointing, gazing, and enabling spatial formations). Perhaps it will even enable the use of whiteboards in direct ways by multiple users, and ultimately support a form of collaborative spatial work unavailable with traditional 2D screens.
In general, there is agreement that experiential tasks are good candidates to be improved with VR, even if they don't require colocated participation. That means going beyond meetings, to affect learning with improved recall and hippocampal activity. These experiential tasks are particular use cases for productivity, however, and might not justify full adoption of VR.


A different look into VR productivity opportunities can focus on the uniqueness of the medium, instead of on specific tasks. VR has traditionally been labeled as an isolating medium. But despite the fact that this isolating effect can go away with current and future video pass-through technology, the ability to transform HMDs into a monastery on demand has big potential for helping with productivity. That could mean XR offers an increased capacity to upstream focus, creating fewer distractions so we can channel larger chunks of attention to work. The monastery example is perhaps very extreme, but it is illustrative of the "private space on the go" potential of this technology.
Even if it sounds like the antithesis of focus, the other superpower of VR could be its scaffolds for multitasking. These scaffolds would be enabled by its large horizontal field of view (FOV), which would give it the largest display real estate of any available device, providing users access to an optimal set of concurrent screens, applications, and fast layouts at the same time. These two properties transcend the type of task and highlight particularities of the medium. If you want to focus, go to VR. If you want to multitask, go to VR. These will need to be enabled by both the hardware and software, however, and with good practices, which we highlight below.


In pass-through video HMDs (Figure 2), a complete occlusion of the real world is possible by turning off the video feed. A set of cameras records the real world and then streams the recording to the (opaque) displays inside the headset. When the camera feed is off, a user feels they are in another location entirely (full VR), and their presence in that location can be so strong that they forget about their real-world surroundings [1].
In optical see-through HMDs, the user wears transparent glasses and can overlay projections of synthetic content on top of the real world (Figure 2). In see-through devices, totally occluding the world would probably require a display the size of the human FOV, and the technology isn't there yet.
But the uninterrupted work-focus scenario (independent, if one is writing a document on a screen, answering emails, or preparing a set of slides) also needs to be compatible with the times when users will want to multitask and be connected to other devices and their real world. Most people don't want to live in isolation for their entire work life. They need to be able to transition in and out of such an immersive device, with reduced mental-switch burden.

Figure 1. Vignettes showing use cases of immersive technology for productivity. Left: Complementing the information worker experience. Center: Augmenting the real world for frontline workers, with in-context access to information (e.g., with instructions on fixing a broken device). Right: Enabling a factory worker to better operate, design, and control a process through the use of augmented reality tools.
Figure 2. Two main ways of blending digital and real content, with either optical see-through or video pass-through (sometimes referred to as video see-through). In optical see-through, the digital world is projected on a surface that has a level of transparency. In video pass-through, the eyes are completely occluded from the world with opaque displays.
Figure 3. Sketches of real desktops with multiple displays and devices.


An HMD can become a complete interface between the user (their body and brain) and their environment. As a wearable, it is continuously adapting in first-person perspective (looking out, projecting in), affording an opportunity for reimagining computing. But we also need to make sure the basics are covered.
At the bare minimum an HMD should do very well what the other devices already do: interacting with digital content. So, even at the risk of sounding trite, here we highlight the need to make sure this technology is interoperable, has good input systems, mediates interruptions and transitions, is accessible, and allows for multitasking. And that it does all this while reducing the mental cost of switching in and out. Perhaps, then, the "killer app" for HMDs is just a very good interaction paradigm that simplifies the use of this technology on a daily basis, even if for very short periods of time.

Figure 4. Architectural layout of two rooms with the overlay of common free space (red stripes: unavailable; green: free). In most cases, and as the number of users increases, areas of available overlay on dissimilar spaces will tend toward zero.
Interoperability. Introducing any new device to an ecosystem comes at a cost. Immersive technologies for information workers arrive in a space that is already heavily populated with other devices. Workers use a large set of layouts with multiple devices and display configurations in their offices (Figure 3). They might be in a semipermanent setting or on the go, on mobile devices, tablets, or laptops. Workers expect their multiple devices to transfer content seamlessly and to be able to use the same set of apps with corresponding actions, perhaps even with the same inputs. This is a key requirement for VR devices that need to be designed within this context: to account for solo users who transition quickly between devices.
Collaboration. Interoperability is not just a single-user issue. People work together. VR users cannot expect everyone to be wearing an HMD. This will be especially true for early adopters, who will face hybrid interaction paradigms when other users don't have VR headsets. This puts emphasis on the importance of figuring out ways both for traditional users to engage in VR collaborative spaces and for VR users to appear on their collaborators' 2D tools. For example, using avatars to represent VR users might make more sense inside a regular videoconferencing tool [2] than inside collaborative VR, where the focus should instead be on how to tile spatially correct 2D participants in coherent spots of the VR environment.
Even as adoption grows, when more people have HMDs, users will not be able to assume their current spaces are similar enough to have totally free collaborative environments (Figure 4). It will be hard to share immersive spaces between people, and interaction might need abstractions of semantics from motions, meaning that if you are, for example, pointing at one object in your environment but that same object is positioned in a different location in the other person's environment, we will have to adjust that interaction. There will be some artificial repositioning of users and content in space according to the scene understanding in each specific case, trying to maintain certain interaction consistencies that enable both communication and collaboration.
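The idea of abstracting semantics from motions can be illustrated with a minimal sketch. It assumes each user's scene is a dictionary from a shared object identifier to that object's position in the user's own room; the function names `resolve_target` and `remap_pointing`, the 10-degree tolerance, and the scene layout are all hypothetical and not taken from the article. The sender's pointing ray is first resolved to an object identifier, and the receiver then re-aims the sender's avatar at wherever that object sits locally.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)


def resolve_target(origin: Vec3, direction: Vec3, scene: Dict[str, Vec3],
                   max_angle_deg: float = 10.0) -> Optional[str]:
    """Turn a raw pointing ray into the id of the object it most likely means.

    Picks the object whose direction from `origin` is angularly closest to
    the ray, as long as it lies within `max_angle_deg` (an assumed tolerance).
    """
    ray = _normalize(direction)
    best_id, best_angle = None, max_angle_deg
    for obj_id, pos in scene.items():
        to_obj = _normalize(_sub(pos, origin))
        cos = sum(r * t for r, t in zip(ray, to_obj))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
        if angle <= best_angle:
            best_id, best_angle = obj_id, angle
    return best_id


def remap_pointing(target_id: str, avatar_origin: Vec3,
                   remote_scene: Dict[str, Vec3]) -> Vec3:
    """Direction the sender's avatar should point in the receiver's room."""
    return _normalize(_sub(remote_scene[target_id], avatar_origin))


# Two rooms that contain the same objects at different positions.
room_a = {"lamp": (0.0, 0.0, 2.0), "mug": (2.0, 0.0, 0.0)}
room_b = {"lamp": (3.0, 0.0, 0.0), "mug": (0.0, 0.0, -1.0)}

target = resolve_target((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), room_a)
remapped = remap_pointing(target, (0.0, 0.0, 0.0), room_b)
```

In this sketch only the object identifier crosses the network, so the gesture stays meaningful even when the two rooms share no common layout.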

Interruptions and transitions. Interruptions and transitions have a significant impact on productivity and workflow. Coworkers, kids, pets, app notifications, other devices, calls, emails, messages: they can all disrupt focus, break momentum, and lead to inefficiencies. While transitioning between contexts to respond to interruptions isn't just a problem of immersive setups, it can be amplified in VR, where the HMD creates a visual barrier to the external world that can be overcome only with a good system for detection and mediation of interruptions.
The truth is that inside VR even ordinary activities like drinking coffee could be considered interruptions. Being able to access the real world and bring parts of it to the VR environment in a blended manner will be key. This transformation of an isolated VR experience into an extended reality (XR) that is connected to its surroundings is essential for effective productivity.
Once interruptions are presented inside the HMD, users will expect to seamlessly transition from one activity to another, perhaps even in and out of their devices.
One effective strategy is to handle as many external interruptions as possible inside VR, minimizing the need to take off the headset. That means VR systems should ensure other devices are tracked, visible, and interactable, and that their screens are mirrored inside VR. Smart pass-through allows users to view the real world without taking off the headset. But full pass-through might not always be necessary. There are many different forms of adaptive pass-through that can enable this XR experience, ranging from full pass-through to segmented objects and digital versions of the real world reconstructed via Gaussian splats or neural radiance fields (NeRFs). Dynamic chaperones, for example, can display relevant information through edge-detection filters that activate when movement is detected near the boundaries, serving as an intermediate step before enabling full pass-through. Partial pass-through and segmentation techniques can also be employed to preserve a sense of connection with real-world landmarks, such as a sofa, window, or bed, while immersed in VR: a sense of presence in the real world.
Indeed, presence is a unique feature of VR technologies that can create the illusion of being elsewhere, in another place; for users, this often means losing track of their real space [1]. In terms of productivity, however, it may be desirable to enable users to feel present simultaneously in both the virtual and real spaces.
One solution for being in two places (the virtual and the real) at the same time is to bring users to an intermediate space that closely resembles their current physical environment but in an improved format: a clutter-free, clean, and productive setting. By curating objects through blended reality, VR can effectively "clean your room," eliminating distractions and aiding concentration. Leveraging generative AI tools, XR can go beyond simply removing elements from the scene and generate an entirely new space where users feel present in both the physical and virtual realms (Figure 5).
If this system for mediating interruptions is well managed, it can reduce the likelihood of unnecessary interruptions while facilitating transitions and helping users achieve focus.
Multitasking. Multitasking is an essential skill that greatly affects productivity, and it is closely tied to seamless transitions. Ultimately, multitasking involves handling activities simultaneously while swiftly switching between multiple tasks. It can be considered a by-product of an effective interruption management system. While some argue that multitasking can decrease productivity, when used appropriately it can increase efficiency, enabling individuals to make progress on multiple tasks concurrently. Multitasking allows people to optimize their time by avoiding idle periods; for example, when downloading files or waiting for a computational response, one can work on another task, thereby maximizing productivity throughout the day. Additionally, multitasking can help prevent monotony and provide mental stimulation. Though it might seem counterintuitive, switching between tasks helps individuals maintain focus and interest, effectively combating boredom and fatigue. This, in turn, can promote higher levels of engagement and motivation.

In XR, multitasking can be further enhanced by the wide horizontal FOV and head tracking, which in essence create an "infinite 360°" real estate around the user that allows for multiple display arrangements. The ability to handle multiple tasks simultaneously and change layouts also makes it appealing to render other devices inside the HMD, so the user can have everything accessible in one place.
Input system and content. Input is perhaps the hardest issue for VR, with numerous new interaction paradigms and input combinations to explore and an ever-divergent vocabulary of actions and interactions that continues to expand (Figures 6 and 7). HMDs offer new ways to interpret intent, attention, and action through enhanced sensing capabilities and wearable formats. But these newfound capabilities also come with challenges in terms of expressiveness, and often input methods that capture the intent or the action might be lacking in other ways and not scale well to all activities.

Figure 5. Extended reality (XR) can also be used to declutter spaces or even transform spaces using generative AI to become less cognitively demanding (GenXR), where it's easier to focus. Some basic aspects remain, however, like the layout or the interactable devices to facilitate context switching. In this figure, we showcase how the decluttering can be enabled in pass-through situations, by inpainting (top) or by using generative AI that renders a whole new space with similar structural constraints (bottom).
Reachability versus visual ergonomics. Embodied interaction is finally a possibility, and many have been enamored of Minority Report paradigms, using hands to reach virtual objects and grab them (Figure 6). The problem is twofold; not only can we not assume everything will be within reach, especially for individuals with accessibility needs, but also bringing content too close can be visually uncomfortable, as it strains vergence and accommodation (it is un-ergonomic, for example, to look at things very close to the eyes) (Figure 6). The area within 30 centimeters from the eyes can be considered a no-zone for those reasons [3]. At the same time, users' arm length is also limited, working best at two-thirds of its extension. Therefore, there is a small space, roughly between 30 and 50 centimeters, where interfaces will be comfortable for both reach and vision. This highlights the reality that trade-offs are necessary. Objects beyond the reachable space, typically considered to be more than 70 centimeters, require alternative forms of interaction, such as remapped motions, pointing, and ray casting from the head, hands, or eyes, with additional tracking, dwelling, gestures, or combinations thereof.

Figure 6. A) Ergonomic spaces around a person inside VR are primarily determined by the comfortable visualization range. B) Within the intersection of reachable space and comfortable visualization, close-range interactions with the body and direct manipulations of the content are possible. C) Remapping of motions to a cursor could be used for interacting with content that is beyond reachable space and/or to reduce fatigue. D,E) Ray casting from the hands, eyes, and/or head can serve as input modalities for far-off content.
Figure 7. Classification of types of input by intent. The additional complexity is that for some cases the same input can be explicit or implicit depending on whether it has been done consciously or not (e.g., gestures) [6,7]. Different user actions can be aggregated from a series of micro-operations conforming to particular interactions. On the loop we simplify a schematic of the human model of motor control [4] that allows for learning of the interaction paradigm, optimized by the transfer function.
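The distance thresholds above can be read as a simple zoning rule. The sketch below is one possible encoding, not the authors' implementation: the 30, 50, and 70 centimeter boundaries come from the text, while the zone names, the treatment of the 50 to 70 centimeter band as "extended reach", and the suggested input per zone are assumptions made for illustration.

```python
from enum import Enum

# Thresholds quoted in the text, in centimeters from the eyes.
NO_ZONE_MAX_CM = 30.0     # closer than this strains vergence and accommodation
COMFORT_MAX_CM = 50.0     # upper end of the band comfortable for reach and vision
REACH_MAX_CM = 70.0       # beyond this, content is considered out of reach


class Zone(Enum):
    TOO_CLOSE = "too close"
    COMFORTABLE = "comfortable reach and vision"
    EXTENDED_REACH = "reachable with an extended arm"   # assumed label
    BEYOND_REACH = "beyond reach"


def classify_distance(distance_cm: float) -> Zone:
    """Map a content distance to one of the ergonomic zones."""
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    if distance_cm < NO_ZONE_MAX_CM:
        return Zone.TOO_CLOSE
    if distance_cm <= COMFORT_MAX_CM:
        return Zone.COMFORTABLE
    if distance_cm <= REACH_MAX_CM:
        return Zone.EXTENDED_REACH
    return Zone.BEYOND_REACH


def suggested_input(zone: Zone) -> str:
    """Illustrative (assumed) mapping from zone to an input style."""
    return {
        Zone.TOO_CLOSE: "push content away before interacting",
        Zone.COMFORTABLE: "direct manipulation",
        Zone.EXTENDED_REACH: "direct manipulation, or remapped motion to reduce fatigue",
        Zone.BEYOND_REACH: "ray casting or remapped motion",
    }[zone]
```

A layout engine could call `classify_distance` whenever a panel is placed and fall back to ray casting once the panel leaves the reachable band.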
Moreover, users can manipulate the virtual world to reach the unreachable, enabling direct interaction again. Clutching is a form of temporarily attaching the content to the user's body. However, relying heavily on clutch mechanisms to transition between reach and vision comfort would significantly increase the time required to perform any task, and add yet another item to the vocabulary of interactions that users will need to learn.
There are other ways to bridge the reachability gap without resorting to clutching. Interaction at a distance can be achieved through ray casting or remapping by abstraction from a gesture, gaze, or posture. These abstractions from implicit inputs, however, can make them prone to unintended interactions [4].
Body locked versus world locked. When rendering content, a key question arises: Is the content attached to the person? Generally, the ability to track and adjust the content position relative to the user opens the possibility of having both body-locked and world-locked content. Transitions between modes, such as clutching to extend reach, become possible.
A starting recommendation would be to match existing affordances of the real world: Assume content will be world locked and input interaction body locked. In the real world, we interact with surrounding objects, such as grabbing a cup of coffee, and at that moment the object becomes body locked; it has been "clutched" (Figure 7).
This approach provides better visualization ergonomics and supports better mental-mapping persistence, allowing users to remember where they placed something, like a spatial anchor.
There are perhaps some exceptions, such as notifications or menus, or cases involving extreme distances that benefit from proximity to the user. In general, users can either have inputs that work from a reasonable distance, bring the content closer for direct manipulation, or employ locomotion to interact with the content.
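The difference between the two locking modes, and the clutch transition between them, can be sketched in a few lines. This is a simplified illustration under stated assumptions: positions are plain 3D tuples, user orientation is ignored, and the `Content` class with its `clutch` and `release` methods is hypothetical rather than any real XR toolkit's API.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


@dataclass
class Content:
    world_pos: Vec3                        # where the content lives when world locked
    body_offset: Vec3 = (0.0, 0.0, 0.0)    # offset from the user when body locked
    body_locked: bool = False              # default: world locked, as recommended

    def position(self, user_pos: Vec3) -> Vec3:
        """Where to render the content given the user's current position."""
        if self.body_locked:
            return _add(user_pos, self.body_offset)
        return self.world_pos

    def clutch(self, user_pos: Vec3) -> None:
        """Attach the content to the user, keeping its current apparent place."""
        self.body_offset = _sub(self.world_pos, user_pos)
        self.body_locked = True

    def release(self, user_pos: Vec3) -> None:
        """Drop the content back into the world at its current place."""
        self.world_pos = _add(user_pos, self.body_offset)
        self.body_locked = False


cup = Content(world_pos=(1.0, 0.0, 2.0))
before = cup.position((5.0, 0.0, 0.0))      # world locked: ignores user movement
cup.clutch((0.0, 0.0, 0.0))
carried = cup.position((1.0, 0.0, 0.0))     # body locked: follows the user
cup.release((1.0, 0.0, 0.0))
after = cup.position((9.0, 9.0, 9.0))       # world locked again at the new spot
```

Keeping world locked as the default state mirrors the recommendation above, with body locking entered only for the duration of a clutch.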
Interaction paradigms. The input process can be more precisely explained when abstracted into actions and interactions. Actions can be further divided into micro-operations: selection, confirmation, and feedback [5], while the interactions provide the events with meaning retrospectively, such as manipulation, pick, and release, and will create a vocabulary of user actions together with the context (Figure 7). Actions are more basic primitives that lack the semantics of the larger task.
Mastering any input system means creating a good internal model of this vocabulary. Without a model, users have to rely on high cognitive processing of the sensorimotor feedback all the time. This has a cost. Feedback-monitoring loops for driven actions are rather slow (400 milliseconds) when compared with internal models of motor control (100 milliseconds) in the brain [4], with a corresponding impact on reaction time and increased errors.
In practice, this means that for humans to be able to learn, the process needs to be deterministic. Additionally, a good interaction paradigm should aim to create an easy vocabulary of actions and context combinations that can be internalized to achieve expertise.
One key aspect of creating a deterministic system is to minimize false positives. For that, confirmation actions can be linked to more-reliable intentional explicit inputs (a click, a pinch, a particular voice command), while selection can be more blended with implicit and explicit inputs such as pointing and eye gaze. If the same implicit input needs to be used as both selection and confirmation, then the best option generally is to use a dwell timer as confirmation. With good awareness of the environment, context can also be used to improve intent.
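A dwell timer of the kind mentioned above can be sketched as a small state machine. The version below is only an illustration: the class name `DwellSelector`, the 0.8-second default threshold, and the convention of passing the gazed target and a timestamp on every frame are all assumptions, not details from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DwellSelector:
    """Confirms a gaze selection once the gaze rests on one target long enough."""

    dwell_seconds: float = 0.8           # assumed threshold; tune per application
    _target: Optional[str] = None        # target currently being gazed at
    _since: float = 0.0                  # time the gaze landed on that target
    _fired: bool = False                 # avoid confirming the same dwell twice

    def update(self, target: Optional[str], now: float) -> Optional[str]:
        """Feed the gazed target each frame; returns it once when confirmed."""
        if target != self._target:
            # Gaze moved to a new target (or to nothing): restart the timer.
            self._target = target
            self._since = now
            self._fired = False
            return None
        if (target is not None and not self._fired
                and now - self._since >= self.dwell_seconds):
            self._fired = True
            return target
        return None


selector = DwellSelector(dwell_seconds=1.0)
events = [
    selector.update("send_button", 0.0),   # gaze arrives, timer starts
    selector.update("send_button", 0.5),   # still dwelling
    selector.update("send_button", 1.0),   # threshold reached, confirm once
    selector.update("send_button", 1.5),   # no repeat confirmation
    selector.update("menu", 1.6),          # gaze moves, timer restarts
]
```

Restarting the timer whenever the gaze leaves the target is what keeps brief glances from being confirmed, which is the false-positive concern raised above.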
Let's consider one example of suboptimal and optimal interaction paradigms: hand interactions. If selection is unreliable due to occlusions, jitter, or a non-negligible error rate in the tracking system, and if gesture recognition for confirmation also introduces additional error rates, the result is a suboptimal interaction system.
Now let's introduce this system as an alternative to someone who regularly uses a mouse for eight hours a day. The mouse is a high-precision tool that has been optimized and has remained stable for almost 30 years, offering just two primary clicks, right and left, to access a whole vocabulary.
Input guidelines. For XR input in productivity scenarios, we suggest the following guidelines, always bearing in mind that the vocabulary of interactions will need to be internalized and learned by the users, so it will need to feel deterministic and easy (Figure 7):
  • Backward compatibility. Traditional peripherals offer well-known and precise, explicit input techniques. A mouse with depth [8], an augmented physical keyboard, and other trackable devices like phones and tablets can become input tools for the information worker. They are readily available, offer high precision, and are already familiar to users.
  • Embodied interactions. Midair gestures can be physically tiring and have lower precision. Reliable hand tracking is essential for successful hand gesture input. If the tracking is not reliable enough, hand tracking will still serve as a valuable tool for communication, and as a backup input when a user doesn't have access to others.
  • Combined techniques. Different combinations of input modalities can work well for specific users and applications. Generally, explicit input methods (e.g., mouse, voice) can be amplified and complemented by implicit interaction signals like eye gaze or head gaze. While one might be used for selection, the other might work as the confirmation of input.
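
As an illustration of the combined-techniques guideline, the following Python sketch pairs an implicit signal (gaze) for selection with an explicit signal (a pinch or click) for confirmation. Everything named here is hypothetical and chosen only for the example: the `CombinedInput` class, its methods, and the entries in `VOCABULARY` do not come from any real toolkit. The vocabulary is a fixed lookup table, so the same context and action always produce the same command, in line with the goal of a deterministic system.

```python
from typing import Dict, Optional, Tuple

# Hypothetical vocabulary: (kind of target, explicit action) -> command name.
VOCABULARY: Dict[Tuple[str, str], str] = {
    ("window", "pinch"): "grab_window",
    ("window", "click"): "focus_window",
    ("button", "pinch"): "press_button",
    ("button", "click"): "press_button",
}


class CombinedInput:
    """Gaze selects a target; an explicit action confirms it."""

    def __init__(self, vocabulary: Dict[Tuple[str, str], str]) -> None:
        self.vocabulary = vocabulary
        self.selected: Optional[Tuple[str, str]] = None  # (target_id, kind)

    def on_gaze(self, target_id: Optional[str], kind: Optional[str]) -> None:
        """Implicit signal: updates the selection but never triggers a command."""
        if target_id is None or kind is None:
            self.selected = None
        else:
            self.selected = (target_id, kind)

    def on_explicit(self, action: str) -> Optional[Tuple[str, str]]:
        """Explicit signal: returns (command, target_id), or None if nothing applies."""
        if self.selected is None:
            return None
        target_id, kind = self.selected
        command = self.vocabulary.get((kind, action))
        if command is None:
            return None  # unknown combination: do nothing rather than guess
        return command, target_id
```

In this sketch, looking at a window and pinching would yield `("grab_window", "w1")` for a window with the id `"w1"`, while a pinch with nothing selected yields `None`. Keeping gaze out of the confirmation path means that merely looking around never triggers a command.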


This article explores the potential of virtual reality for enhancing productivity, particularly for frontline workers, while also addressing the challenges that must be overcome. With the proposed set of guidelines, we aim to reduce the friction of transitioning between devices, that is, coming in and out of the VR headset when working in combination with a laptop or desktop PC. Additionally, we aim to simplify and streamline interactions within the XR environment, making them straightforward and predictable, while harnessing the enhanced capacity for focus and multitasking offered by VR headsets.


  1. Sanchez-Vives, M.V. and Slater, M. From presence to consciousness through virtual reality. Nature Reviews Neuroscience 6, 4 (2005), 332-339.
  2. Panda, P. et al. AllTogether: Effect of avatars in mixed-modality conferencing environments. Proc. of 2022 Symposium on Human-Computer Interaction for Work. ACM, New York, 2022.
  3. Shibata, T., Kim, J., Hoffman, D.M., and Banks, M.S. The zone of comfort: Predicting visual discomfort with stereo displays. Journal of Vision 11, 8 (2011), 11.
  4. Padrao, G. et al. Violating body movement semantics: Neural signatures of self-generated and external-generated errors. NeuroImage 124 (2016), 147-156.
  5. LaViola, J.J., Jr., Kruijff, E., McMahan, R.P., Bowman, D., and Poupyrev, I.P. 3D User Interfaces: Theory and Practice. Addison-Wesley Professional, 2017.
  6. Schmidt, A. Implicit human computer interaction through context. Personal Technologies 4 (2000), 191-199.
  7. Argelaguet, F. and Andujar, C. A survey of 3D object selection techniques for virtual environments. Computers & Graphics 37, 3 (2013), 121-136.
  8. Zhou, Q., Fitzmaurice, G., and Anderson, F. In-depth mouse: Integrating desktop mouse into virtual reality. Proc. of the 2022 CHI Conference on Human Factors in Computing Systems. ACM, New York, 2022.

Insights
  • VR headsets can become productivity tools if we enable multitasking and transitions in and out of devices.
  • Inside VR, people can achieve higher focus and improve remote collaboration.
  • New combinations of multimodal input will need to enable fast and high-precision work in reachable and unreachable spaces.

Mar Gonzalez-Franco is a neuroscientist and computer scientist at Google. Her work is at the intersection of human perception and computer science. In her research, she fosters new forms of interaction that will revolutionize how humans use technologies. Her interest lies in spatial computing and on the wild use of technology.

Andrea Colaco is a software engineer at Google introducing novel applied machine learning techniques for context-based human input and intent understanding into new product categories like AR/VR and connected home devices. With a background in computational techniques and computer vision, she studies how these tools bring real-time systems to the next level.