
If you had to store something for 100 years, how would you do it?

Century-Scale Storage

Maxwell Neely-Cohen

The Building by the Plum Orchard

On the north side of downtown San Jose, tucked against a gentle curve in California State Route 87 and the Guadalupe River that it follows, sits a nondescript single-story off-white building with tinted windows. As of the time of writing this, signs exclaim that 99 Notre Dame Avenue is available for lease. The two adjacent lots are also empty and being used as parking lots, mostly as overflow for the municipal courthouse a few blocks away. Across the street the condos start, glassy and standardized, one pale red and cream and the next a mixture of aqua and silver, stretching to the sky for blocks east toward City Hall and south until you hit the looming corporate headquarters of Adobe Inc. The low building’s only active neighbor is a weightlifting and martial arts gym in the adjoining warehouse that gives no hint of what used to be built there. Otherwise it sits alone.

99 Notre Dame Avenue building

99 Notre Dame Avenue, 1953

Reprint Courtesy of IBM Corporation © 2024

In the 1950s, 99 Notre Dame Avenue housed IBM’s first West Coast laboratory. Back then it overlooked a plum orchard. Between 1952 and 1956, a team of engineers led by a former high school science teacher designed and built the IBM 350 disk storage unit, part of the IBM 305 RAMAC, the first computer system that included something resembling a hard drive.

Before RAMAC, to store and access computer data was a laborious process involving feeding stacks of punch cards through machines. Other early solutions, like storing data on magnetic tape, were effective but slow. The IBM team created spinning aluminum disks readable by a magnetic arm which allowed data to be retrieved in a literal blink. The 24-inch platters were stacked 50 at a time in a cylinder. They rotated at close to 1200 rpm. Even in the 1950s, with the room-sized console only capable of storing 3.75 megabytes and weighing over a ton, this machine could retrieve data in 800 milliseconds.

The revolutionary element of the hard disk drive was not that it stored data for computers—there were plenty of other methods for that—but that you could store data that could then be accessed almost instantly. Your storage could be constantly connected to your system, an integral component, representing a tremendous shift in both the technical and conceptual idea of what a computer could even be.

RAMAC is the ancestor of every hard drive, every server, every relational database, every cloud. 99 Notre Dame Avenue is its birthplace. For digital storage, this is the Trinity Test Site, the explosive center from which all else follows.

RAMAC’s massive aluminum disks were coated in iron oxide, with little magnetic slots for data, bits to be read as they spun. IBM originally marketed the system narrowly toward accountants, building and leasing around 1,000 RAMAC 305 systems to businesses that used punch card systems. But within six years, the IBM 350 storage unit was completely obsolete, replaced by new model numbers and new designs. The returned units were scrapped, one by one. The march to create something smaller, faster, denser, and cheaper forced them off the market in less than a decade.

RAMAC actuator and disk stack

Courtesy of the Computer History Museum

Only three RAMAC 305 systems and seven individual 350 disk drives in various configurations are known to have survived. A complete mechanical assembly of a 350 drive was restored in 2002 and sits in the collection of the Computer History Museum. According to a 2014 Wired magazine report,
Citation: Tech Time Warp of the Week: The World's First Hard Drive, 1956 (Wired, 2014)
during that restoration researchers found data still present and readable on the 350, from a Canadian insurance company, car manufacturers, and the 1963 World Series. “The RAMAC data is thermodynamically stable for longer than the expected lifetime of the universe,” said Joe Feng, one of the engineers who worked on the restoration.

From an isolated technical and engineering perspective, IBM created a storage medium that could last much longer than a hundred years, even long beyond any reasonable definition of forever, on the first try, without that even being the goal. Since the West Coast laboratory team was exploring a novel design and process, they used extremely hardy materials and mechanisms, focusing on functional reliability above all else. Yet today, 68 years later, the parts for the RAMAC are no longer being manufactured, and the machines that fabricated those parts no longer exist. It took the collaboration of several institutions to restore the necessary hardware to make possible the recovery of data off of a single unit that survived. So many other RAMAC drives did not make it. The theoretical longevity provided by its sturdy materials and robust mechanical design could not guarantee its continued use.

In the present day, our records, our artifacts, our publications, and our art no longer only inhabit the physical world. The intellectual and cultural output that we rely on and consume predominantly lives on screens, electromagnetically stored in bits and transmitted through packets and wires. Over the past two decades museums and archives have raised and spent billions of dollars to digitize their holdings, to say nothing of the countless individual citizen archivists painstakingly assembling digital collections on their own. Our hardware and software infrastructure is not built for this reality. It is tailored to the short term, without any concern for its long-term durability.

This piece looks at a single question. If you, right now, had the goal of digitally storing something for 100 years, how should you even begin to think about making that happen? How should the bits in your stewardship be stored with such a target in mind? How do our methods and platforms look when considered under the harsh unknowns of a century? There are plenty of worthy related subjects and discourses that this piece does not touch at all. This is not a piece about the sheer volume of data we are creating each day, and how we might store all of it. Nor is it a piece about the extremely tough curatorial process of deciding what is and isn’t worth preserving and storing. It is about longevity, about the potential methods of preserving what we make for future generations, about how we make bits endure. If you had to store something for 100 years, how would you do it? That’s it.

Even accounting for the human predilection for nice round numbers and the decimal system (10 fingers, 10 toes), 100 years is arbitrary. But 100 years is our metric precisely because it is attainable. It is a scale within the outer possibilities of a single human lifetime but not of a single human career. It is a duration that cannot be attained through individual force of will. It requires planning and organization across at least one generational replacement. It is a broad enough time period that the chance for social, economic, and technological change is absolute, yet close enough that all context does not collapse. It is survival from the end of the Napoleonic Wars to the beginning of World War I, from the invention of the shortwave radio to the age of Facebook, from the reign of King Henry V to Martin Luther publishing his Ninety-five Theses, from the first performance of Igor Stravinsky’s The Rite of Spring to the release of Daft Punk’s final album, from the Battle of Gettysburg to the signing of the Partial Nuclear Test Ban Treaty, from the patenting of the telephone to the release of the Apple I. We picked a century scale because most physical objects can survive 100 years in good care. It is attainable, and yet we selected it because the design of mainstream digital storage mediums is nowhere close to even considering this mark.

No single methodology that we discuss holds an obvious answer to this question, and that is fine, particularly because professional archivists recommend making and storing multiple copies of anything in multiple formats as a best practice. For example, the Smithsonian endorses
Citation: Best Practices for Storing, Archiving and Preserving Data (Smithsonian Libraries, 2023)
a “3-2-1 Rule” when it comes to data storage: “3 copies of the data, stored on 2 different media, with at least 1 stored off-site or in the cloud.” Or as archivist Trevor Owens puts it in his seminal text Theory and Craft of Digital Preservation,
Citation: The Theory and Craft of Digital Preservation (Trevor Owens, 2018)
“In digital preservation we place our trust in having multiple copies. We cannot trust the durability of digital media, so we need to make multiple copies.” When storing digital data, archivists recommend utilizing file formats that are widespread and not dependent on a single commercial entity—in the words of the Smithsonian
Citation: Smithsonian Data Management Best Practices (Smithsonian Libraries, 2018)
, “non-proprietary, platform-independent, unencrypted, lossless, uncompressed, [and] commonly used.” But at the century scale, even our most widely adopted file formats are completely untested. Digital history is not long enough to definitively settle on best practices.
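The 3-2-1 rule quoted above is simple enough to express as a check. What follows is a minimal sketch in Python, not anything the Smithsonian publishes: the `Copy` record, its fields, and the `satisfies_3_2_1` function are hypothetical names invented for illustration, and "off-site" is reduced to a single yes/no flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a collection (hypothetical record for illustration)."""
    medium: str     # e.g. "hdd", "tape", "cloud"
    off_site: bool  # True if kept away from the primary location, or in the cloud


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check the quoted rule: 3 copies, 2 different media, at least 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    one_off_site = any(c.off_site for c in copies)
    return enough_copies and enough_media and one_off_site


# A plan with a local drive, a local tape, and one cloud copy.
plan = [Copy("hdd", False), Copy("tape", False), Copy("cloud", True)]
# Three copies, but all on the same medium and all in the same building.
fragile_plan = [Copy("hdd", False)] * 3

print(satisfies_3_2_1(plan), satisfies_3_2_1(fragile_plan))
```

The check says nothing about file formats or about who will still be maintaining the copies in a hundred years, which is where the rest of this piece spends its time.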

With digital storage there will always be two separate but equal battlefields of maintenance to consider: maintenance of the digital holdings and software environments in which they live, and the simple physical maintenance of the hardware and architecture that contain them. Every technology and methodology we discuss keeps these sorts of principles in mind, that solutions are never singular (nor should they be). But we still try to analyze how their design and nature stack up against the rigors of a hundred-year scale. How they might each deal with the savage threats a century may bring.

Hard Drives

Putting data on a hard drive is an act of writing. It is inscription with electromagnetism, which, given the right connected hardware and software, allows near-instant reading.

It’s tempting to say that they don’t make hard drives like they used to, and to a certain extent, it’s true. Seven decades of hard drive design have seen an unceasing sprint toward increased speed, capacity, density, efficiency, and short-term reliability, all while decreasing physical size and weight. Long-term reliability is a fringe concern. Hard drive manufacturers assume that anyone serious about their data will replace their storage set-ups at least once a decade as technology evolves.

Exposed internal hard drive

500GB Western Digital Scorpio Blue hard drive, 2013

Photo by Evans-Amos, licensed under GNU Free Documentation License

Contemporary hard drives come in two main flavors: hard disk drives, where data is stored on a spinning magnetic-coated metal platter, and solid state drives, where data is stored within interconnected semiconductor cells constructed as logic gates. The basic design and principles of contemporary hard disk drives are not all that different from RAMAC. A hard drive, both then and now, works by using electromagnetism to store bits within two states, +M or -M, yes or no, 1 or 0 imprinted onto an electromagnetic disk.

The fundamental issue with hard disk drives is that they are mechanical. A platter spins. An arm reads. There are motors involved. Actuators. Mechanisms. All small and inaccessible to regular maintenance. Mechanical parts, as a rule, fail. When things move, they break. Even despite this limitation, hard disk drives kept in the right conditions have the theoretical ability to last a long time. The magnetic disks that the data is actually stored on can be hardy. Their manufactured form factors seem to have stabilized over the last decade now that the race toward lighter and smaller is the province of solid state drives.

These SSDs, on the other hand, have the advantage of having no moving parts and high rates of reading and writing. They dominate the world around us: sitting in our pockets, as the standard in our computers, on every sort of server or system going for speed. In short-term everyday use, their lack of moving parts means they can handle movement and physical shock without mechanical breakage. This structure comes at a cost. Solid state drives have a finite lifespan, a limited number of times each cell can be written before the insulation that holds charged electrons inside the miniature transistors degrades. While these limits are high, and keep rising, they are still there, and when operating on a century scale they immediately become a concern, especially in a use case where data is being continually stored. SSDs also eventually lose their ability to store and hold data if left unpowered and unused for too long. While the exact spans vary depending on the hardware, any SSD-based long-term storage solution would require regular copying and re-copying, and careful management and placement of the drives in optimal conditions, particularly in terms of temperature. Even without mechanical parts, SSDs eventually fail with age. They have an expiration date.

1TB Patriot P210 Internal SSD, 2023

Photo by Jacek Halicki, licensed under CC BY-SA 4.0

For those serious about long-term storage, single hard drives are never used alone. In 1988, the computer scientists Garth Gibson, Randy Katz, and David Patterson proposed redundant arrays of inexpensive disks (RAID), positing that multiple commercially available drives could be superior in reliability to the centralized mainframe disk drives of the previous era. RAID is driven by the realization that it can be cheaper, easier, and more resilient to replace one small element of a large array than to replace one singular expensive piece of complicated equipment. It’s a component of what could be called digital Fordism, a digital order marked by the mass production of standardized hardware products meant for standardized software, used by the masses and specialists alike.

RAID levels and variants have evolved over the ensuing decades, allowing for different optimal configurations depending on the goal, chasing speed or capacity or reliability versus cost. For example, RAID 6, which stores two independent parity blocks per stripe, can sustain two drive failures and still not result in a loss of data. Even the RAMAC, all those years ago, included a parity bit, an error detection system. A single bit set to zero or one records whether the number of ones in a set of bits is odd or even. If the parity recomputed on reading no longer matches the stored bit, an error has occurred. Higher-level RAID systems provide an extreme version of this, parity at the drive level, which can not only sustain multiple failures but detect and correct the affected data.
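Parity is easier to see in a few lines of code than in prose. The sketch below, in Python, shows the two ideas side by side: a single parity bit that detects a flipped bit, and XOR parity across whole "drives" that rebuilds one lost drive. It is an illustration under simplifying assumptions, not a description of how the RAMAC or any particular RAID controller is implemented. The single XOR parity shown here corresponds to the simpler single-parity RAID levels; the second parity block in RAID 6 is typically computed with different arithmetic (Reed-Solomon-style coding) and is not shown. All function names and sample data are made up for the example.

```python
from functools import reduce


def parity_bit(bits: list[int]) -> int:
    """Even parity: 1 if the number of ones is odd, otherwise 0."""
    return sum(bits) % 2


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Bit-level detection: store a parity bit, then flip one data bit.
word = [1, 0, 1, 1, 0, 0, 1]
stored_parity = parity_bit(word)
corrupted = word.copy()
corrupted[2] ^= 1  # simulate a single-bit error
error_detected = parity_bit(corrupted) != stored_parity

# Drive-level recovery: three data "drives" plus one parity "drive".
drives = [b"CENT", b"URY-", b"SCAL"]
parity_drive = xor_blocks(drives)

# Drive 1 fails; XOR of the survivors and the parity drive rebuilds it.
survivors = [drives[0], drives[2], parity_drive]
rebuilt = xor_blocks(survivors)

print(error_detected, rebuilt == drives[1])
```

Note the limit of the single bit: it notices that something changed but cannot say which bit, and two flipped bits cancel out. Recovery at the drive level works because the array already knows which drive is missing.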

RAID arrays require maintenance, checks, and physical inspections when running over long periods. Attaining century-scale storage using hard drive systems is less a question of technology than one of institution-building, funding, real estate, logistics, culture, and a commitment to digitally preserving everything surrounding and interfacing with your storage system.

Say you set up multiple top-of-the-line fancy RAID 6 servers in a dozen perfectly climate controlled bunkers around the world. To achieve century-scale storage, you would have to create, fund, and ensure the survival of an institution to maintain, financially support, and remember them. This institution would also have to preserve the file formats, software, hardware, operating system, and every other digital element the data you are storing relies on, and continue to develop means to access them. (This will be a recurring theme in this piece: Preserving digital data also requires preserving the means to access that data, just as preserving a book requires preserving the language in which it is written.)

Past the scale of a human lifetime, technological solutions can become uncertain, and sometimes shockingly counterintuitive. For example, it would be easy to say we should completely write off contemporary SSDs for century-scale storage given the depth of commitment that they require. But the fact that a long-term storage scheme using SSDs would require hypervigilant care and maintenance, the development of a practice, could arguably be viewed as an advantage when it comes to century-scale storage. Part of the reason books have the capacity to last over a century or even a millennium is that eventually they have to be reprinted. Fragility, and the culture it creates, can be an asset in inspiring the sort of care necessary for the long term. A system that seeks the indestructible or infallible has the potential to encourage overconfidence, nonchalance, and the ultimate enemy of all archives, neglect.

The other major advantage that the hard drive has—speed of accessibility—is also worth considering as a factor in a century-scale storage solution. No need to swap out records, tapes, or reels. No need to rely on a connection to a broader global network or an independent company that might go out of business. Whatever data you want is there, in an instant. Your holdings can be shared and showcased without relying on any other party. If the collection you’re preserving isn’t exclusively for use 100 years from now, but rather is going to be used throughout the next 100 years, speed of accessibility must be balanced against other factors to some degree, particularly the security of storage. This is something that archivists, preservationists, and curators are constantly balancing in the professional management of physical institutional collections. Even so, accessibility can be a tremendous asset in building the sort of community that can survive over the long term. Proper care and maintenance require resources, funding, and labor. It’s hard to induce these while also hiding from the world.

When used for anything resembling “cold” storage—storage designed to not be readily accessible, that can sit locked away for decades—hard drives present a risky selection. Fire, water, physical impact, heat, moisture, and static electricity are constant threats. As RAMAC teaches us, even when the data survives, hard drives are merely one component of an entire computer system that also needs to be preserved.

RAMAC also shows us that hardware fragility is a choice. Hardware firms could make astronomically more resilient products under a different incentive structure. Just like with buggy software, the problem is not a lack of engineering know-how, but of shareholder pressure and the incentive for short-term growth. As it stands, hard drives themselves are cheap and getting cheaper. You can buy a 24 TB hard drive for $439, over a hundred dollars less than the cost of a single month of 24 TB storage on Amazon’s S3 cloud storage platform. We could have much hardier hard drives, but it would likely come at increased financial cost, or a complete engineering reorientation.

With the right resources, hosting and using one’s own hard drive systems and servers could be a reasonable component of a long-term storage effort. They are accessible, always ready to be copied to other drives and mediums, and built, in a sense, to be upgraded to future versions of both hardware and software. Having an on-site backup of a digital archive, completely within your control, can be a worthy contingency if you are primarily relying on a distributed or cloud solution. Having a hard drive system as part of such a design, designed to be upgraded or updated every few years, is completely reasonable, as long as the need for future adaptation is understood and properly funded. The very fact that hard disk drives are designed to be replaced incentivizes us to think carefully about planning for the future. Whether we respond rationally to that incentive is another story.

The Cloud  

Google Drive passed one billion users in 2018.
Citation: Google Drive is about to hit 1 billion users (The Verge, 2018)
Dropbox currently claims 700 million registered users.
Citation: Fact Sheet (Dropbox)
According to a 2022 10-K filing,
Citation: Amazon Web Services owns 11.9 million square feet of property, leases 14.1 million square feet (Data Center Dynamics, 2022)
Amazon Web Services leases 14 million square feet of real estate and owns close to another 12 million square feet. We live and compute in the cloud’s world. The cloud is the dominant method for how we store data, and how the software that retrieves that data runs. Software platforms that don’t attempt to enforce the cloud as the default storage option are becoming an endangered species. Of the 39 archives, libraries, and collectors I surveyed for this project, 27 use a cloud storage service as the primary site of their digital collections. Of those, 18 use a separate cloud storage service as a secondary backup, and an additional four that have their primary storage on-site or in a decentralized scheme use a cloud storage service as a backup.

Google Data Center, The Dalles, Oregon

Map data ©2024 Google

The cloud is an aggregation of data centers. These data centers run servers, which rely on massive arrays and combinations of the computational and hard drive technologies we have already discussed to store large amounts of data for millions of clients. To store data in the cloud is to outsource that storage, to give it over to a custodian, a guardian, whose sole purpose is receiving, safeguarding, and delivering that data for whomever is willing to pay. This charge has proven lucrative for the companies that have chosen to undertake it. While the biggest players often own and run their own data centers, there are tens of thousands of smaller businesses that offer cloud services to any number of clients, including the bigger services themselves.

According to Amazon,
Citation: Celebrate Amazon S3’s 17th birthday at AWS Pi Day 2023 (Amazon, 2023)
S3 “holds more than 280 trillion objects and averages over 100 million requests a second. To protect data integrity, Amazon S3 performs over four billion checksum computations per second.” The total of human production under the stewardship of Amazon S3, Microsoft Azure, and Google Cloud is unfathomable. It is a black hole, a star so dense it has collapsed unto itself and is only getting heavier and heavier.
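A checksum computation of the kind Amazon describes can be sketched in a few lines. The example below uses SHA-256 from Python's standard library; that choice of algorithm is an assumption made for illustration, since the quoted passage does not say which checksums S3 uses internally, and the function names and sample bytes are hypothetical. Archivists often call this kind of routine comparison a fixity check.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_digest: str) -> bool:
    """Recompute the digest and compare it with the one recorded at ingest."""
    return sha256_hex(data) == expected_digest


# At ingest: record a digest alongside the stored object.
original = b"If you had to store something for 100 years, how would you do it?"
recorded_digest = sha256_hex(original)

# Years later: one bit in the first byte has silently flipped.
decayed = bytes([original[0] ^ 0x01]) + original[1:]

print(is_intact(original, recorded_digest), is_intact(decayed, recorded_digest))
```

A checksum only reports that a copy has changed. Repairing it still depends on having another, intact copy somewhere else.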

The advantages of employing a cloud storage service are obvious. The burden of upgrading hardware and software is offloaded onto specialists. The physical demands of architecture, of physical protection from the elements, are no longer a concern for the customer. The cloud is accessible with an internet connection, and part of a massive existing infrastructure with many other stakeholders. This network effect of so many clients gives a sense of security unto itself, not unlike the “too big to fail” culture of banking. If you store your data trove in the same place as massive investment banks and the CIA,
Citation: The Details About the CIA's Deal With Amazon (The Atlantic, 2014)
it’s easy to imagine that the power of your fellow stakeholders would have some effect on the reliability and continued availability of the product.

The cloud’s current data center regime is only designed for conditions of utter stability. The physical threats to data centers are not dissimilar to the threats faced by traditional libraries, with a few additions: fire, water, physical destruction, neglect of maintenance, power failures, connection failures, theft, vandalism, and the constant forever need for software that works. During the writing of this piece, in July 2024, a CrowdStrike update bug caused archives that were using Microsoft Azure’s cloud storage services to lose access to their holdings. Natural disasters, wars, and political upheavals are all capable of causing immediate and irrevocable disruptions. Despite the internet’s founding dream, its birth ideal, of being a telecommunications network that could survive a nuclear attack, it’s fairly certain any substantive nuclear exchange would render the cloud unusable. Even aside from such nightmare scenarios, the cloud is made possible by a relatively small number of undersea cables that require constant maintenance.
Citation: The Cloud Under the Sea
海底云

(The Verge, 2024)   (The Verge,2024)
Any blue water naval power already has the firepower and capability to severely damage global access to the internet, and thus the cloud. The global geographic distribution of data centers heavily tilts toward the U.S. and Europe.
任何拥有蓝水海军实力的国家都拥有严重破坏全球互联网接入,进而破坏云计算的能力和火力。全球数据中心的地理分布严重偏向美国和欧洲。
Citation: Leading countries by number of data centers 2024
2024 年数据中心数量领先国家

(Statista, 2024)   (Statista,2024)
The cloud is fairly centralized, because the companies that run it are fairly centralized.
云计算相当中心化,因为运营它的公司也相当中心化。

Cloud storage requires paying someone, an outside entity, for as long as you are engaged in the act of storing. This can be more expensive than other methods (S3’s “Standard” option is $23 per TB per month as of November 2024), especially over extremely long periods. Amazon S3 has tried to combat this by offering a storage class for slower but more permanent storage, “Glacier,” designed to be competitive with offline cold storage options, by separating storage pricing from retrieval pricing (their “Flexible Retrieval” option is $3.60 per TB per month as of November 2024). But you still have to pay them. Every month or every year. Forever. You can turn off the machines that you own for a while and then turn them back on, and everything you stored will still be there, but if you stop paying your cloud storage fee the data is gone, probably forever.
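To make “forever” concrete, here is a rough back-of-the-envelope sketch of what a century of rent adds up to. It assumes the two quoted prices are monthly rates per terabyte, that they never change, and that retrieval, request, and transfer fees are ignored; none of those assumptions will hold in practice, and the function and constant names are purely illustrative.

```python
# Rough sketch: cumulative cost of renting cloud storage for a century.
# Assumptions (illustrative only): the quoted prices are monthly rates,
# they stay flat for 100 years, and retrieval/request/transfer fees are ignored.

S3_STANDARD_CENTS_PER_TB_MONTH = 2300       # $23.00 per TB, as quoted (Nov 2024)
GLACIER_FLEXIBLE_CENTS_PER_TB_MONTH = 360   # $3.60 per TB, as quoted (Nov 2024)


def century_cost_dollars(cents_per_tb_month, terabytes=1, years=100):
    """Total rent in dollars for `terabytes` held for `years` at a flat monthly price."""
    months = years * 12
    return cents_per_tb_month * terabytes * months / 100


if __name__ == "__main__":
    options = [
        ("S3 Standard", S3_STANDARD_CENTS_PER_TB_MONTH),
        ("Glacier Flexible Retrieval", GLACIER_FLEXIBLE_CENTS_PER_TB_MONTH),
    ]
    for name, price in options:
        total = century_cost_dollars(price)
        print(f"{name}: ${total:,.2f} per TB over 100 years")
```

Under these assumptions a single terabyte costs $27,600 in the “Standard” class and $4,320 in “Flexible Retrieval” over a hundred years, and the bill never ends.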

The cloud requires trust. Assessing the cloud from the perspective of century-scale storage is less about the technical abilities and configurations on offer than about organizational structures, and even values. Right now, the dominant cloud storage options are exclusively administered by companies. In November of 2018, at an all-hands meeting in Seattle, an Amazon employee asked CEO and founder Jeff Bezos what he had learned from the recent bankruptcy of Sears. “Amazon is not too big to fail… In fact, I predict one day Amazon will fail. Amazon will go bankrupt. If you look at large companies, their lifespans tend to be 30-plus years, not a hundred-plus years,” Bezos answered. Bezos was right. Most companies do not last long. They get acquired or split up into pieces or go bankrupt or decline into something much smaller or are upended by catastrophic geopolitical events.
Citation: Jeff Bezos makes surprise admission about Amazon's life span (Business Insider, 2018)

In 2012, Richard Foster, a professor at Yale School of Management, found that the average lifespan of companies listed on the S&P 500 had been decreasing precipitously. It had dropped from 67 years in the 1920s to just 15 years. Most companies, even wildly successful behemoths, don’t even last 50 years, let alone a hundred.
Citation: Can a company live forever? (BBC, 2012)

It could be that data storage service companies are immune to this trend. Among the oldest companies in the United States that are not farms, pubs, breweries, or inns are several insurance companies. The Philadelphia Contributorship (1752), Insurance Company of North America (now known as Chubb, 1792), and Baltimore Equitable (1794) are all in their third century of business. This is also true globally (Bilsener Gilde of Germany, 1642; Hamburger Feuerkasse of Germany, 1676; Lloyd’s of London, 1688). These are companies engaged in the long-term management of assets, fluent in risk. One could imagine a storage service or data center adopting the sort of culture that might lead to similar longevity, but the cloud is so far the province of companies that are mostly nonspecialized, with cultures focused on driving growth-exploding paradigm shifts rather than stability.

Over the long term, there is serious risk that the three dominant players in cloud storage, or other upstart firms that wish to follow in their footsteps, discontinue their services out of necessity, preference, or optimization as they move on to whatever the next shiny thing is that they believe can drive growth. Google has been particularly guilty of this behavior, shutting down its own products with such regularity that it has become a running joke among those of us who are terminally online.
Citation: Killed by Google. During the course of writing this piece Google announced that it was killing its URL Shortener, actively contributing to link rot and abetting the degradation of the world wide web.

To trust the entities offering cloud storage on a century scale would require a shift in their ethics, culture, terms, and contracts, but we can imagine what that shift could look like.

Every time I’ve spoken to a gallery guard at an art museum, I’ve found that they hold a deep sense of reverence and responsibility for the collections under their watch. This has been repeatedly reflected in both journalism covering the people with these jobs and their own accounts. They stand there, day after day, dealing with entitled tourists, escaped toddlers, food smugglers, hyped-up school groups, and some of the more annoying varieties of possible human behaviors, and they don’t seem to lose a sense of the importance of their work. Even when the institutions they work for don’t pay them enough, this sense of reverence is retained. Every single day they are doing their best to protect that which is in their care. They feel a sense of duty during their time in those rooms.
Citation: The Secret Lives of Museum Guards (The New Yorker, 2015)
Citation: All the Beauty in the World: The Metropolitan Museum of Art and Me (Patrick Bringley, 2023)

I have struggled to find that similar sense of duty and reverence when interviewing big tech employees working on storage products, but there is no reason it could not exist. I’ve certainly observed their desire to care for the personal, the sense that they should make something that works so people don’t lose their treasured memories. They also strive for reliability on behalf of customers that are businesses, understanding that a donut shop needs its accounting records to work or it might shut down. But outside of a rare moment of marketing and a few moments of rhetoric, I feel comfortable generalizing that a culture of stewardship, a sense of the stakes, is not widespread within technology companies, nor is building it a priority.

The current web pages and marketing for Microsoft Azure and Google Cloud do not mention cultural or historical preservation at any point. Only Amazon S3 mentions the concept, presenting the case study of migrating the BBC’s 100-year-old archive to its Glacier system between case studies involving Salesforce data lakes and generative AI.
Citation: The BBC Preserves 100 Years of History Using Amazon S3 (BBC, 2024)

At this precise moment all of these services mention AI (a lot) and how it’s going to change everything. “Is your infrastructure AI-ready?” Microsoft Azure’s landing page asks. Google Cloud encourages you to “build what’s next in generative AI.” Two years ago their marketing materials mentioned web3 and the metaverse (a lot) and how it was going to change everything, and how if your business did not adapt you were going to be left behind—yet those sentiments no longer appear.

“I didn’t even know we had any clients like that,” a Microsoft product manager told me when I asked how she felt about protecting archives. “I have a hard time convincing anyone else it matters,” an Amazon engineer said. “There are some higher-ups that genuinely seem to care more, but that doesn’t filter down.”

The cloud does not exist in a vacuum. It is dependent on a far-reaching fabric of interactions, telecoms, internet service providers, and hardware manufacturers, all of which are motivated by timescales far removed from a century (often the daily whims of the market and a fiduciary responsibility to shareholders to maximize returns on a quarterly basis). The Jack Welch school of shareholder supremacy is completely incompatible with the sorts of values that would ensure a cloud storage provider would reliably exist for a century.

Given the network effect of mass reliance on the cloud, one would hope there is a corporate culture of responsibility around safety and the seriousness of their charge. But such corporate cultures, even when they do exist, are fragile.

Another manifestation of the cloud is as a means by which digital products are currently offered, sold, and consumed. Digital copies of books, film, music, games, and journalism, some released under subscription models, some available on digital marketplaces, are all part of our cloud environment. Over the past decade, the collection development budgets of most libraries have moved steadily away from physical books and towards cloud-based digital subscriptions.
Citation: A Complex Landscape | Budgets and Funding 2024 (Library Journal, 2024)

A potential approach to century-scale storage for any published work of literature, knowledge, art, or science would be to simply trust rights holders and IP developers to keep all that they offer safe until it enters the public domain, but under current conditions such trust is impossible. The publicly traded corporation, as an entity operating in today’s paradigm of shareholder supremacy and today’s copyright law, cannot be trusted as a partner in the preservation of anything. We have seen video game companies fail to preserve their own IP and then ruthlessly pursue litigation when fans attempt to pick up the slack, film studios refuse to release entire films and shelve television shows in order to claim write-offs and avoid paying residuals, digital marketplaces disable content that consumers have already purchased, an entire generation of web art rendered nonfunctional, and countless songs, books, shows, games, software, and even entire formats simply vanish without warning, even when contained within the “libraries” of customers who allegedly “own” them.
Citation: Emulation or it Didn’t Happen (Rhizome, 2020)

This process is constant and ongoing. During the short period in which this piece was written, Paramount unexpectedly deleted the entire archive of MTV News, including work which does not appear to have been saved by the Internet Archive’s Wayback Machine; GameStop abruptly shut down Game Informer, which as recently as 2011 was one of the three highest-circulating magazines in the US, and disappeared its archives; and reporting revealed that some of the archives of some of the most storied local newspapers in the country, including the Village Voice, had been taken over by LLM-generated clickbait. These may seem like small things. They are not. In the case of MTV News, two decades of documentation that included some of the most impactful moments in music history were put at risk of loss.

While various international law frameworks for the protection of cultural and intellectual heritage during wars have existed since the 19th century, no such frameworks exist for peacetime. If they did, they might radically change our capacity to trust both the custodians of the cloud and corporate rights holders. A different legal and civic order would affect this entire analysis. But for now, the cloud is only governed by itself.

Removable Media

The oldest vinyl record I own is a 1951 pressing of Tchaikovsky’s Piano Concerto No. 1 in B-flat Minor, played by Vladimir Horowitz and the NBC Symphony Orchestra at Carnegie Hall. On the crimson cover an illustrated pair of disembodied hands play an angled keyboard. It’s worth between $3 and $12 at present. The record is a re-press, as the recording was made in 1943 (of sheet music first published in 1875 and revised into its final form in 1890). Recordings of the concert also exist on YouTube, digitally imported from various analog versions. The record on my shelf is not yet a successful implementation of century-scale storage, depending on your definition, but it’s getting there. If you go on Discogs, the crowdsourced tool for cataloging music releases that also functions as a global marketplace, you can shop for thousands of records that are over a century old. You need to have a turntable capable of rotating at 78-rpm speed in order to play them, but if you do, you can listen to a recording from over a hundred years ago stored on an analog shellac disc.
Citation: Vladimir Horowitz-Toscanini: Tchaikovsky Concerto No. 1, Op. 23 (1943/NBC Symphony Orchestra) (YouTube, 2024)
Citation: Shop Vinyl Records, CDs, and More released in 1902 to 1925 (Discogs)

The most famous examples of faith in the stability of records as a storage format are currently 15.2 billion miles and 12.7 billion miles from the Earth respectively. They are traveling at 38,026.77 mph and 34,390.98 mph relative to Earth, beyond the heliosphere in interstellar space. The Voyager Golden Records housed on the Voyager space probes are made from gold- and nickel-plated copper instead of vinyl. They contain greetings in 55 human languages; 26 musical recordings including the works of Blind Willie Johnson, Chuck Berry, and Johann Sebastian Bach; field recordings of nature; and 116 encoded images of life. Each record is designed to last over a billion years.

The Golden Record is the ultimate outlier, a small preview of what might be achieved if a society brought real resources to the preservation and curation of cultural production. Most removable storage is not designed to travel the cosmos and last for over a billion years. Still, even in their less durable versions, records, tapes, and optical discs represent a storage regime that is both replicable and distributable, as well as standardizable enough that consumer hardware can access it, and display it immediately. These formats are by nature air-gapped, immune to cyberattack, buggy software updates, and accidental deletions.

The cover of the Golden Record

The Golden Record, 1977

Public domain, NASA/JPL

Optimized for the rigors of a past era of physical commerce, removable media formats all have some degree of shelf stability, but like everything else they wither under the stresses of time. Vinyl records hate heat and wear out with each listen. Cassette tapes slowly shed their oxide layer, especially under high heat and humidity, while magnetic fields, even those resulting from small consumer electronics, wreak havoc on a tape’s magnetic particles. VHS tapes experience similar afflictions. Fungi attack floppy disks. Motion picture film reels are tormented by an array of maladies. The cellulose nitrate film base used in the first half of the 20th century is known to disintegrate into dust when it isn’t literally exploding into spectacular flame. The cellulose acetate that replaced it suffers from “vinegar syndrome,” where its acetate base decays, emitting a pungent smell. Polyester film is troubled by fading colors. Film archives now store many reels in humidity-controlled frozen vaults to slow down these reactions.

Digital removable form factors like CDs, DVDs, and other optical discs are plagued by chaotic and unpredictable chemical meltdowns that erode their playback capability. There is no consensus on a singular cause of “disc rot” because manufacturing standards for CDs and DVDs were so varied that it’s impossible to identify a universal accelerant. Some discs appear totally fine, ready to last hundreds of years. Others seem almost like they were designed to degenerate after an appallingly short amount of time, as if they were not Céline Dion’s Greatest Hits but intentionally self-destructing messages out of Mission Impossible. Some discs bronze or speckle. Some shed little strips of their reflective layer. Some go bizarrely translucent, a magic trick, as if their previous existence were an illusion.

Humidity, pests, light, magnetism, and heat can destroy almost anything. Mitigating them can help preserve almost anything.

With the right conditions and care, any of these formats can last an awfully long time, but at that point they are artifacts like any other physical singular works selected for preservation. They are sculptures, tapestries, costumes, or paintings. The storage medium is just as much the treasured object as the content it is holding. And the ability to play or access the media on them must also be preserved. In 2021, lawyer and FOIA expert Michael Ravnitzky filed a request for copies of video footage of a lecture by legendary computer scientist Admiral Grace Hopper that were present in the National Security Agency’s archives. The NSA denied the request in May of 2024, stating that the agency no longer owned a machine capable of playing back the AMPEX video tapes in their collection.
Citation: Admiral Grace Hopper’s landmark lecture is found, but the NSA won’t release it (Muckrock, 2024)

The Unicorn Defends Itself (from the Unicorn Tapestries), depicting a unicorn surrounded by hunters

The Unicorn Defends Itself, from the Unicorn Tapestries, c. 1495-1505

Public domain, part of The Met's Open Access Initiative

If you don’t plan on preserving a turntable along with a record collection, you’d better be able to build one. This is where analog formats have a potential long-term advantage over digital ones. The creators of the Voyager Golden Record reckoned that it’s easier to communicate instructions on how to build a record player to a future viewer than it is to communicate how to build a computer. Anthropomorphic issues when theorizing billion-year-in-the-future interactions with aliens aside, it certainly seems true that constructing a simple mechanical device is easier than constructing a multi-level hardware and software system.

For the purposes of century-scale storage, a knock against these formats might be that as mass consumer products they are already largely obsolete, replaced by a vast streaming infrastructure that intangibly beams content down from the cloud. However, even in a minority capacity, many of them continue to have lives as products. Vinyl records have been fully resurrected now for over a decade, having even surpassed sales of CDs, the format that supposedly marked their demise. Gen Z collectors are buying cassette tapes in massive quantities. Even when sold, produced, or maintained as niche offerings, these formats have serious potential that merits serious consideration.
Citation: Gen Z Loves Cassettes. But Wait, How Do These Things Work? (Wall Street Journal, 2024)

In 1998, the Japanese Diet passed the Electronic Books Preservation Act, which required certain tax and accounting data to be stored digitally for 100 years. The government created and mandated a quality standard for optical discs that could reach this mark. Pioneer developed an optical disc drive, the BDR-WX01DM, and archival-level discs to meet this standard. Equally promising are M-DISCs, a slightly thicker Blu-ray disc design manufactured by both Ritek and Verbatim, which claim a lifespan of 1000 years when properly stored. M-DISCs passed both ECMA and ISO/IEC standards with a rating of several hundred years, and different durability and stress tests of M-DISCs have confirmed a high degree of resilience depending on conditions. But both formats are new, and these claims are untested in temporal reality. And like all other removable media, they require a physical space to be held, kept safe, and remembered.

Tape drives, which have been used for mainframe computer storage since the 1950s, are remarkably enduring. Now specifically designed for long term “cold” storage, tape drives have high capacities. As IBM’s Mark Lantz wrote in 2018, “a single robotic tape library can contain up to 278 petabytes of data. Storing that much data on compact discs would require more than 397 million of them, which if stacked would form a tower more than 476 kilometers high.” Unlike the magnetic tape in cassettes or VHS tapes, the tape itself is much more resilient to physical degradation, and enclosed in cartridges made of hardier materials. Tape requires far less energy and computational minding than computer hard drive systems.
Citation: Why the Future of Data Storage is (Still) Magnetic Tape (IEEE Spectrum, 2018)
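The figures in that quote can be roughly reproduced with a few lines of arithmetic. The sketch below is only a sanity check, and it rests on assumptions the quote does not state: decimal units (1 PB = 10^15 bytes), a 700 MB compact disc, and a disc thickness of 1.2 mm. The names are illustrative.

```python
# Sanity check of the "397 million CDs, 476 km tower" comparison.
# Assumptions (not stated in the quote): decimal units, 700 MB per CD,
# and a disc thickness of 1.2 mm.

LIBRARY_BYTES = 278 * 10**15   # 278 petabytes
CD_BYTES = 700 * 10**6         # assumed CD capacity
CD_THICKNESS_MM = 1.2          # assumed disc thickness


def discs_needed(total_bytes, disc_bytes=CD_BYTES):
    """Number of whole discs required to hold total_bytes (integer ceiling)."""
    return (total_bytes + disc_bytes - 1) // disc_bytes


def tower_height_km(disc_count, thickness_mm=CD_THICKNESS_MM):
    """Height of a stack of disc_count discs, in kilometers."""
    return disc_count * thickness_mm / 1_000_000


if __name__ == "__main__":
    count = discs_needed(LIBRARY_BYTES)
    print(f"discs: {count:,}")
    print(f"tower: {tower_height_km(count):.1f} km")
```

With those assumptions the count comes out just over 397 million discs and the stack just over 476 kilometers, which matches the quoted comparison.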

Internals of Ampex Fine Line F-44, a 3-head Ampex home-use audio tape recorder, c. 1965

Photo by Gregory F. Maxwell, licensed under GNU Free Documentation License

Tape also has the advantage of being relatively cheap, far cheaper over the medium term than cloud-based offerings or constantly upgrading your own server or hard drive hardware. In 2021, IBM priced their own LTO-9 tape solutions at $5.89 a terabyte. It’s no accident that tape drives are the standard for most large-scale cold storage. Most film companies and film libraries, banks, insurance agencies, law firms, and national archives keep a copy of at least some of their data on magnetic tape.
Citation: IBM ships new LTO 9 Tape Drives with greater density, performance, and resiliency (IBM, 2021)

However, tape drive systems, depending on their complexity, can have high initial set-up costs. They are not currently produced, sold, or advertised to individual consumers. Depending on the exact hardware, writing to tape drives can be slow, slow enough that updating them incorrectly can even threaten the integrity of the data being stored. The drives that write to tapes are not designed to be regularly moved. The tapes themselves are primarily designed to sit in the perfect space you made for them. They are vulnerable to anything that might threaten the physical site within which they are contained. They are also almost entirely meant for use in a singular system and cannot claim the advantage of the naturally occurring decentralizations that follow when products are developed for consumer use.

Judged against other available storage technologies in a vacuum, away from the organizational, financial, architectural, and social structures around them, tape storage is probably the best bet for single-site storage of a large digital collection over a few decades, as long as that collection never has to be updated. But tape’s life is still measured in decades at most, designed to be replaced by a successor system long before we reach our 100-year mark. Tape drive vendors still typically market their competitive pricing advantages on scales of only five or ten years. And tape still requires physical caretaking, space, and a watchful eye.

While more expensive tape systems allow for fast data retrieval times, the very fact that they are not easily accessible is one of their trademarks, providing the security of an air gap against digital threats. This lack of accessibility is both a strength and a weakness. As previously discussed, the immediate and instant accessibility of one’s holdings can be a positive factor in preservation over a century scale. Any solution best designed for a hidden bunker, a place to be kept from the world, from the public, from society, runs the risk of not being able to inspire the social, political, and financial conditions to ensure proper care and maintenance.

We have grouped analog and digital removable storage methods, some meant for mass consumption and some meant for one-off institutional storage, all within the same analysis because they share the possibility for both mobility and replication detached from the systems that read them. It is absolutely possible to encode Google Chrome onto a vinyl record, just as we can encode the thousand-year-old musical compositions of Hildegard of Bingen in digital file formats, and just as the Voyager Golden Record encoded visual images of Earth onto its grooves.

Let’s imagine a world where six months from now, all digital music somehow flashes out of existence. Spotify, Apple Music, Tidal, and every other digital music service lose their holdings. Every hard drive in the world with music on it is erased. The record label backups are gone. Let’s pretend a digital copy of Sabrina Carpenter’s album Short n’ Sweet accidentally survives on both a single tape drive in a library, and on the thousands of records, tapes, and CDs sold to fans. Even as products of the streaming era, where physical copies represent a mere fraction of how people are listening to music, if you asked me to bet on which would make it a century in this imaginary scenario, I’d bet on the records, tapes, and CDs—and the fans, their heirs, and successor fans. I would bet on whatever Discogs-esque service exists in 2124.

Make It Physical: Print and Rock

If this project were called “Millennia-Scale Storage” the historical record would suggest two particularly successful methods for ensuring the survival of written and visual media—carving in stone or inscribing on a clay tablet.

Humans painted, engraved, and shaped stone for tens of thousands of years before anything resembling civilization or written history. Many of these works endure, only ceasing to exist through violent modification or annihilation, whether by a human being or natural force.

Thousands of years later, starting as early as 9000 BCE with simple small counting tokens and reaching widespread adoption around 3000 BCE, ancient Sumerians, Babylonians, Minoans, Mycenaeans, and Hittites wrote with sharp styluses on pieces of wet clay, which were then left to dry in the sun or baked in a kiln. These practices spread relatively quickly. We have over half a million of these tablets today, and the number of survivors continues to go up with new recoveries and excavations. These Bronze and Iron Age writings are in extraordinary shape, resistant to many would-be means of destruction. They encompass government archives, commercial documents, lists of battles, receipts, letters, debates, hymns, essays, laws, stories, mathematical theorems, recipes, and medical texts. The collection includes the now-internet-famous “complaint tablet to Ea-nāṣir,” a customer complaint (regarding the substandard quality of already-paid-for copper ingots) that has survived over 3,700 years. We have 1,800 years of astronomical records, the Epic of Gilgamesh, and 382 diplomatic letters from Akhenaten, the pharaoh of Egypt, to neighboring major powers. The clay tablets reach across an unfathomable stretch of time with an almost astonishing ease.

A brief look at historical sites around the world demonstrates the possibilities of not only physicalizing treasured digital archives, but of turning them into architecture, or using the architecture that surrounds them in ways that can enhance the possibility of century-scale survival. Computers are made of rocks, after all. Maybe we should be reversing the process and writing source code in stone, engraving our most important functions into walls.

The reasons for us to not resurrect these processes are obvious. Both stone inscription and clay tablet writing are inordinately time-intensive, slow, immutable, limited, and unbearably heavy. Even laser engraving on standardized surfaces costs hundreds of dollars for relatively short phrases, not to mention the cost of the stone itself. The current typical cost of a headstone, which usually only mentions a name and some dates, is $1,000–$3,000. To encode all 4.7 billion words of English Wikipedia (as of October 20, 2024) into stone would require tens of thousands of people and tens of billions of dollars. Any correction or update could force a restart. To encode digital images, one would not only have to include billions of characters of code, but instructions to build a computer, operating system, and software capable of interpreting that massive set of encoded characters, and then input them without making any errors, a task that could take human teams centuries itself.

Monuments are targets that require protection. Every war is a war on architecture. Physical monuments, particularly those that are contemporary, don’t always enjoy the benefits of national and cultural security. Without the constant guard and competence of a security force, the heaviest stone is easily destroyed, even by forces without access to advanced weaponry. The Georgia Guidestones were a mysterious 19-foot-tall, 118-ton granite monument created in 1980 by anonymous private citizens that espoused a set of ideological precepts in eight languages to guide humanity post-apocalypse. For 808 words, their construction cost over $400,000 in today’s dollars. They were bombed and destroyed by unknown vandals in July of 2022. Intended for a far-flung future, they only lasted 42 years.

The Georgia Guidestones in 2020

Public domain

More modern methods for preserving the written word are compelling in their own way. My bookshelves contain plenty of works over 100 years old: Sumerian texts telling the story of the ancient Mesopotamian goddess Inanna, the works of Homer, Sappho, and Euripides, of Marco Polo, Miguel de Cervantes, and Mary Shelley. These attained century-scale storage by being printed and reprinted, copied, protected, translated, and adapted into new formats, with these burdens distributed over centuries of caretakers. Even as individual objects, books can last quite a while when stored and cared for in the proper conditions. I have a few dozen that comfortably pass the hundred-year mark. The oldest book I own is a volume of Cicero’s Orator printed in Venice in 1554. The specific object is not valuable or important (I bought it for £40 in London in 2014), and yet it has traveled through space and time, conquering centuries, families, and continents without any evidence of having entered the care of a professional archivist or institution. It is still a functional book. The pages turn. The pale vellum binding stands straight. It survives.

This is supposed to be a piece about the best options for making digital storage last a century or more, with an implied focus on fancy storage technologies and novel archival schemes. But even today, if our goal is storing information for a century, we should not underrate the power of print.

Print books in physical codex form naturally decentralize to an extreme degree, finding their way into not only institutions but the collections of individuals. They are like plants whose seeds have adapted to float and spread in the wind, engineered to end up exactly where you would least suspect, and persist. Even small self-published print runs of artist zines can end up in the Museum of Modern Art or the National Archives.

Unlike digital storage, the survival of print requires physical libraries, even if those libraries are shelves in an individual’s home. It requires that those libraries be protected from fire, water, and pests like silverfish. While the cotton- and linen-infused long-fiber paper of centuries past is remarkably sturdy and robust, mechanical wood-pulp paper has shorter fibers, and is susceptible to acidification as it ages.

A gridded scene depicting a hunt

Les Singuliers et Nouveaux Portraicts by Federico de Vinciolo, 1588

Public domain, part of The Met's Open Access Initiative

The advantage of print is that it can be a practice. What was printed before can be reprinted. The downside is that, in order to take advantage of the full preservational powers of the codex form, what you are saving and printing has to be valued by the public. The printing press is a creature of the market, built to replicate based on demand. Still, just as with physical copies of music and film, the multiplicative scale of even small print runs dwarfs what you see with most digital backup methods. Even small independent publishers of edgy literature routinely print 5,000 books as a starting point for a print run. Tiny poetry presses regularly print 1,000–2,000 copies of a collection. In our current streaming era, how often do thousands of people voluntarily keep the exact same stored digital object on their own hard drive? Increasingly few. Physical libraries and readers accomplish this daily, to great effect.

The issue with books is their number. There are already a lot of them. The volume of books runs up against the human capacity to care for them. In 2010, Google tried to calculate the total number of books ever written and published and arrived at 129,864,880. [Citation: Google Book Search] Estimates vary, but each year somewhere between one and four million more are published. As of 2022, the Library of Congress, the most well-funded library on Earth, has only 25 million books. Given the physical space storing print requires, the scope of human publishing necessitates curation and culling. Millions of volumes of text can be stored digitally on a hard drive the size of your fingertip, volumes that in physical form would require multiple buildings. The greatest challenge to the century-scale storage potential of the print codex is that once a book is 100 years old, there is no guarantee anyone will care enough about what lies within it to take on the demands of its care.

Dispersal

One solution to century-scale storage is to scatter your holdings, to put copy after copy all over the world, so that no disaster, war, or sudden loss of funding could ever threaten a digital collection’s survival. Right now, the internet and computation are not decentralized. As Janus Kopfstein noted in the New Yorker in 2013, “a staggering percentage of communications flow through a small set of corporations—and thus, under the profound influence of those companies and other institutions.” [Citation: The Mission to Decentralize the Internet (The New Yorker, 2013)] This concentration has only accelerated in the intervening decade. Our access to the internet is controlled by telecommunications firms that openly employ anticompetitive practices without serious recourse, often avoiding each other’s turf because direct competition would limit their rent-seeking and profits. [Citation: Report: Most Americans Have No Real Choice in Internet Providers (Institute for Local Self-Reliance, 2020)] Only a handful of computer operating systems have anything approaching widespread adoption. Chips, graphics cards, and yes, hard drives, are made by a relatively small number of companies, and this is even more true of the parts that comprise them. AMD, Apple, ARM, Broadcom, Marvell, MediaTek, Qualcomm, and Nvidia are all semiconductor customers of Taiwan Semiconductor Manufacturing Company. This year, U.S. Commerce Secretary Gina Raimondo said that the United States buys 92% of its chips from TSMC. [Citation: US official says Chinese seizure of TSMC in Taiwan would be 'absolutely devastating' (Reuters, 2024)]

Despite occasional promises, small head nods, and paeans to the contrary, major firms have not been converting their products to open protocols. iMessages are still blue and everyone else is green. Meta products are only interoperable with other Meta products. X, née Twitter, is not interoperable with anything. Attempts to re-decentralize the internet, like the self-hosting platform arkOS, have regularly run out of resources and been discontinued. [Citation: Sunset (arkOS, 2017)]

Still, some attempts at decentralization have been more successful. LOCKSS (Lots of Copies Keep Stuff Safe) is a digital preservation strategy, protocol, and software developed by Victoria Reich and David Rosenthal in 1999 at Stanford Libraries. Rosenthal has spent decades working on and writing about the possibilities and pitfalls in long-term digital storage (this piece would absolutely not exist without his work). [Citation: Keeping Bits Safe: How Hard Can It Be? (ACM Queue, 2010)] For LOCKSS, multiple copies of academic journals are stored across a distributed network; each copy in the system periodically checks itself against other copies for damage and discrepancies, a process of polling and repair. They whisper to each other, sharing checksums, ensuring that their copies remain uncorrupted. If a node detects a discrepancy, an injury, it sends out a silent SOS, and another node, a digital Samaritan, comes to its aid, offering a pristine copy to heal the wound. But each node is autonomous, individually responsible for tending to its own copies and paying its subscriptions.
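The polling-and-repair idea can be sketched in a few lines of Python. This is a minimal illustration, not the actual LOCKSS protocol: the `Node` class, the `poll_and_repair` function, and the simple majority vote over SHA-256 checksums are all assumptions made for the example, and the real system's polls are considerably more defensive.

```python
import hashlib
from collections import Counter


def checksum(data: bytes) -> str:
    """SHA-256 digest of a stored copy, as a hex string."""
    return hashlib.sha256(data).hexdigest()


class Node:
    """Hypothetical library node holding its own copy of one journal issue."""

    def __init__(self, name: str, copy: bytes):
        self.name = name
        self.copy = copy


def poll_and_repair(nodes):
    """Compare checksums across nodes and overwrite any copy that disagrees
    with the strict majority. Returns the names of the repaired nodes."""
    votes = Counter(checksum(n.copy) for n in nodes)
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(nodes) // 2:
        # No strict majority: refuse to guess which copy is correct.
        raise ValueError("no majority agreement; manual review needed")

    # Any node in the majority can act as the source of a clean copy.
    good = next(n for n in nodes if checksum(n.copy) == winning_digest)
    repaired = []
    for n in nodes:
        if checksum(n.copy) != winning_digest:
            n.copy = good.copy
            repaired.append(n.name)
    return repaired


nodes = [
    Node("library-a", b"Volume 12, Issue 3"),
    Node("library-b", b"Volume 12, Issue 3"),
    Node("library-c", b"Volume 12, Issue 3"),
    Node("library-d", b"Volume 12, Issu\x00 3"),  # simulated bit rot
]
repaired = poll_and_repair(nodes)
print(repaired)
```

The sketch only works because several independent copies exist. With a single copy there is nothing to vote against, which is the point of keeping lots of copies.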

What LOCKSS represents is an attempt to put a decentralized and reliable storage system within the control of a community. The result is a network of 80 research and public libraries sharing “custody of the scholarly record on library-owned storage, not in the cloud.” [Citation: LOCKSS Program] Members of LOCKSS pay a fee based on their budgets, an attempt to spread the financial burden beyond one institution.

LOCKSS is careful. It is narrowly tailored, built to respect copyright holders and institutional administrations. It is limited to 13,200 journals and 23,600 books under a set of labored agreements with publishers. The growth of the system is constrained by its strict enforcement of intellectual property rights and the inherent costs of maintaining a place within the network. These costs are ongoing and persistent, and unlike specific individual works that eventually enter the public domain, the current regime of academic publishers of journals and periodicals intends to continually publish new material to be held under copyright, and charge for it, forever. That LOCKSS achieved its scale in the current copyright regime is unusual.

The fundamental idea of LOCKSS—mutual decentralized stewardship—recalls much earlier forms of online file-sharing. As soon as computers could talk to each other, people used platforms like bulletin board systems, Usenet, and IRC to share data with all those who were connected. For a brief moment, in the late 1990s and 2000s, file-sharing (and later, torrent systems) spread massive amounts of music and video files with impunity despite limited bandwidth. Such structures, of course, allowed intellectual property rights to be ignored with abandon. Copies of goods that were contemporaneously being sold could be quickly acquired for free. And there were other problems. Malware was rampant, the time-intensive burden of collection management was shifted onto individuals, and what was available was completely dependent on the whims of the uploaders. Most significantly, these platforms were all run on centralized servers, which meant that once courts and states attacked them, they vanished. If decentralized storage was to have a future, it would have to be one where any user could not access the files of any other user, where collections were walled.

The progeny of these platforms still exist, and in some cases, thrive, though they are no longer a dominant means of distributing media. Sci-Hub, Library Genesis, and Z-Library offer academic journal articles for free to anyone who wants to download them, flouting intellectual property laws and invoking the right to science and culture under Article 27 of the Universal Declaration of Human Rights. These platforms are, in effect, an illegal, decentralized mirror of initiatives like LOCKSS, piracy that also functions as an insurance policy in the case of a future global meltdown. The singular and well-defined missions of these efforts help make them popular and contribute to their survival, despite their murky legal status and the vast powers arrayed against them. Their tremendous narrative strength—promoting universalist causes against overwhelming odds, not to mention the mythical appeal of being outlaws—has resulted in fierce protection from fans and has incentivized care and stewardship from this loyal community.

It’s worth considering the efficacy of piracy and the intentional breaking of intellectual property law as a long-term preservation tactic. Abigail De Kosnik, a professor in the Berkeley Center for New Media, contends that, given the nature of digital cultural output and the failures of the current corporate and institutional orders to properly care for them, piracy-based media preservation efforts are more likely to survive catastrophic future events than traditional institutions. [Citation: Piracy Is the Future of Culture (Abigail De Kosnik, 2019)] On the other hand, as the notorious prosecution of Aaron Swartz or the legal cases against the Internet Archive demonstrate, engaging in copyright infringement at scale runs the constant risk of sanction and shutdown from state actors.

Fully decentralized systems present a more elusive target for such actions. The InterPlanetary File System is a decentralized protocol designed to create a peer-to-peer network for storing and sharing files in a distributed structure. Instead of relying on centralized servers, IPFS uses content addressing, where each file is identified by a unique hash derived from its contents. This allows a specific file to be retrieved from any node in the network that stores the corresponding hash, all shared on a global network built to recognize and communicate multiple instances of the same file. IPFS functions like one giant torrent but allows users to download or seed only a part of the whole.
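Content addressing can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the `ContentStore` class and the `fetch` helper are hypothetical, and a plain SHA-256 hex digest stands in for a real IPFS content identifier, which involves chunking and its own encoding.

```python
import hashlib
from typing import Optional


class ContentStore:
    """Hypothetical node that stores blobs keyed by the hash of their contents."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves, not from a location.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> Optional[bytes]:
        data = self._blobs.get(address)
        # Re-hash on the way out so a corrupted copy is never served.
        if data is not None and hashlib.sha256(data).hexdigest() != address:
            return None
        return data


def fetch(address: str, nodes) -> bytes:
    """Ask each node in turn; any node holding the content can answer."""
    for node in nodes:
        data = node.get(address)
        if data is not None:
            return data
    raise KeyError(address)


node_a, node_b = ContentStore(), ContentStore()
address = node_b.put(b"an archived article")

# node_a has never seen the file, yet the request succeeds via node_b.
article = fetch(address, [node_a, node_b])

# Storing the same bytes elsewhere yields the same address.
same_address = node_a.put(b"an archived article")
print(address == same_address)
```

Because the address is a function of the content, a reader does not need to trust any particular node: whatever comes back can be re-hashed and checked against the address that was requested.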

In 2017, when Turkish courts banned Wikipedia, IPFS allowed an entire copy to be distributed to bypass the ban. After crackdowns on the aforementioned Library Genesis and Z-Library, their holdings were migrated to IPFS. The long-term success of holdings stored with IPFS is dependent on the digital archival practices of each individual participant, and reliant on a level of participation that can be, as any open-source developer can tell you, fragile.

In 2017 Protocol Labs, the developers of IPFS, launched Filecoin, a cryptocurrency-based digital storage system partially based on the IPFS architecture. It attempts to incentivize participation by compensating those who provide storage space with a cryptocurrency. Filecoin is not alone. Arweave, Storj, Sia, BitTorrent Token, and Safecoin are all variations on the same theme, new attempts at an older dream: creating a market system that can connect all the unused digital storage scattered about the planet to those who might need it. We have always had vast surpluses of unused digital storage space and no viable marketplace to harness this excess, which could allow those with extra storage to profit and give those looking to store access to a cheap distributed market.

Fully blockchain-based systems, where each new piece of data gets added to the end of a chain that is then replicated in every instance and every node, have a complete “persistence mechanism.” The entire record of data is stored in an immutable, decentralized ledger across multiple nodes, ensuring transparency but consuming significant storage due to the entire transaction history existing at every point. Because this mechanism is not viable for the storage of any large amounts of data, most cryptocurrency-based storage solutions rely on contract-based persistence mechanisms, often used in so-called “smart contracts,” and thus store only essential data directly within the contract. This approach avoids the replication of the entire blockchain history.

Coin-based storage systems work by incentivizing users to store the data entrusted to them. They are designed to constantly verify that storage providers are storing an unaltered, undamaged copy of that which has been entrusted to them, and confirm that that storage is continuing over time. To Filecoin’s credit, rather than walling off their system, they offer storage solutions compatible with Amazon S3. They appear to be genuinely interested in storing archival data and working with archival and educational institutions. Their associated charitable organization, the Filecoin Foundation for the Decentralized Web, provided financial support to the Library Innovation Lab at Harvard Law School, allowing for the creation of this piece.
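One simple way to picture this kind of ongoing verification is a challenge-response audit. The sketch below is an illustration under loud assumptions, not Filecoin's actual proof system: the `respond` and `audit` functions are hypothetical, and the verifier here keeps its own copy of the data, which real schemes are designed to avoid.

```python
import hashlib
import os


def respond(stored: bytes, nonce: bytes) -> str:
    """Provider side: hash a fresh nonce together with the stored data."""
    return hashlib.sha256(nonce + stored).hexdigest()


def audit(original: bytes, provider_copy: bytes) -> bool:
    """Verifier side: issue a random challenge and check the provider's answer."""
    nonce = os.urandom(16)  # fresh per audit, so an old answer cannot be replayed
    expected = hashlib.sha256(nonce + original).hexdigest()
    return respond(provider_copy, nonce) == expected


archive = b"scanned court records, 1953"
honest = audit(archive, archive)
damaged = audit(archive, archive[:-1] + b"?")  # last byte altered
print(honest, damaged)
```

Repeating such an audit on a schedule is what confirming storage "over time" amounts to in this toy version. Because each challenge uses a new nonce, the provider has to hold the data at the moment of the audit rather than a digest computed once and cached.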

In 2018, digital preservationist and LOCKSS co-founder David Rosenthal argued that cryptocurrency-based decentralized storage networks will never catch up to centralized cloud storage offerings on reliability, price, speed, and access terms. [Citation: The Four Most Expensive Words in the English Language (David Rosenthal, 2018)] The need for encryption of all storage assets for security reasons, the lack of stable pricing, and the constant need for storage market liquidity all create potential long-term issues. Lastly, as Rosenthal also points out, if contributing storage space to these services does become profitable, there is a profound risk of centralization of storage providers: If providing storage generates revenue, that revenue will centralize because it is incentivized to centralize, just like other supposedly decentralized offerings in an unregulated market context. The untested legal status of these systems also poses potential problems. Storing copies of copyrighted intellectual property could lead to problems within the market itself if providers in certain jurisdictions are legally forced to delete data they were contracted to store. [Citation: Document how removal of data for legal reasons (GitHub/Filecoin, 2018)]

None of these schemes have so far proven that they can function, let alone thrive, as viable marketplaces for a sustained period of time, nor that they can reliably incentivize storage in times of strife or scarcity. Since the development of large-scale trading civilizations, no region on Earth has seen a century pass without a significant economic crisis, shock, or shortage. On the century scale, these events can be severe, capable of toppling regimes, destroying nation-states, and sparking conflicts that lead to deaths measured in the millions. To directly peg an archival storage method to a market system with stakeholders that feed on volatility is equivalent to burying your hard drives in a 100-year flood zone. If cryptocurrency-backed decentralized storage solutions are going to be viable in the long term for cultural and intellectual institutions and collectors (organizations and individuals that tend to have extremely sensitive budgetary practices), they have to find a way to limit and mitigate the effect of these shocks on their pricing.

Blockchain-based implementations also run the risk of being intellectually dismissed or even politically targeted by those turned off by the speculative financial use of the technology and its tainted history of grift, fraud, rent-seeking, greed, and anti-statist techno-libertarian fantasies. I know this because I am one of those who harbor such instinctive negative reactions. Even if you believe these reactions are not fair, they are an excellent example of the human volatilities one must consider in evaluating a technology for century-scale storage. If you construct your storage scheme using a culturally and politically volatile technology, that in itself presents a risk long after you are gone. As with every other method described here, the method must be preserved along with what is being stored.

The cryptocurrency and blockchain communities, and their related firms, have so far invested little in the necessary digital-preservation grunt work that might allow their protocols, and the software and hardware those protocols run on, to endure. Crypto firms have also not mounted any significant challenges to the centralized hardware and telecommunications companies that make their models possible. If they are serious about wanting truly decentralized and resilient solutions to help store human cultural memory, they should use their resources to attack, subvert, and replace the centralized telecommunications, hardware, and software behemoths that they currently rely upon. A crypto community that is serious about a decentralized internet should be in an all-out war with the Verizons, AT&Ts, Comcasts, Starlinks, and Spectrums of the world, and should treat the dominance of a firm like NVIDIA with utter hostility.

But these critiques are secondary. We can imagine an alternate decentralized storage technology that doesn’t relate to cryptocurrency at all and still arrive at the real evaluative question present here: that of centralization versus decentralization in archival practice itself.

Andrew Pettegree and Arthur der Weduwen’s The Library: A Fragile History [Citation: The Library: A Fragile History (Andrew Pettegree & Arthur der Weduwen, 2021)] opens with the anecdote of a 16th-century Dutch scholar arriving to his appointment at the Holy Roman Emperor’s library to find it in a state of utter neglect and destitution. The printing press had only been around a century, but in that short time, the greatest enemy of archives, neglect, had already struck. We can fret about all manner of dramatic disasters. Global thermonuclear war, asteroid impacts, caldera volcanoes, x-risks, Skynet, cultural revolutions, second comings, alien invasions, Malthusian crises, birthrate collapses, pandemics, solar flares, and Local Group supernovae. We can try to engineer around every variety of society-threatening catastrophe, the seas boiling and the ground rumbling and the cities burning. We can imagine how decentralization could provide security against destructive scenarios, how it would protect an archive in case of invasion, fire, bombing, and cyberattack. But none of those are what primarily kills archives. Boring human neglect kills archives.

The most pressing question for decentralized storage services is: Can they inspire care?

A library subject catalog

The subject catalog ("Schlagwortkatalog") of the University Library of Graz

Photo by Dr. Marcus Gossler, licensed under GNU Free Documentation License

There are certainly situations where centralization has proven disastrous. In Bosnia, the National Archives in Sarajevo were seriously damaged during a series of demonstrations and riots in 2014. In 1984, the Sikh Reference Library in Amritsar, Punjab was targeted in an Indian military operation and its entire collection confiscated. It has not yet been returned and is presumed lost. The Boxer Rebellion in 1900 claimed Beijing’s Hanlin Academy library. The only known surviving manuscripts of both Beowulf and Sir Gawain and the Green Knight survived a fire at the Cottonian Library in London in 1731. Other volumes were not so lucky. History is replete with the destruction and loss of libraries and books. World War II alone destroyed or damaged millions of library-held volumes.

I have been avoiding mentioning the most famous destruction of a library in history, that of the fabled Library of Alexandria, not least because the time and circumstances of its destructions (plural) are not authoritatively determined. But I would offer the impressions of Richard Ovenden, author of Burning the Books: A History of the Deliberate Destruction of Knowledge (John Murray, 2020), discussing Edward Gibbon’s account of the library’s fate in The History of the Decline and Fall of the Roman Empire (1776) in “The Story of the Library of Alexandria Is Mostly a Legend, But the Lesson of Its Burning Is Still Crucial Today” (Time, 2020):
“For Gibbon, the Library of Alexandria was one of the great achievements of the classical world and its destruction—which he concludes was due to a long and gradual process of neglect and growing ignorance—was a symbol of the barbarity that overwhelmed the Roman Empire, allowing civilization to leach away the ancient knowledge that was being re-encountered and appreciated in his own day. The fires were major incidents in which many books were lost, but the institution of the library disappeared more gradually both through organizational neglect and through the gradual obsolescence of the papyrus scrolls themselves.”

If your goal in century-scale storage is avoiding kinetic, Hollywood-ready catastrophes, then decentralized solutions are ideal, but whether they can combat neglect is less clear. If a decentralized scheme wants to be successful at century scale, neglect is what it should and must attack.

One of the few clear benefits of centralization is that it inspires care. If people know something is important, of value, potentially even the last of something, they tend to fight every day to protect it. The history of war, strife, and disaster is also the history of archivists, curators, artists, scientists, and passionate Samaritan bystanders saving works from impending destruction at great personal risk and sacrifice. The survivorship bias present in the human canon is merely an echo of thousands of acts of heroism.

The Bibliothèque nationale de France, previously the Royal Library, has survived 16 kings, two emperors, five republics, six full-scale revolutions, the Hundred Years’ War, the French Civil War, the Italian Wars, the Thirty Years’ War, the Franco-Dutch War, the Nine Years’ War, the War of the Spanish Succession, the Seven Years’ War, the Napoleonic Wars, the Franco-Prussian War, World War I, and World War II. It has, at times, safeguarded works that had no other known caretakers. The Bibliothèque nationale de France is not an outlier nor a case of survivorship bias, as many national libraries attain century-scale storage even while withstanding violent changes to the states they serve.

A fairly large portion of human literature, science, art, and music has survived precisely because it has been relatively centralized. Despite the obvious risks of putting all one’s eggs in one basket, we should not dismiss centralization too quickly. Can we really trust the anonymous contributors to a distributed cryptocoin-backed storage service to operate with the same level of care as professional librarians in a centralized institution or obsessive individual collectors? Can we trust that, in the face of a disaster, malicious government, or marauding force, they might fight to protect their holdings? Or would they instead relax in the knowledge that somewhere else there is another copy, that write-blockers, error-correcting checksums, and encryption ensure that they are not alone? Therein lies the problem for distributed systems: What if every other node in the distributed network also assumes this security?

Over a hundred years, eventually, the havocs come. A distributed system runs the risk of overconfidence and a lack of individual responsibility. During World War II, librarians and curators in centralized institutions smuggled works directly out of the hands of the Gestapo and SS. Some refused to flee and stayed working under occupation at great personal risk, even pretending to work with the enemy (thus also risking targeting from resistance forces), while compiling ledgers that tracked the destinations of looted collections. Librarians in Lithuania concealed ancient Jewish texts in local church basements. In Poland, they hid 13th-century monastery manuscripts in bank vaults. Whole archives were moved across borders, under darkness, with brutal and certain death stalking anyone who might be caught in the act.

Still, there are plenty of examples of successful distributed or decentralized efforts worth considering. Some of the oldest libraries in the world (Saint Catherine’s Monastery, Al-Qarawiyyin, Nalanda University, the Vatican Library, or Sakya Monastery) are arguably the surviving nodes of a network, as keepers of religious texts. They are made possible by the first principle of print, and of the codex and the scroll and even the manuscript: that it exists to be copied, to be multiple. This in itself is an endorsement of the merits of decentralization.

Globally, the performing arts, theater, dance, and music have all utilized decentralization as a preservation tactic to a staggering level of success. In the case of European baroque and classical music, thousands of orchestras and music schools across the world used and use the act of collecting, copying, printing, studying, and playing to safeguard and transmit works across the centuries. Even original instruments, now worth fortunes and hundreds of years old (the Stradivari, Amati, Guarneri, Ruggieri, Guadagnini, et al.), are still being played, preserved, passed down, and held in trust for the next generation of players. Periods of war and upheaval have seen small groups of musicians playing chamber music together in whatever spaces were available to them, allowing musical works to conquer time.

Decentralized fan culture is inherently protective. Individual enthusiasts and digital pirates gathering in forums and Discord channels have done an incredible job preserving literary, music, video game, and film history through aggregation, emulation, and decentralized distribution. High-quality versions of the original unaltered cuts of films like Star Wars (which are no longer commercially available) are being preserved and held in this way. Volunteer teams of “rogue archivists” (for example, Archive Team) have been engaged in decades-long efforts to save digital and web assets in danger of abandonment or destruction.

A personal collection of objects, the books on your bookshelf, for example, can easily engender a substantive emotional connection. What will be key for decentralized storage systems is developing similar mechanics. The most successful volunteer decentralized computing project in history, the 1999–2020 SETI@home project, which analyzed collected radio signals in the search for signs of extraterrestrial intelligence (“A Brief History of SETI@Home,” The Atlantic, 2017), points to the ways such a scheme might be possible. Hundreds of thousands of computer users, including then-teenagers like me, gladly turned over their computers to this task. This was not accomplished with a promise of financial compensation, but with an appeal to the sheer scope and grandeur of the mission and a genuine invitation to participate in something that could matter.

What is consistent about these examples is that they all involve groups who care. The most enduring decentralized efforts don’t owe their success to technological or organizational innovation, but rather to having enlisted generations of people with an emotional and intellectual investment in their worth. For both cloud storage services and distributed storage schemes, the question is whether they can provoke the necessary level of passion and watchfulness. Are they and their technologies empowering those who care, or setting them up to fail? Can cloud storage corporations transform themselves into wardens? Can distributed storage systems turn each node into a guardian?

Answers and Non-Answers

I have mostly been beating around the bush here for 12,000 words. One can make a real argument that storage methods and media are largely irrelevant to survival over such long periods. The success of century-scale storage comes down to the same thing that storage and preservation of any duration does: maintenance. The everyday work of a human being caring for something. If a collection enjoys proper maintenance and care for 400 years, odds are, that collection will survive 400 years. How it is stored will evolve or change as it is maintained, but if there are maintainers, it will persist.

This will stay true even with huge potential advancements in storage media on the horizon, foremost among them DNA storage (“DNA: The Ultimate Data-Storage Solution,” Scientific American, 2021), with its incredible capacity for density and replication. The method is currently limited by a painfully slow read/write speed and several processes that have yet to be invented, but once it’s here, that technology will still have to be maintained.

Digital storage relies on software. All software and file formats are dependent on upkeep and preservation, as the march of technological advancement renders the hardware and software previously used to read and create media obsolete. Longstanding software is rare enough that it can become an object of fascination. In 2015, MIT Technology Review writer Glenn Fleishman answered a reader’s question about what the oldest computer program still in use was (“What Is the Oldest Computer Program Still in Use?”, MIT Technology Review, 2015). He concluded that the oldest was a Defense Department contracts management and tracking system, MOCAS, first created in 1958. It is still in use today, despite its scheduled retirement date of October 1, 2002 (“Future of MOCAS,” 2018). Fleishman also referenced a 1948 IBM 402 punch card system for inventory and accounting that was still being used by a Texas-based water filtration device manufacturer.

An IBM punch card

Public domain, via Wikimedia Commons

The IRS’s Individual Master File, the primary system for storing and processing tax submissions and inputting their data, was originally written in COBOL for IBM System/360 mainframe computers and has been running since the 1960s. There are parts of the UNIX codebase that have been continuously in use since the operating system’s start in 1969. There are likely implementations of assembly language that have been going since the 1950s.

It’s hard to determine the oldest piece of continuous digitally stored data that is not software or code itself and was never physicalized and re-digitized. Based on who the early adopters of hard drive and tape storage systems were, I would hazard that it’s a piece of meteorological or seismographic data recorded at a university on the West Coast of the United States, but that’s just a guess. The fact that some of these datasets were also held in print archives and later reentered into digital databases makes it hard to say for sure. The National Oceanic and Atmospheric Administration’s primary computer weather data system and Data Buoy systems have both been in use since 1970. These datasets have persisted through vigilance and the grinding attention of generations of scientists and their students. But none of them are close to attaining our 100-year mark.

Our digital tools fall into obsolescence and disrepair at an astonishing pace. Totally aside from issues related to preservation and storage, the risks when we fail to maintain software, and the knowledge and capacity to maintain it, are real and exigent. During the COVID-19 pandemic, several state and local governments found themselves in desperate need of COBOL programmers, as their unemployment and insurance systems still ran on software built in the relatively ancient language. If you want to store something for a hundred years, the ability to read and retrieve that stored item in the future is critical. And the only way to ensure that ability is to preserve the software that allows you to access your data, preserve the hardware that can run that software, and preserve the knowledge and skills required to maintain the entire system.

While plenty of computer scientists and thinkers, like co-inventor of the internet Vinton Cerf (“Bit rot (on digital vellum),” TEDxRoma, YouTube, 2014), have nobly proposed or theorized universal-file-format schemes that might change this reality, these schemes remain a speculative fantasy. The only currently viable way to preserve software is through the hard everyday work of maintenance, adaptation, and emulation. Right now, there is no shortcut or magic format. There is no hack.

But there are tools, efforts, and protocols that try to ease these burdens. The Web ARChive (WARC) file format is a standard for preserving web-based holdings that accommodates all sorts of secondary content. Fedora is a digital resource management system built from the ground up to preserve digital assets. CLOCKSS is an independent nonprofit implementation of LOCKSS technology intended as a long-term dark archive for journals and books. Rhizome, a digital art and culture organization that works out of the New Museum in New York City, has a dedicated digital preservation team working to preserve digital works. ArchiveBox is a self-hosted solution to archiving the web. The Media Archaeology Lab at the University of Colorado Boulder preserves and documents obsolete media. John Bowers, Jack Cushman, Jayshree Sarathy, and Jonathan Zittrain, here at the Library Innovation Lab, have proposed “Strong Dark Archives” (“‘Time Capsule’ Archiving Through Strong Dark Archives (SDA): Designing Trustable Distributed Archives for Sensitive Materials,” Harvard Public Law Working Paper No. 22-17, 2022), a protocol for born-digital sealed records that must be protected for security or legal reasons. There are dozens of other worthy projects and examples. The librarians and archivists of the world have been tackling the challenges of digital preservation for decades. The issue is that no one else is.
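As a rough illustration of the kind of everyday maintenance such tools take on, a fixity check compares each stored copy of a file against a checksum recorded when the file was first archived. The sketch below is a minimal Python example under stated assumptions: the function names `sha256_of` and `audit_copies` are hypothetical, SHA-256 is chosen only for illustration, and nothing here is drawn from how LOCKSS, Fedora, or any project named above actually works.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_copies(expected_digest: str, copies: list[Path]) -> dict[str, str]:
    """Classify each copy as 'ok', 'corrupted', or 'missing' (hypothetical helper)."""
    report = {}
    for copy in copies:
        if not copy.exists():
            report[str(copy)] = "missing"
        elif sha256_of(copy) == expected_digest:
            report[str(copy)] = "ok"
        else:
            report[str(copy)] = "corrupted"
    return report
```

A script like this only detects damage. Someone still has to run it on a schedule, read the report, and restore bad copies from good ones, which is the human maintenance this essay keeps returning to.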

The real solution to century-scale storage, especially at scale, is to change this reality. Successful century-scale storage will require a massive investment in digital preservation, a societal commitment. Politicians, governments, companies, and investors will have to be convinced, incentivized, or even bullied.

The United States allocates scant resources to the practice and problem of archiving and preservation in general. I tried to calculate what percentage of U.S. GDP is spent on libraries and archives—not just digital preservation, not even just preservation, but what sort of resources were allocated to the entire category. I aggregated budget reports from national, state, and local agencies, nonprofit institutions, industry groups, and corporate archives; assessed the productive capacities of the industries that serve these groups; spoke with economists, experts, and analysts at UBS, Morgan Stanley, and the Congressional Budget Office—and was never able to get close to an estimate that cracked 0.1 percent of GDP. According to the Institute of Museum and Library Services (IMLS) Public Libraries Survey, public libraries in the United States had a total operating expenditure of about $13 billion in FY 2018. The National Archives requested $481 million for their 2024 budget. The private sector spends very little on its own archival efforts. Even extremely large companies tend to employ a single corporate archivist, if that. Relative to the size of any other part of our government or economy, these numbers are tiny.
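To give a sense of scale, the two figures quoted above can be set against GDP. The sketch below is a back-of-the-envelope check, not the aggregation described in the paragraph. It assumes a U.S. nominal GDP of roughly $20.5 trillion (an approximate 2018 figure that does not come from this essay), and it mixes a FY 2018 figure with a 2024 budget request, so it shows order of magnitude only.

```python
# Figures quoted in the text, in U.S. dollars.
PUBLIC_LIBRARY_SPENDING_FY2018 = 13e9    # IMLS Public Libraries Survey, approximate
NATIONAL_ARCHIVES_REQUEST_2024 = 481e6   # National Archives budget request

# ASSUMPTION: approximate U.S. nominal GDP for 2018; not taken from the essay.
ASSUMED_US_GDP = 20.5e12


def percent_of_gdp(amount: float, gdp: float = ASSUMED_US_GDP) -> float:
    """Express a dollar amount as a percentage of the assumed GDP."""
    return 100.0 * amount / gdp


combined = PUBLIC_LIBRARY_SPENDING_FY2018 + NATIONAL_ARCHIVES_REQUEST_2024
for label, amount in [
    ("Public libraries (FY 2018)", PUBLIC_LIBRARY_SPENDING_FY2018),
    ("National Archives (2024 request)", NATIONAL_ARCHIVES_REQUEST_2024),
    ("Combined", combined),
]:
    print(f"{label}: {percent_of_gdp(amount):.4f}% of assumed GDP")
```

Under these assumptions the two figures together come to roughly 0.07 percent of GDP, which is consistent with the claim that no estimate cracked 0.1 percent.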

Software is running our world. Spending so little to attempt to preserve something so important is a scandal.

The best option to ensure century-scale storage is to radically change this order. Any storage provider serious about being a viable long-term storage option should be screaming about software preservation at every opportunity. If the corporate stakeholders in this space are serious about providing long-term storage to customers, they should wield the full power of their financial, human, and political capital to make digital preservation a greater priority.

Every time a media company destroys an archive, every time a video game company prosecutes the preservers of content it has abandoned, every time a tech company kills a well-used product with no plan for preservation, these actions should be met with attention and resistance.

We are on the brink of a dark age (“Raiders of the Lost Web,” The Atlantic, 2015), or have already entered one. The scale of art, music, and literature being lost each day as the World Wide Web shifts and degenerates represents the biggest loss of human cultural production since World War II. My generation was continuously warned by teachers, parents, and authority figures that we should be careful online because the internet is written in ink, and yet it turned out to be the exact opposite. As writer and researcher Kevin T. Baker remarked (X, formerly Twitter, 2024), “On the internet, Alexandria burns daily.”

The Library of Alexandria, 19th-century artistic rendering by German artist O. Von Corven

19th-century artistic rendering of The Library of Alexandria

Public domain, "The Great Library of Alexandria" by O. Von Corven

For century-scale storage, you aren’t fighting against mere mortal enemies—you’re waging a battle against the raging and unkind powers of geology, physics, and chemistry, not to mention the inexhaustible fallibility of humanity as a species. No quarter will be given.

If you want to store something for 100 years, what are the best methods for ensuring its survival? Hold it within a social or governmental structure that is most likely to facilitate maintenance and care. Be under the protection of or affiliated with the right nation-state (for example, one could argue that the holdings of the Library of Congress are backed up by the full force of the United States nuclear arsenal). Be part of a major religion. Be part of an aristocracy. Be part of a prominent artistic or intellectual scene, or a participant in an artistic or intellectual tradition.

Still, each of these governance structures also presents risks. “There is no political power without power over the archive,” Jacques Derrida wrote in Archive Fever. The centralized power structures of monarchies, despotic states, military dictatorships, and single-party rule all have a penchant for the intentional destruction of artifacts and records in order to maintain control. Clan-based systems regularly destroy that which is not contained within them. Dominant free-market ideologies actively incentivize the mass abandonment of anything that does not have the market value to sustain itself. Nation-states, even social democracies that rank highly on various freedom indices, have a spectacular capacity for both conscious and accidental censorship, selective preservation, and desertion of the artifacts under their care. As Fernando Báez writes in A Universal History of the Destruction of Books: From Ancient Sumer to Modern Iraq (translated by Alfred MacAdam, 2008), “It’s a common error to attribute the destruction of books to ignorant men unaware of their hatred. After twelve years of study, I’ve concluded that the more cultured a nation or a person is, the more willing each is to eliminate books under the pressure of apocalyptic myths.”

In order to survive, a data storer, and the makers of the tools they use, must be prepared to adopt a skeptical and even defiant attitude toward the societies in which they live. They must accept the protection of a patron while also preparing for the possibility of betrayal. If you’re wondering why much of this essay takes such an antagonistic pose toward external political and economic actors, while also considering the fruits of their offerings, it is because the century-scale archivist must sometimes be in service of an ideology that only answers to itself—to the protection of the collected artifacts at all costs. This ideology, an “Archivism,” entails a belief in the preservation of that which we make and think for future generations, at the expense of anything else. Century-scale storage can span methods and platforms, be enabled by governments and titans of industry, be helped by religions, cultures, artists, scenes, fans, collectors, technocrats, and engineers, but it must, at the end of the day, retain its values internally.

This is where, once again, the only true solution is an aggressive and massive investment in archives, libraries, digital preservationists, and software and hardware maintainers at every level, in every form of practice and economic circumstance. This needs to happen not just for states, corporations, and institutions, but for hobbyists and consumers. Many of our most treasured artistic and intellectual artifacts survived for decades in the hands of individuals long before they entered institutional care.

Resilience over time is not something that can be designed at the moment of inception and then forgotten. Century-scale storage requires a watchful eye that can adapt to new threats, to new paradigms, to that which could not be previously imagined. The goal of century-scale storage must be to preserve that which we have created so that others, those we will never meet, may experience its intricacies and ecstasies, its capacities for enlightenment. This should be done by whatever means necessary, by whatever method or decision ensures the possibility of that future, one day at a time, with a willingness to change at any moment, to scrap and claw against the forces attempting to smother the light.

If you are a company that offers a storage product, how can you help the long-term digital storage of archival material? Try to find a new investment model, one which might allow you to build for the longer term. Embrace open protocols. Support Right To Repair laws and build hardware that is repairable. Attack firms on other parts of the network and computational chains bent on centralization and monopoly. Help fund and implement a completely new paradigm of the digital preservation of software.

And if you, an individual reading this, want to store something and ensure it survives a century, what should you do? More than one thing. You should combine every method available to you, layers of backups, armies of copies, and most of all, practices and sites that encourage a culture of watchfulness and care. You should fight for a society that values the sciences and arts and that which they produce. And then, each day, you should do whatever it takes to keep your something safe, do whatever you can to empower the next generation to do the same, and then entrust that battle to them, to repeat into futurity.

In 2009, a couple renovated an abandoned house in sleepy St. Anne, Illinois. The house was in bad shape, clearly vandalized, ransacked, cracking and warping. They found papers strewn and stacked all over, with the name “Florence Price” written on them again and again. They had unwittingly discovered the composer’s former vacation house 56 years after her death, and a dozen works that had been thought long-lost, including two violin concertos and her fourth symphony. In 2015, the editor of a literary magazine was trolling through Princeton University’s rare book archive when he came across an unreleased short story by F. Scott Fitzgerald, which had been intended for publication but never released because of a conflict between the author and his agent. An unpublished Edith Wharton story was found at Yale’s Beinecke Rare Book & Manuscript Library that same year. In 2010, at that same archive, Richard Wright’s daughter found a previously unpublished novel manuscript hidden within his papers. If you walked into a record store in the early 1950s and asked them for a copy of Vivaldi’s “Four Seasons,” now one of the most ubiquitous pieces of music in the world, chances are they would have had no idea what you were talking about. In the 1920s, a small group of Vivaldi enthusiasts in Italy scoured local libraries and repositories for what was then thought of as a minor figure’s works, eventually finding half his scores in a monastery in Piedmont, and the other half held by a wealthy aristocratic family. The group found financial backers, purchased both collections for the University of Turin, and went to work for the next three decades resurrecting a musical reputation. By the 1960s, Vivaldi was omnipresent. When the Czech National Museum was re-cataloging and digitizing their archives in 2015, they found a long-lost 1785 collaboration between Wolfgang Amadeus Mozart and Antonio Salieri. Just this year, the municipal libraries of Leipzig revealed that another Mozart composition had been rediscovered within their holdings. A few weeks before this piece was published, the New York Times revealed a curator at the Morgan Library & Museum had unearthed a previously unknown Chopin waltz.
Citation: Hear a Chopin Waltz Unearthed After Nearly 200 Years (The New York Times, 2024)

All of these works were republished, rerecorded, or re-performed to great acclaim. What was lost was found. Small individual acts of care, spread over generations, led to their survival and rediscovery. The digital versions of these miracles can and will happen. One day, someone will find the flash drive on the ransacked floor of a house, the forgotten server in the ruin of a data center, the file in the bowels of a database. It will matter. Even if the contents have been damaged or forgotten, earlier acts of care can bear fruit decades later. They are the difference between recovery and despair.

About the Author

Maxwell Neely-Cohen is a fellow at the Library Innovation Lab. His nonfiction and essays have appeared in places like The New Republic, SSENSE, and BOMB Magazine. His non-writing work has spanned theater, video games, dance, and music. His experiments with technology have been acclaimed by The New York Times Magazine, Frieze, and The Financial Times. Before his literary and artistic career, he worked as a conflict analyst studying social upheaval, nuclear weapons, and the effects of asymmetric warfare on societies and economies. He lives in New York City.

Credits

Edited by
Clare Stanton

Additional Editing
Meg Miller

Copy Editing
Gillian Brassil

Art and Design
Shelby Wilson and Alex Miller

Web Accessibility
Rebecca Cremona and Ben Steinberg

Support for this project was provided by the Filecoin Foundation for the Decentralized Web

Published by the Library Innovation Lab at Harvard Law School
