This piece is the first in a new series from the Institute for Progress (IFP), called Compute in America: Building the Next Generation of AI Infrastructure at Home. In this series, we examine the challenges of accelerating the American AI data center buildout. Future pieces will be published at this link.
We often think of software as having an entirely digital existence, a world of “bits” wholly separate from the world of “atoms.” We can download endless amounts of data onto our phones without them getting the least bit heavier; we can watch hundreds of movies without once touching a physical disk; we can collect hundreds of books without owning a single scrap of paper.
But digital infrastructure ultimately requires physical infrastructure. All that software requires some sort of computer to run it. The more computing that is needed, the more physical infrastructure is required. We saw that a few weeks ago when we looked at the enormous $20 billion facilities required to manufacture modern semiconductors. And we also see it with state-of-the-art AI software. Creating a cutting-edge large language model requires a vast amount of computation, both to train the models and to run them once they’re complete. Training OpenAI’s GPT-4 required an estimated 21 billion petaFLOP (a petaFLOP is 10^15 floating-point operations).1 For comparison, an iPhone 12 is capable of roughly 11 trillion floating-point operations per second (0.01 petaFLOP per second), which means that if you were somehow able to train GPT-4 on an iPhone 12, it would take you more than 60,000 years to finish. On a 100 MHz Pentium processor from 1997, capable of a mere 9.2 million floating-point operations per second, training would theoretically take more than 66 billion years. And GPT-4 wasn’t an outlier, but part of a long trend of AI models getting ever larger and requiring more computation to create.
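The training-time comparisons above are simple division; a minimal back-of-the-envelope check (all figures are the rough estimates quoted in the text):

```python
# Back-of-the-envelope check of the training-time comparisons above.
TRAINING_FLOP = 21e9 * 1e15        # ~21 billion petaFLOP to train GPT-4
IPHONE_12_FLOPS = 11e12            # ~11 trillion floating-point ops per second
PENTIUM_FLOPS = 9.2e6              # ~9.2 million ops per second (100 MHz Pentium)
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

iphone_years = TRAINING_FLOP / IPHONE_12_FLOPS / SECONDS_PER_YEAR
pentium_years = TRAINING_FLOP / PENTIUM_FLOPS / SECONDS_PER_YEAR

print(f"iPhone 12: ~{iphone_years:,.0f} years")                 # ~60,000 years
print(f"Pentium:   ~{pentium_years / 1e9:.0f} billion years")   # >66 billion years
```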
But, of course, GPT-4 wasn’t trained on an iPhone. It was trained in a data center: tens of thousands of computers and their required supporting infrastructure in a specially designed building. As companies race to create their own AI models, they are building enormous compute capacity to train and run them. Amazon plans on spending $150 billion on data centers over the next 15 years in anticipation of increased demand from AI. Meta plans on spending $37 billion on infrastructure and data centers, largely AI-related, in 2024 alone. CoreWeave, a startup that provides cloud and computing services for AI companies, has raised billions of dollars in funding to build out its infrastructure and is building 28 data centers in 2024. The so-called “hyperscalers,” technology companies like Meta, Amazon, and Google with massive computing needs, are estimated to have enough data centers planned or under development to double their existing capacity. In cities around the country, data center construction is skyrocketing.
But even as demand for capacity skyrockets, building more data centers is likely to become increasingly difficult. In particular, operating a data center requires large amounts of electricity, and available power is fast becoming the binding constraint on data center construction. Nine of the top ten utilities in the U.S. have named data centers as their main source of customer growth, and a survey of data center professionals ranked availability and price of power as the top two factors driving data center site selection. With record levels of data centers in the pipeline to be built, the problem is only likely to get worse.
The downstream effects of losing the race to lead AI are worth considering. If the rapid progress seen over the last few years continues, advanced AI systems could massively accelerate scientific and technological progress and economic growth. Powerful AI systems could also be highly important to national security, enabling new kinds of offensive and defensive technologies. Losing the bleeding edge on AI progress would seriously weaken our national security capabilities, and our ability to shape the future more broadly. And another transformative technology largely invented and developed in America would be lost to foreign competitors.
AI relies on the availability of firm power. American leadership in innovating new sources of clean, firm power can and should be leveraged to ensure the AI data center buildout of the future happens here.
Intro to data centers
A data center is a fundamentally simple structure: a space that contains computers or other IT equipment. It can range from a small closet with a server in it, to a few rooms in an office building, to a large, stand-alone structure built specifically to house computers.
Large-scale computing equipment has always required designing a dedicated space to accommodate it. When IBM came out with its System/360 in 1964, it provided a 200-page physical planning manual that gave information on space and power needs, operating temperature ranges, air filtration recommendations, and everything else needed for the computers to operate properly. But historically, even large computing operations could be done within a building mostly devoted to other uses. Even today, most “data centers” are just rooms or floors in multi-use buildings. According to the EIA, there were data centers in 97,000 buildings around the country as of 2012, including offices, schools, labs, and warehouses. These data centers, typically about 2,000 square feet in size, occupy just 2% of the building they’re in, on average.
What we think of as modern data centers, specially-built massive buildings that house tens of thousands of computers, are largely an artifact of the post-internet era. Google’s first “data center” was 30 servers in a 28 square-foot cage, in a space shared by AltaVista, eBay, and Inktomi. Today, Google operates millions of servers in 37 purpose-built data centers around the world, some of them nearly one million square feet in size. These, along with thousands of other data centers around the world, are what power internet services like web apps, streaming video, cloud storage, and AI tools.
A large, modern data center contains tens of thousands of individual computers, specially designed to be stacked vertically in large racks. Racks hold several dozen computers at a time, along with other equipment needed to operate them, like network switches, power supplies, and backup batteries. Inside the data center are corridors containing dozens or hundreds of racks.
The amount of computer equipment they house means that data centers consume large amounts of power. A single computer isn’t particularly power hungry: A rack-mounted server might use a few hundred watts, or about 1/5th the power of a hair dryer. But tens of thousands of them together create substantial demand. Today, large data centers can require 100 megawatts (100 million watts) of power or more. That’s roughly the power required by 75,000 homes, or needed to melt 150 tons of steel in an electric arc furnace.2 Power demand is so central, in fact, that data centers are typically measured by how much power they consume rather than by square feet (this CBRE report estimates that there are 3,077.8 megawatts of data center capacity under construction in the US, though exact numbers are unknown). Their power demand means that data centers require large transformers, high-capacity electrical equipment like switchgears, and in some cases even a new substation to connect them to transmission lines.
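The household and steel comparisons above are simple division; a minimal sketch, where the average-household draw is an assumption based on typical US consumption rather than a figure from the text:

```python
# Rough arithmetic behind the 100 MW comparisons above.
DATA_CENTER_KW = 100_000          # a 100-megawatt data center

AVG_HOME_KW = 1.33                # assumed average US household draw (~11,600 kWh/yr)
print(f"~{DATA_CENTER_KW / AVG_HOME_KW:,.0f} homes")   # ~75,000 homes

EAF_KWH_PER_TON = 650             # arc furnace energy per ton of steel (see footnote 2)
print(f"~{DATA_CENTER_KW / EAF_KWH_PER_TON:.0f} tons of steel per hour")  # ~154 tons/hour
```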
All that power eventually gets turned into heat inside the data center, which means equally robust equipment is needed to move that heat out as fast as the power comes in. Racks sit on raised floors, and are kept cool by large volumes of air pulled up from below and through the equipment. Racks are typically arranged in alternating “hot aisles” (where hot air is exhausted) and “cold aisles” (where cool air is pulled in). The hot exhaust is removed by the data center’s cooling systems, chilled, and then recirculated. These cooling systems can be complex, with multiple “cooling loops” of heat-exchange fluids, though nearly all data centers use air to cool the IT equipment itself.
These cooling systems are large, unsurprisingly. The minimum amount of air needed to remove a kilowatt of power is roughly 120 cubic feet per minute; for 100 megawatts, that means 12 million cubic feet per minute. Data center chillers have cooling systems with thousands of times the capacities of a typical home air conditioner. Even relatively small data centers will have enormous air ducts, high-capacity chilling equipment, and large cooling towers. This video shows a data center with a one million gallon “cold battery” water tank: Water is cooled down during the night, when power is cheaper, and used to reduce the burden on the cooling systems during the day.
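Applying the 120-cubic-feet-per-minute rule of thumb above to a 100-megawatt facility:

```python
# Minimum cooling airflow implied by the rule of thumb above.
CFM_PER_KW = 120                  # cubic feet per minute of air per kilowatt of heat
it_load_kw = 100_000              # 100 MW of IT load

airflow_cfm = it_load_kw * CFM_PER_KW
print(f"{airflow_cfm:,} CFM")     # 12,000,000 CFM
```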
Because of the amount of power they consume, substantial effort has gone into making data centers more energy efficient. A common data center performance metric is power usage effectiveness (PUE), the ratio of the total power consumed by a data center to the amount of power consumed by its IT equipment. The lower the ratio, the less power is used on things other than running computers, and the more efficient the data center.
Data center PUE has steadily fallen over time. In 2007, the average PUE for large data centers was around 2.5: For every watt used to power a computer, 1.5 watts were used on cooling systems, backup power, or other equipment. Today, the average PUE has fallen to a little over 1.5. And the hyperscalers do even better: Meta’s average data center PUE is just 1.09, and Google’s is 1.1. These improvements have come from things like more efficient components (such as uninterruptible power supply systems with lower conversion losses), better data center architecture (changing to a hot-aisle, cold-aisle arrangement), and operating the data center at a higher temperature so that less cooling is required.
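The PUE metric itself is just a ratio; a minimal sketch of the figures quoted above:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power over IT equipment power."""
    return total_facility_kw / it_equipment_kw

# For every watt of IT load, overhead (cooling, UPS losses, etc.) is (PUE - 1) watts.
print(pue(2500, 1000))   # 2.5  -- 2007-era average: 1.5 W of overhead per IT watt
print(pue(1500, 1000))   # 1.5  -- roughly today's industry average
print(pue(1090, 1000))   # 1.09 -- Meta's reported fleet average
```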
There have also been efficiency improvements after the power reaches the computers. Computers must convert AC power from the grid into DC power; on older computers, this conversion was only 60-70% efficient, but modern components can achieve conversion efficiencies of up to 95%. Older computers would also use almost the same amount of power whether they were doing useful work or not. But modern computers are more capable of ramping their power usage down when they’re idle, reducing electricity consumption. And the energy efficiency of computation itself has improved over time due to Moore’s Law: Smaller and smaller transistors mean less electricity is required to run them, which means less power is required for a given amount of computation. From 1970 to 2020, the energy efficiency of computation has doubled roughly once every 1.5 years.
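Doubling every 1.5 years compounds dramatically over five decades (this trend is often called Koomey's law); the cumulative factor works out to roughly ten billion:

```python
# Cumulative efficiency gain from doubling every 1.5 years, 1970 to 2020.
years = 2020 - 1970
doublings = years / 1.5            # ~33 doublings
improvement = 2 ** doublings

print(f"~{improvement:.1e}x more computation per unit of energy")  # ~1.1e+10x
```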
Because of these steady increases in data center efficiency, while individual data centers have grown larger and more power-intensive, power consumption in data centers overall has been surprisingly flat. In the U.S., data center energy consumption doubled between 2000 and 2007 but was then flat for the next 10 years, even as worldwide internet traffic increased by more than a factor of 20. Between 2015 and 2022, worldwide data center energy consumption rose an estimated 20 to 70%, but data center workloads rose by 340%, and internet traffic increased by 600%.
Beyond power consumption, reliability is another critical factor in data center design. A data center may serve millions of customers, and service interruptions can easily cost tens of thousands of dollars per minute. Data centers are therefore designed to minimize the risk of downtime. Data center reliability is graded on a tiered system, ranging from Tier I to Tier IV, with higher tiers more reliable than lower tiers.3
Most large data centers in the U.S. fall somewhere between Tier III and Tier IV. They have backup diesel generators, redundant components to prevent single points of failure, multiple independent paths for power and cooling, and so on. A Tier IV data center will theoretically achieve 99.995% uptime, though in practice human error tends to reduce this level of reliability.
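That 99.995% figure translates into a concrete annual downtime budget:

```python
# Annual downtime allowed at a given availability level.
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes

def downtime_minutes_per_year(availability: float) -> float:
    return (1 - availability) * MINUTES_PER_YEAR

print(f"Tier IV (99.995%):   ~{downtime_minutes_per_year(0.99995):.0f} minutes/year")  # ~26
print(f"Three nines (99.9%): ~{downtime_minutes_per_year(0.999):.0f} minutes/year")    # ~526
```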
Data center trends
Over time, the trend has been for data centers to grow larger and consume greater amounts of power. In the early 2000s, a single rack in a data center might use one kilowatt of power. Today, typical racks in an enterprise data center use 10 kilowatts or less, and in a hyperscaler data center, that might reach 20 kilowatts or more. Similarly, 10 years ago, nearly all data centers used fewer than 10 megawatts, but a large data center today will use 100 megawatts or more. And companies are building large campuses with multiple individual data centers, pushing total power demand into the gigawatt range. Amazon’s much-reported purchase of a nuclear-powered data center was one such campus; it included an existing 48 MW data center and enough room for expansion to reach 960 MW in total capacity. As hyperscalers occupy a larger fraction of total data center capacity, large data centers and campuses will only become more common.
Today, data centers are still a small fraction of overall electricity demand. The IEA estimates that data centers worldwide consumed 1 to 1.3% of global electricity as of 2022 (with another 0.4% devoted to crypto mining). But this is expected to grow over time. SemiAnalysis predicts that data center electricity consumption could triple by 2030, reaching 3 to 4.5% of global electricity consumption. And because data center construction tends to be highly concentrated, data centers are already among the largest consumers of electricity in some markets. In Ireland, for example, data centers use almost 18% of electricity, a share that could increase to 30% by 2028. In Virginia, the largest market for data centers in the world, 24% of the power sold by Virginia Power goes to data centers.
Power availability has already become a key bottleneck to building new data centers. Some jurisdictions, including ones where data centers have historically been a major business, are curtailing construction. Singapore is one of the largest data center hubs in the world, but paused construction of them between 2019 and 2022, and instituted strict efficiency requirements after the pause was lifted. In Ireland, a moratorium has been placed on new data centers in the Dublin area until 2028. Northern Virginia is the largest data center market in the world, but one county recently rejected a data center application for the first time in the county’s history due to power availability concerns.
In the U.S., the problem is made worse by difficulties in building new electrical infrastructure. Utilities are building historically low amounts of transmission lines, and long interconnection queues are delaying new sources of generation. Data centers can be especially challenging from a utility perspective because their demand is more or less constant, providing fewer opportunities for load shifting and creating more demand for firm power. One data center company owner claimed that the U.S. was nearly out of power available for new data centers, primarily due to insufficient transmission capacity. Meta CEO Mark Zuckerberg has made similar claims, noting that “we would probably build out bigger clusters than we currently can if we could get the energy to do it.” One energy consultant pithily summed up the problem: “data centers are on a one to two-year build cycle, but energy availability is three years to none.”
Part of the electrical infrastructure problem is a timing mismatch. Utility companies see major electrical infrastructure as a long-term investment to be built in response to sustained demand growth. Any new piece of electrical infrastructure will likely be used far longer than a data center might be around, and utilities can be reluctant to build new infrastructure purely to accommodate them. In some cases, long-term agreements between data centers and utilities have been required to get new infrastructure built. An Ohio power company recently filed a proposal that would require data centers to buy 90% of the electricity they request from the utility, regardless of how much they use. Duke Energy has similarly introduced minimum-take requirements that oblige data centers to purchase a set amount of power regardless of usage.
Data center builders are responding to limited power availability by exploring alternative locations and energy sources. Historically, data centers were built near major sources of demand (such as large metro areas) or major internet infrastructure to reduce latency.4 But lack of power and rising NIMBYism in these jurisdictions may shift their construction to smaller cities, where power is more easily available. Builders are also experimenting with alternatives to utility power, such as local solar and wind generation connected to microgrids, natural gas-powered fuel cells, and small modular reactors.
Influence of AI
What impact will AI have on data center construction? Some have projected that AI models will become so large, and training them so computationally intensive, that within a few years data centers might consume 20% of all electricity. Skeptics point out that, historically, increases in data center demand have been almost entirely offset by increases in data center efficiency. They point to things like Nvidia's new, more efficient AI supercomputer (the GB200 NVL72), more computationally efficient AI models, and potential future ultra-efficient chip technologies like photonics or superconducting chips as evidence that this trend will continue.
We can divide the likely impact of AI on data centers into two separate questions: the impact on individual data centers (and the regions where they're built), and the impact of data centers overall on aggregate power consumption.
For individual data centers, AI will likely continue driving them to be larger and more power-intensive. As we noted earlier, training and running AI models requires an enormous amount of computation, and the specialized computers designed for AI consume enormous amounts of power. While a rack in a typical data center will consume on the order of 5 to 10 kilowatts of power, a rack in an Nvidia SuperPOD data center containing 32 H100s (specialized graphics processing units, or GPUs, designed for AI workloads, which Nvidia is selling by the millions) can consume more than 40 kilowatts. And while Nvidia’s new GB200 NVL72 can train and run AI models more efficiently, it consumes much more power in an absolute sense, using an astonishing 120 kilowatts per rack. Future AI-specific chips may have even higher power consumption. Even if future chips are more computationally efficient (and they likely will be), they will still consume much larger amounts of power.
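The jump from a conventional rack to an AI rack can be sketched with rough per-device figures. The per-GPU wattage and overhead multiplier below are illustrative assumptions, not vendor specifications:

```python
# Rough power budget for a 32-GPU AI rack (all figures are assumptions).
GPU_WATTS = 700            # assumed board power for an H100-class GPU
GPUS_PER_RACK = 32
OVERHEAD = 1.8             # assumed multiplier for host CPUs, networking, fans

rack_kw = GPU_WATTS * GPUS_PER_RACK * OVERHEAD / 1000
print(f"~{rack_kw:.0f} kW per rack")   # ~40 kW, vs. ~5-10 kW for a typical rack
```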
Not only is this amount of power far more than most existing data centers were designed to deliver, but the amount of exhaust heat begins to bump up against the limits of what traditional, air-based cooling systems can effectively remove. Conventional air cooling is likely limited to racks of around 20 to 30 kilowatts, perhaps 50 kilowatts if rear-door heat exchangers are used. One data center design guide notes that AI demands might require such large amounts of airflow that equipment will need to be spaced out, with airflow corridors so large that IT equipment occupies just 10% of the floor space of the data center. For its H100 SuperPOD, Nvidia suggests either using fewer computers per rack, or spacing out the racks to spread out power demand and cooling requirements.
Because current data centers aren’t necessarily well-suited for AI workloads, AI demand will likely result in data centers designed specifically for AI. SemiAnalysis projects that by 2028, more than half of data centers will be devoted to AI. Meta recently canceled several data center projects so they could be redesigned to handle AI workloads. AI data centers will need to be capable of supplying larger amounts of power to individual racks, and of removing that power when it turns into waste heat. This will likely mean a shift from air cooling to liquid cooling, which uses water or another heat-conducting fluid to remove heat from computers and IT equipment. In the immediate future, this probably means direct-to-chip cooling, where fluid is piped directly around a computer chip. This strategy is already used by Google’s tensor processing units (TPUs) designed for AI work and by Nvidia’s GB200 NVL72. In the long term, we may see immersion cooling, where the entire computer is immersed in a heat-conducting fluid.
Regardless of the cooling technology used, the enormous power consumption of these AI-specific data centers will require constructing large amounts of new electrical infrastructure: transmission lines, substations, and, if tech companies' climate goals are to be met, firm sources of low-carbon power. Unblocking the construction of this infrastructure will be critical for the U.S. to keep up in the AI race.
Our second question is what AI’s impact will be on the aggregate power consumption of data centers. Will AI drive data centers to consume an increasingly large fraction of electricity in the US, imperiling climate goals? Or will increasing efficiency mean a minimal increase in data center power consumption in aggregate, even as individual AI data centers grow monstrous?
This is more difficult to predict, but the outcome is likely somewhere in between. Skeptics are correct to note that, historically, data center power consumption rose far less than demand, that chips and AI models will likely get more efficient, and that naive extrapolation of current power requirements is likely to be inaccurate. But there's also reason to believe that data center power consumption will nevertheless rise substantially. In some cases, efficiency improvements are being exaggerated: the efficiency improvement of Nvidia's NVL72 is likely to be far less in practice than the 25x figure Nvidia uses for marketing purposes. Many projections of power demand, such as those used internally by hyperscalers, already take future efficiency improvements into account. And while novel ultra-low-power chip technologies like superconducting chips or photonics might be plausible options in the future, these are far-off technologies that will do nothing to address power concerns over the next several years.
In some ways, there are far fewer opportunities for data center energy reductions than there used to be. Historically, data center electricity consumption stayed flat largely because PUE kept improving (less electricity spent on cooling, UPS systems, and so on). But many of these gains have already been achieved: the best data centers already use just 10% of their electricity for cooling and other non-IT equipment.
Skeptics also fail to appreciate how enormous AI models are likely to become, and how easily increased chip efficiency might get eaten by demands for more computation. Internet traffic took roughly 10 years to increase by a factor of 20, but cutting-edge AI models are getting four to seven times as computationally intensive every year. Data center projections by SemiAnalysis, which take into account factors such as current and projected AI chip orders, tech company capital expenditure plans, and existing data center power consumption and PUE, suggest that global data center power consumption will more than triple by 2030, reaching 4.5% of global electricity demand. Regardless of aggregate trends, rising power demands for individual data centers will still create infrastructure and siting challenges that will need to be addressed.
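The gap between those two growth rates is stark once compounded; a quick sketch using the figures above:

```python
# Compounded growth: internet traffic vs. frontier AI training compute.
internet_per_year = 20 ** (1 / 10)     # 20x over roughly 10 years
ai_low, ai_high = 4, 7                 # 4-7x per year for cutting-edge models

print(f"Internet traffic: ~{internet_per_year:.2f}x per year")        # ~1.35x
print(f"AI compute over 3 years: {ai_low ** 3}x to {ai_high ** 3}x")  # 64x to 343x
```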
Conclusion
The rise of the internet and its digital infrastructure has required the construction of vast amounts of physical infrastructure to support it: data centers that hold tens of thousands of computers and other IT equipment. And as demands on this infrastructure rose, data centers became ever larger and more power-intensive. Modern data centers demand as much power as a small city, and campuses of multiple data centers can use as much power as a large nuclear reactor.
The rise of AI will accelerate this trend, requiring even more data centers that are increasingly power-intensive. Finding enough power for them will become increasingly challenging. This is already starting to push data center construction to areas with available power, and as demand continues to increase from data center construction and broader electrification, the constraint is only likely to get more binding.
A floating-point operation is a mathematical operation on decimal numbers, like 11.2 + 3.44 or 99.8 / 6.223.
Per the steel presentation, a typical electric arc furnace makes between 130 and 180 tons per hour, and requires 650 kilowatt-hours of power per ton. At 150 tons per hour, that yields 97,500 kilowatts, or 97.5 megawatts.
Other countries sometimes have their own data center grading systems that broadly correspond to this tiered system. Some providers claim they have even more reliable Tier V data centers, an unofficial tier that doesn’t seem to be endorsed by the Uptime Institute, a data center trade organization.
Being near major internet infrastructure is part of the reason why Northern Virginia became a data center hotspot.