White Paper: Liquid Cooling Application and Technology Evolution in APAC AIDC

Acknowledgements

We would like to express special thanks to the management of China Telecom Global, especially to Wu Xiaolei, Chen Kai and Cheng Yong, for their great support during the writing of this white paper. We would also like to thank the industry experts who shared their expertise and experience through DeepKnowledge in-depth seminars, interviews and other channels during the writing of the white paper "Liquid Cooling Application and Technology Evolution in APAC AIDC" (in alphabetical order, without ranking): Cao Weibing, Chai Xue, Chen Gang, Chen Miao, Chen Tianpeng, Ding Haifeng, Du Huarui, Duan Zhen, Feng Libo, Fu Xiao, Ge Ying, Huang Weihua, Jing Tangbo, Ju Changbin, Kozen, Li Jian, Li Dianlin, Li Hui, Li Zhiqiang, Li Wei, Liu Weimin, Liu Xin, Lu Gan, Lu Jingying, Luo Zhiming, Wang Yiou, Wang Haifeng, Qiao Xingbo, Qiao Qiao, Ren Zheng, Ren Huahua, Sun Di, Tang Hu, Tang Kang, Ng Daoxiong, Ng Jianyu, Tuan Giang, Zhang Binghua, Zhang Guanghong, Zhang Peng, Zhang Qixin, Zhang Shanshan, Zhu Liang, Zhang Yi, Zhou Xiaowei, and others. Thanks to your hard work, every knowledge point presented in this white paper is the result of your efforts.

We hope that this white paper will not only provide a solid theoretical basis for the development of liquid cooling technology, but also promote the innovation and adoption of liquid cooling in APAC data centers, and contribute to the evolution of the data center industry.

Liquid Cooling Application and Technology Evolution in APAC AIDC
Editorial Board
December 2024

1 Introduction

In the last month of 2024, we can briefly review what this year has meant for the data center industry. If 2023 was the first year of AI thanks to the release of ChatGPT, then we are confident enough to call 2024 the first year of liquid cooling in the data center industry. From the beginning to the end of 2024, the industry's attitude toward liquid cooling shifted from skepticism, to understanding and acceptance, and finally to firm embrace. What is dramatic about this process is that the industry moved from doubting a core technology to embracing it within a single year; this has never happened before in a data center industry which has long
been known for its conservatism.

As one of the leading global telecommunications companies, China Telecom has long focused on the development of the intelligent computing technology and market, and devoted itself to the research and innovation of AI-related infrastructure. In December 2023, based on the development status and expectations of the domestic AIDC market, China Telecom Group first proposed a new-generation intelligent computing infrastructure construction guide with "two resiliences and one optimization" as its core methodology. Along the three dimensions of energy resilience, refrigeration resilience and airflow optimization, it summarized the core management objectives of intelligent computing infrastructure, effectively and scientifically guiding the whole industry ecosystem.

At the beginning of 2024, China Telecom began to pay attention to the development of international intelligent computing infrastructure, represented by the APAC region, and especially to the application and trend of liquid cooling technology in AI data centers (AIDCs). In view of the large number of countries in APAC, and special circumstances such as differing national conditions, rapidly developing chip technology and opaque industrial information, CTG set out to quickly investigate the application of liquid cooling technology in APAC AIDCs through an international, dynamic and standardized research approach, and to propose specific liquid cooling deployment methods for different scenarios, thus lighting a beacon for the APAC intelligent computing infrastructure industry amid the uncertainty.

Based on the above background and demands, CTG jointly initiated the compilation of this white paper, striving through in-depth research and field investigation to accurately describe the current application of liquid cooling under the different national conditions, business models and development stages found across APAC. Through summary, induction and innovation, and based on the overall technical framework of "two resiliences and one optimization", the white paper compares liquid cooling application scenarios and assesses the development trend of the AIDC.

From May to November 2024, CTG and the DeepKnowledge Community held a number of DeepKnowledge symposiums and live interviews on liquid cooling applications in Jakarta, Shanghai, Hong Kong, Singapore, Kuala Lumpur, Ho Chi Minh City, Beijing and other cities. Through these symposiums and field interviews, a large amount of liquid-cooling-related knowledge material was generated, which was then carefully sorted and compiled by the CTG expert team and the DeepKnowledge Intelligence researcher team into this white paper.

Distinguished by a purely engineering perspective, a broad international vision and in-depth investigative interviews, this white paper aims to serve as a valuable reference for data center practitioners in the APAC region seeking an in-depth understanding of liquid cooling.

During the interviews and preparation of this white paper, CTG and the DeepKnowledge Community received great support from ASHRAE national branch organizations in APAC. Not only the white paper itself, but the whole process of research, interviews, compilation and translation was carried out by an international team. This has effectively promoted communication and exchange between China and data center engineers across APAC. Finally, we would like to thank the researchers and volunteer team of the DeepKnowledge Community for their hard work in preparing this white paper.

We welcome your comments and suggestions after reading this white paper. Across APAC and globally, CTG will continue to listen carefully to the voices from the front line of
the industry, with a view to providing better services for the development of the broader international intelligent computing industry.

China Telecom Global Co., Ltd.
December 10, 2024

Contents
1. The development of the AIDC industry and the opportunities and challenges of liquid cooling technology
1.1 The challenges of energy consumption and heat dissipation brought by the rapid development of GPU chips
1.2 Large-scale deployment brings new challenges to rack cooling
1.3 The challenges of scale and energy efficiency in AIDC
1.4 The challenge of water efficiency (WUE) in AIDC
1.5 Opportunities and challenges of liquid cooling technology application
2. The development of the APAC AIDC
2.1 The current situation and analysis of key regions of the data center industry in APAC
2.2 The challenges of climate in APAC AIDC
2.3 Application status and development trend of liquid cooling in APAC AIDC
3. The analysis of the main liquid cooling technology roadmaps and architectures
3.1 General architecture of liquid cooling
3.2 Heat capture
3.3 Heat exchange
3.4 Cold source
3.5 Classification of liquid cooling architectures
4. The characteristics analysis of air-liquid hybrid refrigeration architecture
4.1 Air-liquid fusion is the only way for the application of liquid cooling in AIDC
4.2 Common air-liquid fusion architectures
4.3 Comparative analysis of WUE, PUE and TCO under different air-liquid fusion architectures
4.4 Architecture selection recommendations
5. The selection and analysis of typical AIDC liquid cooling application scenarios
5.1 Liquid-cooled architecture of large data centers
5.2 Liquid-cooled architecture of small and medium-sized data centers
6. Prefabricated technology of liquid cooling systems
6.1 Development trend and value of prefabricated technology of data center products
6.2 Prefabricated technology of cold source solutions
6.3 Integrated liquid-cooling cabinet and liquid-cooling micro-module
6.4 Direct-to-chip liquid cooling container
7. Liquid-cooling transformation of the traditional air-cooling data center
7.1 Liquid cooling transformation of chilled water systems
7.2 Liquid cooling transformation of direct expansion air conditioning systems
8. The operational challenges of the liquid cooling system in the AIDC
8.1 Reliability verification of the direct-to-chip system
8.2 Compatibility verification of direct-to-chip servers
8.3 Division of the operation and maintenance interface of the direct-to-chip system
8.4 Operation and maintenance of the direct-to-chip system
9. The outlook for new AIDC technologies
9.1 Evolution analysis of data center refrigeration technology
9.2 Prospects for the future of popular liquid cooling technologies
10. Summary
Appendix: Terms and Definitions
Introduction of the members of the editorial board
Copyright Notice

1. The development of the AIDC industry and the opportunities and challenges of liquid cooling technology

1.1 The challenges of energy consumption and heat dissipation brought by the rapid development of GPU chips

With the development of chips such as CPUs, GPUs, NPUs and TPUs, the AI industry built on large models has been able to iterate rapidly, with intelligent computing power increasing sharply roughly every two years since NVIDIA's Pascal GPU was introduced in 2016. As a result, when training and fine-tuning large-scale artificial intelligence (AI) models, the energy consumption required for model evolution at the same computing power is steadily decreasing. Data presented by NVIDIA founder Jensen Huang at the 2024 GTC conference (figure 1-1) show that for training models such as GPT-MoE-1.8T, the energy consumption per token from the P100 chip to the current B100 chip has been reduced to 1/45,000 of the original.
35、ion under the same computingpoweris steadily decreasing.Data presented by NVIDIA founder Wong at the 2024 GTC conference(figure 1-1)shows that when training models such as GPT-moe-1.8 t,from the P100 chip to the current B100 chip,the energyconsumption per token is reduced to 1/45,000 of the original
36、.Figure 1-1 B100 GPU reduces energy consumption per Token to 1/45000 compared to P100 processingTo put this into perspective,the total energy required to train a GPT4-moe-1.8 T over a 10-day period isshown in table 1-1.This highlights that the improvement of chip computing power is the key to make c
37、omputing1 1power more accessible.Only when the cost of computing power becomes sufficiently low,can the intelligentcomputing industry flourish and enabling the widespread adoption of technologies andempoweringcountlessindustries.Table 1-1 P100 to B100(based on GPT4-moe-1.8 T 10-day benchmark)Year201
38、62018202020222024GPUP100V100A100H100B100Peak(TFLOPS)191306204,00020000Training and inference accuracyFP16FP16FP16FP8FP4Reasoning joule per token17,0001,200150100.4Electricity required for training(GW/h)100014040133However,the rapid development of chips in the application also encountered infrastruct
39、ure bottlenecks andchallenges.With the rapid development of chip computing Power,Chips TDP(Thermal Design Power)-heatdissipation power climbs rapidly.From NVIDIA V100 to GB200 chip cooling power changes,and future Rubinseries power consumption projections(figure 1-2),chip cooling power will soon rea
40、ch 1,200 W or above.Fig.1-2 Comparison of chip computing power and TDP rising trend2 2According to NVIDIAs latest plan,the GB200 Computing Power Module with the latest chip architecturewill have a cooling power of 5,400W(two GB200s).Such high chip power density poses serious challenges to theinfrast
41、ructure of powering and cooling GPU servers.So as the Blackwell family of chips begins to be deployed inlarge numbers,cooling technologies from chips to servers to data centers will need to iterate quickly to adapt to thecooling challenges posed by the large-scale deployment of AI chips.1.2 large-sc
42、ale deployment brings new challenges to rackcoolingIn addition to the rapid increase of chip heat dissipation power,the level of network architecture,networkbandwidth and network delay directly affects the effective computing power of the cluster,and indirectly affectsthe heat density of the cabinet
43、.The powerful parallel computing ability of GPU greatly improves the computing performance.With theincreasing amount of computing data,GPU needs to exchange a large amount of data,therefore,GPUcommunication performance has become a very important indicator.In the distributed training of AI cluster,c
44、ommunication is an essential part,and it is also more system overhead than single machine training.The timeratio of communication and computation often determines the upper limit of speedup in Distributed ML systems.Therefore,the key to distributed machine learning is to design a communication mecha
45、nism,so as to reduce thetime ratio of communication and calculation,and train a high-precision model more efficiently.The training of AI large-scale model is based on the large-scale cluster of GPU.In the process of modeltraining,a large amount of data exchange between GPUs is required,which brings
a large amount of communication traffic between nodes. The computing power of a GPU cluster is not simply the number of GPUs multiplied by the computing power of a single unit. The effective computing power of the cluster is positively correlated with the scale of the network, the performance of the network and the reliability of communication. In general, the following formula can be used to evaluate it:

Cluster effective computing power = GPU single-card computing power × total number of cards × linear speedup ratio × effective running time ratio

1. Total number of cards: the capacity of the network equipment determines the scale of the GPU cluster network. AIDC adopts a non-convergent network architecture; under a two-layer network architecture, the maximum number of GPUs = P²/4 (where P is the number of switch ports).

2. Linear speedup ratio: network communication delay determines the linear speedup of cluster computing power. In distributed scenarios, each computation step includes both the computation time of a single unit and the communication time between cards, resulting in a speedup ratio below 1.

3. Effective running time: network reliability determines the effective uptime of the GPU cluster.
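The formula and the two-layer scaling limit above can be sketched in a few lines of Python. All numeric inputs here are assumed example values for illustration, not figures from this white paper:

```python
# Sizing sketch for the effective-computing-power formula and the
# two-layer network fan-out limit described above. Example inputs are
# hypothetical, not measurements.

def max_gpus_two_tier(ports: int) -> int:
    """Maximum GPU count under a non-convergent two-layer network:
    P^2 / 4, where P is the switch port count."""
    return ports * ports // 4

def effective_cluster_tflops(per_gpu_tflops: float,
                             total_cards: int,
                             linear_speedup_ratio: float,
                             effective_runtime_ratio: float) -> float:
    """Cluster effective computing power = single-card computing power
    x total number of cards x linear speedup ratio x effective running
    time ratio."""
    return (per_gpu_tflops * total_cards
            * linear_speedup_ratio * effective_runtime_ratio)

if __name__ == "__main__":
    ports = 64                            # hypothetical 64-port switch
    print(max_gpus_two_tier(ports))       # 64 * 64 / 4 = 1024 GPUs
    # Hypothetical cluster: 1,000 TFLOPS per card, 90% linear speedup,
    # 95% effective running time.
    print(effective_cluster_tflops(1000, max_gpus_two_tier(ports), 0.9, 0.95))
```

The sketch makes the point of this section concrete: doubling switch ports quadruples the reachable cluster size, while any loss in linear speedup or uptime scales the whole cluster's effective power down multiplicatively.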
Network instability during long training runs has an outsized impact: when the network fails, training falls back to the last checkpoint and restarts from there, or the whole task starts again from the beginning.

It can be seen that the total effective computing power of the AIDC is determined by the total number of cards, the network architecture, network communication delay and effective running time. To deploy GPUs at large scale and generate higher computing power, users often adopt advanced dedicated GPU networking systems and minimize network routing distance to reduce transmission latency and network cost (fewer network tiers, shorter communication cables), and therefore pack more GPU servers into the same cabinet. In this way, the number of cards per unit area, the network linear speedup and the effective running time all improve, and the effective computing power of the cluster is higher. But this also causes the power density of the GPU server cabinet to increase as more equipment is deployed.

Taking the most advanced GB200 cabinet product on the market, the NVL72, as an example, the number of GPUs in one cabinet reaches 72, and the total power reaches 132 kW. Under the networking scheme described above, an 18-cabinet scalable micro-module built from NVL72 racks contains a total of 576 GPUs, called a "Super Pod", with a power of 1,200 kW: the equivalent of 240 IT cabinets in a traditional data center. Such a high power density (up to 13 times higher) poses an unprecedented challenge to the cooling of the data center infrastructure.

1.3 The challenges of scale and energy efficiency in AIDC

With the rapid development of AI technology, the demand for power in data centers has seen unprecedented growth. According to projections from Goldman Sachs (figure 1-3), AI will add about 200 TWh per year to data center power demand between 2023 and 2030, for a total consumption of nearly 1,100 TWh, of which AIDC will account for about 20%.

Figure 1-3 Power consumption forecast of data centers

With the rapid iteration of large models, the next evolutionary goal of AI is to train
and apply multi-modal large models at trillion-parameter scale, covering video, images, audio and text. A number of corporate AI labs (including but not limited to OpenAI/Microsoft, xAI and Meta) are racing to build clusters of 100,000+ GPUs. Taking the current mainstream NVIDIA H100 as an example, the investment to build the servers for a 100,000-unit training cluster is over $4 billion; such a cluster requires more than 150 MW of IT power capacity and consumes 1.59 TWh of electricity a year, which at $0.078 per kWh comes to a total of $123.9 million.

The power consumption of an H100 cluster breaks down as follows: the rated power of each GPU is 700 W, and the CPUs, network interface cards (NICs) and power supply units (PSUs) that make up the server add another 575 W per GPU; in general, 8 GPUs form one server module. In addition to the H100 servers, the AI cluster requires storage servers, network
switches, CPU nodes, optical transceivers and many other ancillary products, which account for about 10% of the total IT power of the AI cluster.

At the current average APAC Power Usage Effectiveness (PUE) level of around 1.5, a 150 MW computing cluster would require 75 MW of auxiliary power, with operating costs running to about $60 million a year. It can be seen that for some time to come, total power demand and the energy consumption ratio PUE will be key constraints on the construction of AIDC. Among them, the energy consumption of the refrigeration system accounts for 70%, which is the
key to energy saving and consumption reduction.

On the one hand, the construction of AIDC must address power supply challenges while ensuring the sustainability of energy. This requires integrating a variety of clean energy sources to establish a robust power system of sufficient capacity, capable of supporting the rapid development of the AIDC. On the other hand, it is necessary to adopt advanced key technologies and facilities, such as efficient power supply and distribution systems, natural cooling, liquid cooling and other technologies. These can significantly reduce the PUE and energy costs of the AIDC, ultimately making computing power widely accessible and affordable.

1.4 The challenge of water efficiency (WUE) in AIDC

In addition to energy, water has always been another natural resource of concern in data centers, and how to improve the efficiency of water use is
an important indicator of the sustainable development of the AIDC. Water Usage Effectiveness (WUE) is a key measure of a data center's water efficiency relative to its IT workload. For the 100,000-card H100 cluster in the case study in the previous section, IT power exceeded 150 MW. If we take a PUE of 1.3 and assume a chiller system with a WUE of 2 m³/MWh, the annual water consumption would be 1,708,200 m³, equivalent to the annual water use of some 110,000 households: a staggering amount. It is therefore urgent to use technological innovation to save water while developing intelligent computing.

Take Singapore, a typical APAC market, as an example. In 2021, the median WUE for data centers in Singapore that used a lot of water (i.e. at least 60,000 m³ of net water in the previous year) was 2.2 m³/MWh. Based on these data, Singapore's Public Utilities Board (PUB) has proposed that new and existing data centers lower WUE to 2.0 m³/MWh. WUE could be brought down even further over the next decade through a number of technological advances:

1. Optimizing cooling tower water consumption: cooling towers have great potential to reduce WUE, as they account for more than 90% of data center water
consumption. Water savings can be achieved by adopting more water-efficient cooling tower technology, recycling cooling water and reducing blowdown losses, and using electrolysis and other technologies to treat cooling water.

2. Using more water-efficient cooling technologies, including air-cooled systems and liquid cooling, can also reduce data center water use significantly.

The water use levels of different solution combinations will be described in a later section of this white paper.

1.5 Opportunities and challenges of liquid cooling technology application

As the analysis above shows, GPU computing power, as the core of the AIDC, will be the main driving force of data center growth over the next 10 years. Driven by these technology needs, high-performance GPU chips bring higher heat dissipation power, shorter-distance and higher-bandwidth networking, and ever-increasing data center power density. At the same time, large-scale application consumes a great deal of power and water, demanding more advanced technology to deliver lower PUE and WUE. As recommended by ASHRAE, liquid cooling is advised when the TDP of the chip is greater than 300 W and the power density of the cabinet is greater than 40 kW. The application of liquid cooling can also greatly reduce PUE and WUE. The rapid development of intelligent computing therefore provides an unprecedented opportunity for the application of liquid cooling technology in data centers.

Although the prospects for liquid cooling technology are broad, realizing them is still full of challenges. In more than 60 years of data center development, and more than 20 years of cloud data center development, air cooling has been absolutely dominant; liquid cooling has been used only in supercomputing sites where scientific research is the main goal. Whether for immersion cooling or cold plate liquid cooling, large-scale commercialization in hyperscale and colocation services has not yet been achieved. Whether AIDC refrigeration technology can successfully
complete the transition from air cooling to liquid cooling is still full of uncertainty.

The challenges lie in the maturity of the various liquid cooling technologies, chip compatibility, infrastructure compatibility, the difficulty of operation and maintenance, the difficulty of fault handling, and the maturity of the industrial chain. In the process of technology integration and switching, there are still compatibility and retrofit problems between liquid cooling equipment and existing traditional air-cooled computer rooms, and the transition period of air-liquid coexistence needs systematic planning. This white paper
will analyze the above-mentioned challenges in the application of liquid cooling technology and try to offer solutions.

2. The development of the APAC AIDC

2.1 The current situation and analysis of key regions of the data center industry in APAC

According to the market research report "APAC Data Centre Update 2024-H1" by consulting firm Cushman & Wakefield, in the
first half of 2024, operating capacity in the region's data center market approached 12 GW, with about 1.3 GW of new supply, the biggest increase in recent years. At the same time, development projects under construction across the region total 4.2 GW, and projects in the planning stage total 12.0 GW, an increase of 2.8 GW since the end of the second half of 2023. Of the 14 markets in APAC, the main mature markets are the Chinese mainland (4.2 GW), Japan (1.4 GW), India (1.4 GW), Australia (1.2 GW) and Singapore (0.98 GW); the fastest growing markets are Malaysia (2.1 GW) and Hong Kong (0.58 GW). Below we select a few typical mature and emerging markets for detailed analysis.

Japan: the Tokyo Rim region

In 2023, the operating capacity of data centers in the Tokyo economic circle exceeded 1 GW, and it maintained steady growth in the first half of 2024. Since the second quarter of 2023, overall operating capacity has increased by 14%. By the end of 2023, the market had absorbed an additional 44 MW of hyperscale and colocation capacity, with an average data center size of 9 MW, and plans to continue to increase the average to 40 MW.

Construction of many data centers in the region is currently being delayed by power shortages and a shortage of skilled infrastructure workers. One data center development in a rural area, for example, has announced a delay in bringing services online until 2027-2028, on the premise that it will
have access to electricity by 2025. The challenges are also reflected in the progress of the city's power system, which still lags far behind demand, despite a steady increase in capacity and a reduction in the power gap from 340 MW to 236 MW. Labour shortages are expected to improve ahead of the 2025
World Expo in Osaka.

Against this backdrop, more and more operators have begun to focus on areas outside Tokyo's central business district (figure 2-1). In the Inzai and Sagamihara regions, for example, land costs are significantly lower, electricity supply is unrestricted, and these areas account for more than 60 per cent of Tokyo's future data center supply capacity. Keppel signed a memorandum of understanding with Mitsui Fudosan to explore data center development and investment opportunities in Japan and Southeast Asia. In addition, Keppel Data Centre Fund II (KDCF II) has signed a forward purchase framework agreement with Mitsui Fudosan for a 300,000-square-foot (27,870 m²) freehold data center west of Tokyo (part of the Sagamihara cluster); due to be completed in 2027, it will be Keppel's first data center project in Japan.

Figure 2-1 Distribution of data centers in the Tokyo economic circle

Singapore

Singapore is currently a hot spot for data center investment in APAC. As of 2024, Singapore's total IT capacity is 1,347 MW, of which 965 MW (71.64%) is already in operation, 101 MW (7%) is under construction and 281 MW (20.87%) is planned. Colocation accounts for 55.23% and self-use for 44.77% of the
90、 its ownbusiness,the current vacancy rate is only 8MW.Although Singapore is currently the core of data center products in the APAC,however,the construction andoperation of data center infrastructure,especially the AIDC infrastructure driven by AI,requires a large amount ofland,electricity and water
resources, and these resources are relatively scarce in Singapore itself. To meet the exploding demand for data centers in the era of intelligent computing, the Singapore government is increasingly pushing for cooperation with its neighbours to locate data centers in Johor and Batam. It is reported that more than 60% of Singapore's new data centers will be located in Johor and Batam.

Malaysia and Indonesia

Malaysia, with its geographical advantage, has become the hottest region for data center investment in APAC since Singapore suspended the construction of local data centers in 2019. According to statistics released by "First Big Data", there are currently two major data center clusters in Malaysia: Greater Kuala Lumpur and Johor. Details are as follows:

Cluster 1: the Greater Kuala Lumpur region.

The Greater Kuala Lumpur area is one of the earliest-developed data center clusters in Malaysia. Companies such as CSF Group, Basis Bay and VADS were pioneers in the Greater Kuala Lumpur market. The First Big Data survey found (figure 2-2) that from 2010 onwards, most international operators entered the Kuala Lumpur market through acquisitions; in 2018, for example, Chindata's Bridge Data Centres acquired the CX2 data center from the CSF Group. Data centers in operation and under construction in Greater Kuala Lumpur have a capacity of 211 MW, according to published figures. Microsoft announced in April 2021 that it would build its own data center in Cyberjaya, and it is number one in terms of market share. The remaining companies
with leading market share include NTT, AIMS, Bridge Data Centres and VADS.

Figure 2-2 Statistics of data center construction in Greater Kuala Lumpur

Cluster 2: Johor and Batam.

According to data disclosed by Structure Research in its April 2024 report "DCI Report Series Market: Johor & Batam" (figure 2-3), the number of data centers planned for the Johor and Batam region in 2024 will reach 82, with a total capacity of about 2,000 MW.

Figure 2-3 Data center construction planning and distribution in the Johor and Batam region in 2024

2.2 The challenges of climate in APAC AIDC

The APAC region is vast, spanning multiple climatic
zones and marine systems, so its climatic characteristics are diverse and complex. Southeast Asia has typical tropical rainforest and tropical monsoon climates, with high temperatures and abundant rainfall throughout the year. Southern China, southern Japan and parts of Australia have a subtropical monsoon or subtropical humid climate, with hot, rainy summers and mild winters with little rain. Eastern China, most of Japan and the Korean peninsula have a temperate monsoon climate, with hot, rainy summers and cold, dry winters.

Main regional climate characteristics and heat dissipation challenges

In this chapter, we take Singapore, Malaysia and Indonesia as examples to illustrate the climate characteristics and cooling challenges of data centers in
Southeast Asia.

Singapore

Figure 2-4 Annual temperature and humidity in Singapore

Singapore is located at 1°18′ N, 103°51′ E, at the southern tip of the Malay Peninsula. It lies in the intertropical convergence zone and is dominated by a tropical rainforest climate: rainy, with small annual and daily temperature ranges. The annual average temperature is between 23 °C and 33 °C, and humidity is between 65% and 90%. December is the coolest month of the year, with an average temperature of around 24 °C; June is the hottest, with an average temperature of around 29 °C.

According to ASHRAE's meteorological data (figure 2-4), over the past 10 years Singapore has recorded an extreme high temperature of 36.1 °C, an extreme low of 21.7 °C, and an extreme wet-bulb temperature of 30.3 °C.

Malaysia

According to OMDIA's research, the three regions with the most data centers in
Malaysia are Kuala Lumpur, Cyberjaya and Johor Bahru. Kuala Lumpur is a little over 30 km from Cyberjaya and about 300 km from Johor Bahru, which is close to Singapore. The climatic characteristics of Kuala Lumpur are therefore selected for analysis in this chapter.

Kuala Lumpur is located at 3°08′ N, 101°42′ E, on the west coast of the Malay Peninsula. It has a tropical rainforest climate with summer-like weather all year round and plenty of sunshine and rainfall. Annual and daily temperature ranges are small: the annual average temperature is between 23 °C and 34 °C, and humidity is between 70% and 95%. January is the coolest month of the year, with an average temperature of around 27 °C; May is the hottest, with an average temperature of around 29 °C.

Figure 2-5 Annual temperature and humidity in Kuala Lumpur

According to ASHRAE's meteorological data (figure 2-5), over the past 10 years Kuala Lumpur has recorded an extreme high temperature of 36.9 °C, an extreme low of 21.3 °C, and an extreme wet-bulb temperature of 31.3 °C.

Indonesia

According to OMDIA's research, the regions with the most data centers in Indonesia are Jakarta, Surabaya, Bandung, Batam and Medan. Although these five cities are relatively scattered, their climate characteristics are very similar. This chapter therefore takes Jakarta as an example to analyse the climate characteristics of the country.

Jakarta is located at 6°09′ S, 106°49′ E, on the northwest coast of Java. It has a tropical rainforest climate with high temperatures and rainfall throughout the year and no obvious seasonal variation. Annual and daily temperature ranges are small: the annual average temperature is between 24 °C and 32 °C, and humidity is between 60% and 80%. The dry season (May to October) is relatively hot, at 29-32 °C, while the wet season (November to April) had
a relatively low temperature of around 28 °C.

Fig. 2-6 annual temperature and humidity in Jakarta

According to ASHRAE's meteorological data (figure 2-6), over the past 10 years Jakarta has recorded an extreme high temperature of 37.7 °C, an extreme low temperature of 19.2 °C, and an extreme wet-bulb temperature of 29.3 °C.

Cooling challenges

From the above analysis, the climate characteristics of the core cities of Southeast Asia are very similar: high temperature and high humidity throughout the year, with small annual and daily temperature differences.
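To make these extremes concrete, the following back-of-envelope check compares the quoted ASHRAE extreme wet-bulb temperatures against a warm-water liquid cooling loop served by a cooling tower. This is an illustrative sketch only: the tower approach temperature and the required supply temperature below are assumed example values, not figures from this whitepaper.

```python
# Rough feasibility check for water-side free cooling at peak conditions.
# Extreme wet-bulb values are the ASHRAE figures quoted above; the tower
# approach and required supply temperature are illustrative assumptions.
EXTREME_WET_BULB_C = {
    "Singapore": 30.3,
    "Kuala Lumpur": 31.3,
    "Jakarta": 29.3,
}

TOWER_APPROACH_C = 4.0    # assumed cooling-tower approach temperature
REQUIRED_SUPPLY_C = 33.0  # assumed water supply temperature for a warm-water loop

def free_cooling_possible(wet_bulb_c: float) -> bool:
    """Free cooling works only if the tower outlet (wet bulb + approach)
    stays at or below the required supply temperature."""
    return wet_bulb_c + TOWER_APPROACH_C <= REQUIRED_SUPPLY_C

for city, wb in EXTREME_WET_BULB_C.items():
    outlet = wb + TOWER_APPROACH_C
    print(f"{city}: tower outlet ~ {outlet:.1f} C, "
          f"free cooling at peak: {free_cooling_possible(wb)}")
```

Under these assumed values, none of the three cities can rely on tower-based free cooling at peak wet-bulb conditions, which is consistent with the cooling challenges discussed next.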
From the perspective of heat dissipation in data centers, the climatic conditions of Southeast Asia are not favourable. The hot and humid environment makes energy saving in the cooling system very difficult: cooling relies mainly on mechanical refrigeration, the hours available for using natural cold sources are very limited, and the result is a high PUE.

To sum up, the heat dissipation of data centers in Southeast Asia can be approached from the following four aspects:
1. Raise the temperature set point of the server room to effectively reduce energy consumption.
2. Make full use of water resources, using efficient water-cooled chillers together with water-side free cooling.
3. Adopt advanced energy-saving technologies such as liquid cooling and waste heat recovery.
4. Refine operation and maintenance: optimize the air distribution and use intelligent temperature control systems.

2.3 Application status and development trend of liquid cooling in APAC AIDC

In the pre-AI computing era, liquid cooling was deployed mostly in supercomputing data centers, and single-phase immersion was the mainstream. The large-scale commercial deployments of AI-oriented AIDC increasingly use cold plate liquid cooling. At present, there are not many large-scale cold plate liquid-cooled data centers in the world, so we take the xAI Colossus data center as an example to understand the current state of large-scale liquid cooling deployment in AIDCs.

The xAI Colossus data center currently has 100,000 NVIDIA H100 GPUs deployed. Each server is 4U high and contains 8 GPUs; each cabinet holds 8 servers and therefore 64 GPUs; an array of 8 cabinets contains 512 GPUs. The Colossus cluster has more than 1,500 racks and nearly 200 arrays (figure 2-7).

Figure 2-7 xAI Colossus computer room layout

The liquid cooling system in the xAI Colossus cluster is designed around liquid-cooled cabinets, each containing a rack-mounted Coolant Distribution Unit (CDU). To ensure reliable operation of the liquid cooling system, the core components of the CDU
in the cabinet, such as the circulating pumps and the power supplies, are designed with N+1 redundancy and support online maintenance (figure 2-8).

Fig. 2-8 coolant distribution unit inside an xAI Colossus cabinet

The 8 servers in the cabinet have liquid cooling inlets and outlets that are connected to the coolant distribution unit through a branch-pipe manifold for heat dissipation. The rack still retains a fan system for cooling low-power components such as DIMMs, power units, the BMC, and NICs. To maintain the thermal balance in the cabinet, the server cabinet is also equipped with a Rear Door Heat Exchanger (RDHx) (figure 2-9): the server fans draw cold air in at the front and discharge it at the back, where it is then processed by the rear door heat exchanger.

Fig. 2-9 xAI rear door heat exchanger

In the APAC region (excluding China), the largest and fastest data center construction in the last two years has been in Malaysia. Over this period, China's data center operators, led by Chindata's overseas brand Bridge Data Centres and by GDS, as well as mature local Southeast Asian operators such as NTT and STT, have been actively deploying liquid-cooled data centers; at present, the scale of liquid-cooled data centers in Malaysia has reached more than 150 MW. From 2025, with the large-scale deployment of NVIDIA GB200 NVL liquid-cooled cabinets, liquid-cooled data centers in the APAC region will enter a period of rapid growth.

As the largest single market in the APAC region, China has systematic plans for intelligent computing data centers. At the 2024 Computing Power Conference, the China Academy of Information and Communications Technology released the "Research Report on the Development of the Intelligent Computing Industry (2024)" (figure 2-10). As of June 2024, more than 250 intelligent computing data centers had been built or were under construction in
China, and more than 40 had been completed.

Figure 2-10 the layout of China AIDC

Among them, the China Telecom Lingang data center is an exemplary case. As one of the Shanghai data centers of the national "East Data, West Computing" project, Lingang Computing Power Company has built the first domestic single-pool, ten-thousand-unit-scale liquid-cooled computing cluster in the Yangtze River Delta, with more than 5 EFLOPS of computing power, able to support the training of trillion-parameter large models. This achievement provides strong computing power support for institutions such
as scientific research institutes and large-model companies in AI computing, deep learning, and training and inference research. Its outstanding characteristics at the infrastructure level are as follows.

The first is an innovative "two resiliences and one optimization" infrastructure. Through the flexible power supply, flexible cooling and air distribution optimization of "two resiliences and one optimization", the Lingang computing network achieves rapid delivery of multiple computing power combinations (figure 2-11). In terms of flexible power supply, a "small busbar + flexible shelter" distribution system for cabinet equipment solves the cabinet power-change demands caused by different customer cluster deployment modes. In terms of flexible cooling, a variety of data center cooling methods, such as the chiller system, the heat-pipe multi-connected system and the liquid cooling system, are
used, and infrastructure pipe wells and interfaces are reserved as a whole to enable flexible application of refrigeration technologies. In terms of air distribution optimization in the computer room, the air distribution of the racks and the room is simulated with computational fluid dynamics, the cold and hot aisles are designed in detail, and constructive design and periodic optimization are carried out to comprehensively improve cooling efficiency.

Fig. 2-11 elastic refrigeration cycle system and energy efficiency distribution

The second is solving difficult industry problems through innovative research. By using medium-distance, cross-district RDMA (remote direct memory access) for intra-city computing power networking, the problem that computing resources are scattered and cannot be scheduled centrally is solved. Using a cross-region RDMA protocol and a wide-area topology-aware parallel strategy, targeted model-training strategies are proposed for cross-data-center large-model training, providing appropriate solutions for customers and improving training efficiency; together with upstream and downstream computing power ecological partners, adaptation testing services are provided to customers.

Since it was put into operation in September 2023, a large-scale commercial liquid cooling resource pool has been built. Through the practice of "two resiliences and one optimization", it solves the changing power requirements of cabinets and supports flexible adaptation of single-row cabinet power density from 8 kW to 48 kW. At present, the cross-region RDMA network has completed comparative verification with 128 and 512 units over a transmission distance of 30 km, with training efficiency reaching more than 95% of a single cluster; five large-model training performance tests have been completed, supporting the steady training of basic large models such as TeleChat2-115B, with cluster availability above 90%.

3. The analysis of liquid cooling main technology roadmap and architecture
3.1 General architecture of liquid cooling
3.2 Heat capture
3.3 Heat exchange
3.4 Cold source
3.5 Classification of liquid cooling architectures

3.1 General architecture of liquid cooling

The liquid cooling system consists of several parts, and each part can be further subdivided, but in
essence the liquid cooling system can be divided into three parts: heat capture, heat exchange, and cold source. Figure 3-1 below is a simplified version of the general architecture of liquid cooling. It shows the three core components of a liquid cooling system so that we can summarize and discuss them.

Fig. 3-1 general framework of liquid cooling system

3.2 Heat capture

Liquid-cooled heat capture refers to the use of liquids to remove heat from IT components. According to the way heat is captured, liquid cooling can be divided into various forms. At present, there are three mainstream technical solutions: DtC (direct-to-chip), immersion, and spray.

DtC type

DtC liquid cooling (figure 3-2) does not require the coolant to contact the IT heating element directly; instead, the heat is carried away by cold plates (closed chambers of heat-conducting metals such as copper and aluminum) mounted on the heating elements (usually high-power components such as the CPU/GPU). This form of cooling is also known as indirect-contact liquid cooling.

Fig. 3-2 cold plate liquid cooling physical diagram and schematic diagram

Cold plate liquid cooling can be classified into two types based on whether the coolant undergoes a phase change in the cold plate: single-phase cold plate and two-phase cold plate. The heat transfer and refrigeration architectures of the two types are basically the same; the main difference lies in the secondary-side coolant. Single-phase DtC generally uses a water-based coolant with a high boiling point, and no phase change occurs during heat transfer. Two-phase DtC generally uses a refrigerant with a low boiling point, and a phase change occurs during heat transfer. A comparison of the coolants is shown in table 3-1 below:

Table 3-1 comparison of cold plate liquid cooling fluids

| Coolant | Deionized water | Ethylene/propylene glycol aqueous solution | Fluorinated fluid (single-phase) | Fluorinated fluid (two-phase) |
| Form of liquid cooling | Single-phase cold plate | Single-phase cold plate | Single-phase cold plate | Two-phase cold plate |
| Comprehensive thermal performance | Higher | Medium | Medium | High |
| Material compatibility | Medium | Medium | High | High |
| Boiling point | High | High | High | Low |
| Freezing point | 0 °C | Below 0 °C | Below 0 °C | About -45 °C |

When the water temperature is high and the local climate is favourable, a free cooling system can be fully utilized; when the water temperature is low, a mechanical cooling system is generally required.

There is also a special case: some old computer rooms also have intelligent computing needs and must be retrofitted for liquid cooling, but a common problem is that a new set of cold sources for liquid cooling cannot be added on site, so the original precision air conditioning has to be used as the cold source. Therefore, there are three types of cold source: the free cooling system, the mechanical cooling system (with free cooling), and the original precision air conditioning system.

Free cooling (natural cold source) systems can be divided into: open-circuit cooling tower, closed-circuit cooling tower, dry cooler, and pump-driven two-phase system.

Open-circuit cooling tower system:

Open-circuit cooling towers are widely used in various refrigeration scenarios. Their advantages include high heat dissipation efficiency, small footprint and low price. However, they also have disadvantages, such as high WUE and poor operating water quality. Therefore, when they are used in a liquid cooling system, an additional plate heat exchanger and pump group must be installed at the cooling tower outlet to prevent scaling of the CDU plate heat exchanger. The system schematic diagram is shown in figure 3-7 below:

Fig. 3-7 working principle diagram of open cooling tower system

Closed-circuit cooling tower
system:

The closed-circuit cooling tower consists of an internal circulation and an external circulation. The internal circulation provides cooling water to the system; because it is a closed loop, the water quality is higher, there is no need to add a plate heat exchanger, and the WUE is lower than that of the open-circuit cooling tower. The disadvantages are that it is expensive and bulky. The system schematic diagram is shown in figure 3-8 below:

Fig. 3-8 working principle diagram of closed cooling tower system

Dry cooler system:

In a dry cooler, the coolant flows inside the tubes and transfers heat directly to the ambient air, so the working process consumes no water. The advantages are a WUE of 0 (or very low) and a low price. The disadvantages are lower heat transfer efficiency and higher requirements on air quality and ambient temperature. The dry cooler can also be equipped with a water spray system to enhance the heat
transfer capacity during high-temperature seasons. The system schematic diagram is shown in figure 3-9 below:

Fig. 3-9 working principle diagram of dry cooler

Pump-driven two-phase system:

The pump-driven two-phase system primarily consists of a fluorine pump, a condenser and a liquid storage tank, and uses phase-change cooling. The system offers advantages such as high heat transfer efficiency and freedom from anti-freezing and water-quality-treatment concerns. However, its drawbacks include higher cost and complex maintenance requirements. The system's condenser can be either an air-cooled condenser or an evaporative condenser; the former has a WUE of zero, while the latter offers higher heat transfer efficiency. The system schematic diagram is shown in figure 3-10 below:

Fig. 3-10 working principle diagram of pump-driven two-phase system

In addition to the above natural
cold sources, there are also some more efficient natural cold sources, such as indirect evaporative cooling towers and dry-wet combined cooling towers. They are optimized extensions of the natural cold sources above and can use the same liquid cooling architecture.

Mechanical cooling system (with free cooling)

Mechanical cooling means a direct-expansion system with a compressor. Such a system can provide a lower primary-side cooling fluid temperature to meet the requirements of the liquid cooling system, and it is not limited by the ambient temperature. In liquid cooling applications, free cooling modules are added to the mechanical cooling system to improve year-round energy efficiency. The options are: air-cooled chiller + water-side free cooling system, water-cooled chiller + water-side free cooling system, and magnetic suspension phase change + fluorine pump free cooling system.

Air-cooled chiller + water-side free cooling system

The common configuration of this system adds a set of dry coolers or adiabatic coolers to the air-cooled chiller (configured separately or integrated into the chiller), realizing partial free cooling in transition seasons and complete free cooling in the cold season. The advantages of this system include the elimination of the cooling water system, easier installation, great adaptability, and a WUE of 0 (dry cooler) or very low (adiabatic cooler). The disadvantages include susceptibility to environmental factors during operation and lower energy efficiency compared with the water-cooled chiller system. The system schematic diagram is shown in figure 3-11 below:

Fig. 3-11 working principle of air-cooled chiller + water-side free cooling system

Mechanical cooling, partial
free cooling and complete free cooling can be realized by adjusting the states of valves 1, 2 and 3, as shown in table 3-7:

Table 3-7 work mode switch table

| Valve 1 | Valve 2 | Valve 3 | Operating mode |
| On | Off | Off | Mechanical cooling |
| Off | On | Off | Partial mechanical + partial free cooling |
| Off | On | On | Complete free cooling |

Water-cooled chiller + water-side free cooling system:

This system is widely used in chilled-water data centers. It adds a set of plate heat exchanger components to the water-cooled chiller to make use of the natural cold source. The system has the advantages of high refrigeration efficiency, stable operation, and simple operation and maintenance. The disadvantages are a large initial investment and high water consumption. The system schematic diagram is shown in figure 3-12 below:

Fig. 3-12 working principle of water-cooled chiller + water-side free cooling system

Mechanical cooling, partial free cooling and complete free cooling can be realized by adjusting the states of valves 1-4, as shown in table 3-8:

Table 3-8 work mode switch table

| Valve 1 | Valve 2 | Valve 3 | Valve 4 | Operating mode |
| On | Off | Off | On | Mechanical cooling |
| Off | On | On | On | Partial mechanical + partial free cooling |
| Off | On | On | Off | Complete free cooling |

Magnetic suspension phase change + fluorine pump free cooling system

The system consists of a magnetic suspension compressor, fluorine pump, liquid storage tank, valves, heat exchanger and other components. The condenser can be a dry condenser or an evaporative condenser; the former has a low WUE and the latter has high energy efficiency. In this scheme, water is not used as the secondary refrigerant; the refrigerant is delivered directly to the CDU, and the corresponding CDU type is L2R or R2R. The advantages are high heat transfer efficiency and the good stability of an oil-free system. The disadvantages are that it is expensive and more difficult to maintain than a water system. It has two operation modes: mechanical cooling and fluorine pump free cooling. When the ambient temperature is low, the fluorine pump works alone to achieve complete free cooling. The system schematic diagram is shown in figure 3-13 below:

Fig. 3-13 working principle of magnetic levitation phase change system

Original precision air conditioning system

When the original precision air conditioning in the computer room is used as the cold source of the liquid cooling system, the corresponding CDU type is L2A or R2A, and the installation form can be rack type or cabinet type. The heat from the liquid-cooled servers is transferred
to the air in the computer room and finally discharged to the outside by the condenser of the precision air conditioner.

Liquid cooling retrofit projects face many constraints, so not every form of heat capture is applicable. At present, cold plate liquid cooling has the best compatibility with the load-bearing capacity, cabinets and servers of the original computer room, so liquid cooling retrofit projects mainly use cold plate liquid cooling. The schematic diagram is shown in figure 3-14:

Fig. 3-14 schematic diagram of liquid cooling transformation of original air conditioning system

Suggestions for cold source selection

The cold source selection recommendations in this section are for new liquid-cooled data centers. When selecting a cold source, the following factors should be considered:
1. The temperature grade of the primary-side liquid supply;
2. Climatic conditions (temperature, humidity, temperature range, etc.);
3. Water resources and WUE policy;
4. Technical factors (energy efficiency, reliability, cooling medium, etc.);
5. Economic factors (initial investment and operating costs);
6. Other factors (construction period, scalability, building form, etc.).

The cold source selection recommendations discussed in this
paper are shown in figure 3-15 below:

Fig. 3-15 suggestion of cold source selection

Note: the temperature of the primary liquid supply is related to the local meteorological parameters, so the cold source form may differ between regions even at the same liquid supply level.

3.5 Classification of liquid cooling architectures

The three core components of a liquid cooling system can be combined into a variety of liquid cooling architectures. Among the heat capture forms, the application cases of spray liquid cooling are too few to form a complete industrial chain. As
a result, liquid-cooled configurations are still dominated by the cold plate and immersion forms, which can be grouped into the eight configurations shown in table 3-9 below:

Table 3-9 classification of liquid cooling structures

| Serial number | Cold source | Heat capture | CDU |
| 1 | Mechanical cold source | Cold plate type | Cabinet type |
| 2 | Mechanical cold source | Cold plate type | Rack type |
| 3 | Mechanical cold source | Immersion | Cabinet type |
| 4 | Mechanical cold source | Immersion | Rack type |
| 5 | Natural cold source | Cold plate type | Cabinet type |
| 6 | Natural cold source | Cold plate type | Rack type |
| 7 | Natural cold source | Immersion | Cabinet type |
| 8 | Natural cold source | Immersion | Rack type |

Each of the above architectures has its own applicable scenario, and the appropriate liquid cooling architecture can be selected according to table 3-10 below:

Table 3-10 characteristics of various liquid cooling structures

| Component | Option | Applicable boundary conditions |
| Cold source | Mechanical cold source | High ambient temperature & low supply temperature |
| Cold source | Natural cold source | Low ambient temperature & high supply temperature |
| Heat capture | Cold plate type | High compatibility & low cost |
| Heat capture | Immersion | Low PUE & high-performance computing |
| CDU | Cabinet type | Large-scale deployment |
| CDU | Rack type | Rapid deployment |

4. The characteristics analysis of air-liquid hybrid refrigeration architecture
4.1 Air-liquid fusion is the only way for the application of liquid cooling in AIDC
4.2 Common air-liquid fusion architectures
4.3 Comparative analysis of WUE, PUE and TCO under different air-liquid fusion architectures
4.4 Architecture selection recommendations

4.1 Air-liquid fusion is the only way for the application of liquid cooling in AIDC

According to the Uptime Institute's report shown in figure 4-1, the majority of liquid-cooled data centers currently use cold plates, so it is necessary to focus on cold plate liquid cooling at this stage. As mentioned above, cold plate liquid cooling mainly solves the problem of heat dissipation of high-power components in servers.
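To illustrate how this split works out in practice, the sketch below divides a server's heat load between the cold plate loop and the residual air-cooled path, then estimates the airflow the air path still needs. The server power, capture ratio, and air-side temperature rise are illustrative assumptions, not figures from this whitepaper.

```python
# Split a server's heat between the cold-plate loop and residual air cooling,
# then estimate the airflow the air-cooled share requires (Q = rho * V * cp * dT).
AIR_DENSITY = 1.2  # kg/m^3, approximate air density at room conditions
AIR_CP = 1005.0    # J/(kg*K), specific heat of air

def split_server_heat(server_kw: float, liquid_ratio: float) -> tuple[float, float]:
    """Return (liquid-cooled kW, air-cooled kW) for a given capture ratio."""
    liquid_kw = server_kw * liquid_ratio
    return liquid_kw, server_kw - liquid_kw

def required_airflow_m3_per_h(air_kw: float, delta_t_k: float = 12.0) -> float:
    """Airflow needed to carry the residual heat at the given air temperature rise."""
    v_m3_s = air_kw * 1000.0 / (AIR_DENSITY * AIR_CP * delta_t_k)
    return v_m3_s * 3600.0

# Example: an assumed 10 kW server with an 85% cold-plate capture ratio.
liquid_kw, air_kw = split_server_heat(server_kw=10.0, liquid_ratio=0.85)
print(f"liquid: {liquid_kw:.1f} kW, air: {air_kw:.1f} kW, "
      f"airflow ~ {required_airflow_m3_per_h(air_kw):.0f} m3/h")
```

Even at a high capture ratio, the remaining air-cooled share is non-trivial, which is why the air-cooled section discussed below cannot simply be removed.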
This part of the heat accounts for about 50%-85% of the total server heat, and the rest of the heat still has to be removed by traditional air cooling. This combination of liquid and air cooling is known as "air-liquid fusion". The liquid-cooled architecture has been described previously; the air-liquid hybrid architecture only requires adding an air-cooled section to it.

4.2 Common air-liquid fusion architectures

The air-liquid fusion structure can be divided into three parts: the liquid cooling and air cooling parts on the secondary side, and the cold source on the primary side. The liquid cooling part is fixed, while the air cooling part and the cold source have many variants. According to whether liquid cooling and air cooling share the same cold source, the architecture
can be divided into two types: the "air-liquid co-source" architecture and the "air-liquid independent" architecture.

Air-liquid co-source architecture

According to the different combinations of cold source and air cooling part, the air-liquid co-source architecture can be divided into: cooling tower + dynamic double cold source architecture, chiller + chilled water terminal architecture, and magnetic suspension phase change system + heat pipe terminal architecture.

Cooling tower + dynamic double cold source architecture

The outdoor cold source of this architecture uses cooling towers (open- or closed-circuit) or dry coolers
to provide cooling water to the secondary side. The air cooling part of the secondary side adopts dynamic double cold source air conditioning, which includes a cooling water system and a compressor system. Liquid cooling runs entirely on free cooling, while air cooling is partially free cooling. The architecture model is shown in figure 4-2 below:

Fig. 4-2 cooling tower + dynamic double cold source structure

The double cold source air conditioner is a large fan wall with diffuse air supply, installed in the equipment room. When the water supply temperature of the cooling tower is
low, the cooling water coil works alone; when it is high, the compressor system is switched on to supplement the cooling.

This architecture already has many application cases. The whole system has no chillers, so the cost is lower, and the use of distributed cooling towers effectively avoids single points of failure. The double cold source air conditioner is installed in the equipment room and decoupled from the equipment in the computer room, which makes it very suitable for leasing business.

Chiller + chilled water terminal architecture

The outdoor cold source of this architecture uses a chiller (air-cooled or water-cooled) with the free cooling module described above, so it can provide a wide water temperature range. The secondary-side air cooling part uses chilled water terminals, which are not limited to fan walls but can also take the form of in-row units, small fan walls, rear-door units, and so on. The architecture model is shown in figure 4-3 below:

Fig. 4-3 chiller + chilled water terminal structure

Compared with "cooling tower + double cold source air conditioning", this architecture realizes the "double cold sources" on the outdoor side, and the compressor system of the chiller is only turned on when the ambient temperature is high.

The advantage is that the terminal of this architecture is very flexible, with many application forms, and can adapt to more scenarios, such as the liquid-cooled cabinets of prefabricated all-in-one solutions. The disadvantage is that the
water supply temperature of the chiller must also accommodate the terminal chilled-water air conditioning, so the overall energy efficiency is slightly lower than in the "cooling tower + double cold source" scheme.

To improve energy efficiency, the details of the architecture can be optimized as shown in figure 4-4 below: the outdoor side uses a cooling tower, with the water-cooled chiller supplemented by an integrated plate heat exchanger, and the indoor air-cooled part uses water-cooled DX air conditioners cooled by the cooling tower. The chiller in the liquid cooling system then handles only the low-temperature water demand, so energy efficiency improves significantly. This architecture is a full-link cold water solution; the air conditioning uses water-fluorine heat exchangers for heat dissipation and is compatible with double cold source air conditioners, water-cooled fluorine pump air conditioners and other refrigeration forms. The liquid cooling section is equipped with a water-cooled chiller with an integrated plate heat exchanger; its compressor is only switched on when needed, and it can provide a wide water temperature range.

Fig. 4-4 chiller + water-cooled DX air conditioning system

With the rapid development of AI, chip TDP is also rising rapidly. According to OCP research (figure 4-5), by 2030 the TDP of GPU chips will reach 1.5 kW. A higher heat load requires a lower coolant temperature, and the coolant temperature range corresponding to a 1.5 kW heat load is 20 °C-40 °C. To cope with rapid chip iteration and ensure long-term use of the refrigeration equipment, a reasonable coolant temperature is 30 °C, and the corresponding primary-side supply water temperature must then be lower than 30 °C. In this case, the advantage
of using the chiller as the primary cold source is obvious: it can still provide a relatively low supply water temperature during the high-temperature season, which the cooling tower clearly cannot do. A cold-source side equipped with a chiller copes very well with the future trend of chip power, and it is an excellent solution from the perspective of longevity. It can therefore be predicted that in future liquid cooling systems the chiller will become a necessary option, and the air-liquid co-source architecture with the chiller as its cold source will also be more widely used.

Fig. 4-5 OCP study on chip power and coolant temperature

Magnetic levitation phase change system + heat pipe terminal architecture

A magnetic levitation phase change system with a fluorine pump for free cooling is used as the outdoor cold source (figure 4-6). The heat pipe terminal is used for the secondary-side air cooling section and can take various forms, such as large fan walls, small fan walls, in-row units, rear-door units, etc. The heat transfer type of the CDU is L2R.

Fig. 4-6 magnetic levitation phase change + heat pipe terminal structure

Compared with the previous two architectures, the
system does not use water as a secondary refrigerant but delivers the refrigerant directly to the CDU and the terminal air conditioning, which reduces the number of heat exchanges and thus brings higher energy efficiency. At the same time, it can provide as low a primary-side liquid supply temperature as the chiller, but it is not dominant in cost and maintenance, so current application cases are relatively few.

Analysis of the air-liquid co-source architecture:

The biggest advantage of the air-liquid co-source architecture is that the air-liquid ratio can be adjusted, suiting flexible deployment. For many data centers (especially leased data centers), it is difficult at the early construction stage to accurately predict the specific business deployment over the life cycle. Different types of servers have different air-liquid ratios: for example, the liquid cooling ratio of a GPU training server can be as high as about 85%, while that of a big-data storage server is about 50%; at a certain stage the two may be mixed, or even all air-cooled servers may be used.

Therefore, it is necessary to make clear that the primary cold source is shared at the design stage
of such data centers, so that it is compatible with both air cooling and liquid cooling and can provide 100% of the cooling capacity either way; the cold source can then be adjusted together with the secondary-side terminals to achieve different air-liquid ratios. In addition, the primary cold source of the air-liquid co-source architecture needs to be distributed and integrated when applied at large scale. This has the advantages of supporting small-scale phased construction, reducing systemic risk, and simplifying site deployment and commissioning.

Air-liquid independent architecture

The air-liquid independent structure means that air cooling and liquid cooling each use their own independent primary-side cold source. At present, the cooling tower is usually used as the primary-side cold source of the liquid cooling part (a dry cooler or chiller is used in some areas). This scheme balances energy efficiency and cost. The
213、selection of air-cooled parts is more flexible than the air-liquid homologous architecture.The types of air conditioning commonly used in data centers,for example,fluorine pump free coolingunit,chillersystem,indirect evaporative cooling unit,fresh air system,air-cooled direct expansion air condition
214、ing can beused.The air-cooled part uses the chiller system architecture model,figure 4-7:Fig.4-7 the air-cooled section is constructed with chiller system5 53 3The architecture is most similar to the traditional data center cooling method,with high maturity and goodcompatibility.It is very friendly
215、to the building form and equipment maintenance of the computer room,and thecontrol is simpler and the operation is more stable.The disadvantages are complex piping and high cost.The air-cooled section uses an indirect evaporative cooling architecture model,figure 4-8:Fig.4-8 air-cooled section adopt
216、s indirect cooling structureThe advantage of this architecture is that the combination of indirect evaporative cooling and liquid cooling isvery energy efficient,and both of them are prefabricated equipment,and the delivery cycle is greatly shortened.However,the indirect evaporative cooling unit req
217、uires the number of floors in the building,which is generally nomore than four.The WUE of the system will be relatively high,so traditional precision air conditioning can also beused in water-scarce areas.The air-cooled part uses a precision air-conditioning architecture model,figure 4-9:Figure 4-9.
The air-cooled section uses a CRAC architecture

The air-cooled part of this architecture has a high degree of technological maturity and is limited by neither water resources nor building form, making it a highly versatile solution. If the local winter temperature is low, a fluorine-pump free-cooling module can be added to improve energy efficiency.

Analysis of the air-liquid independent architecture:

Compared with the co-source architecture, the air-liquid independent architecture is not suitable for data centers that require flexible deployment, because an adjustable air-liquid ratio is costly. The total cold-source cooling capacity required for a highly flexible room is compared in Table 4-1 below:

Table 4-1 Cooling capacity requirements of flexible rooms with different architectures

Architecture | Max. air-cooling proportion | Max. liquid-cooling proportion | Total cold-source capacity
Air-liquid co-source | 100% | 85% | 100%
Air-liquid independent | 100% | 85% | 185%

For business-specific data centers, where the air-liquid ratio is relatively fixed, the advantages of the air-liquid independent architecture are obvious: the air-cooled and liquid-cooled parts are completely decoupled.
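The two totals in Table 4-1 follow from a simple sizing rule: a shared (co-source) primary cold source only has to cover the largest single proportion, while independent cold sources are each sized for their own worst case and therefore add up. A minimal sketch of that rule (the function and parameter names are ours, not from the white paper):

```python
def cold_source_capacity_kw(it_load_kw, max_air_ratio=1.0, max_liquid_ratio=0.85,
                            shared_source=True):
    """Total cold-source capacity needed for a flexible (mixed-use) room.

    A shared primary source serves whichever air/liquid mix is deployed,
    so it is sized for the single largest proportion; independent sources
    must each cover their own maximum, so their capacities add up.
    """
    if shared_source:
        return it_load_kw * max(max_air_ratio, max_liquid_ratio)
    return it_load_kw * (max_air_ratio + max_liquid_ratio)

# For 1 MW of IT load, with the Table 4-1 proportions:
print(cold_source_capacity_kw(1000, shared_source=True))   # co-source: 100% of load
print(cold_source_capacity_kw(1000, shared_source=False))  # independent: 185% of load
```

Running the example with a 1 MW IT load reproduces the 100% and 185% figures in the table.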
The best combination of air cooling and liquid cooling can then be selected according to the actual conditions of the project, improving both energy efficiency and reliability.

To sum up, both architectures have their own suitable scenarios. For scenarios with uncertain business and elastic cooling demand, the air-liquid co-source architecture is preferred; for scenarios with a clearly defined business, the air-liquid independent architecture is preferred. Whichever architecture is selected, the outdoor-side cold source should adopt a distributed, integrated design.

4.3 Comparative analysis of WUE, PUE and TCO under different air-liquid fusion architectures

The previous two sections introduced the various forms of air-liquid fusion architecture. Table 4-2 compares their WUE, PUE and TCO. The following conditions need to be clarified before making the comparison:

1. Location: New Hills, Malaysia;
2. The primary-side supply temperature is 35°C;
3. A water-cooled chiller is used in the chiller system, an evaporative condenser in the magnetic levitation system, and a cooling tower in the liquid-cooling part of the air-liquid independent architecture;
4. The proportion of liquid cooling is 50%-85%.

Table 4-2 Comparison of various air-liquid fusion architectures (liquid cooling scheme: cold plate)

Architecture | Air-cooled solution | WUE (L/kWh) | PUE | TCO
Air-liquid co-source | Cooling tower + dynamic dual cold source | 1.6-2.0 | 1.21-1.30 | A
Air-liquid co-source | Chiller + CRAH | 1.8-2.2 | 1.24-1.34 | 1.5A
Air-liquid co-source | Maglev phase change + heat-pipe end | 1.0-1.4 | 1.22-1.32 | 2.0A
Air-liquid independent | Indirect evaporative cooling | 1.4-1.8 | 1.21-1.30 | 1.3A
Air-liquid independent | Chiller system | 1.6-2.0 | 1.22-1.32 | 1.7A
Air-liquid independent | Traditional precision air conditioning | 0.4-0.8 | 1.28-1.37 | 1.4A

Table 4-3 below shows the PUE grading in the Green Data Centre specification (2024) issued by the Malaysian Communications and Multimedia Commission (MCMC). The Excellent rating can be achieved when using an air-liquid fusion architecture.

Table 4-3 PUE classification of data centers in Malaysia

Rating | PUE
Lowest | above 1.9
Good | 1.5-1.9
Excellent | below 1.5

4.4 Architecture selection recommendations

In cold-plate liquid cooling applications, both the air-liquid co-source and the air-liquid independent architecture have their suitable scenarios, and the best one should be selected according to the specific situation of the project. Based on the foregoing, an air-liquid hybrid architecture with cold-plate liquid cooling can be designed with reference to figure 4-10 below:

Fig. 4-10 Proposed selection of an air-liquid fusion architecture

It should be noted that the choice of architecture should remain flexible and be determined by the specific circumstances of each project.
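The recommendation of Figure 4-10, together with the MCMC grading of Table 4-3, can be condensed into a small decision helper. This is only a sketch of the selection rule stated above; the function names, and our reading of the grade boundaries at 1.5 and 1.9, are assumptions:

```python
def choose_architecture(business_defined, needs_flexible_ratio):
    """Figure 4-10 in miniature: co-source for uncertain or elastic
    deployments, independent for a fixed, well-defined business."""
    if not business_defined or needs_flexible_ratio:
        return "air-liquid co-source"
    return "air-liquid independent"

def mcmc_pue_grade(pue):
    """PUE grading per the MCMC Green Data Centre specification (Table 4-3)."""
    if pue < 1.5:
        return "Excellent"
    if pue <= 1.9:
        return "Good"
    return "Lowest"

print(choose_architecture(business_defined=False, needs_flexible_ratio=True))
# -> air-liquid co-source
print(mcmc_pue_grade(1.25))  # -> Excellent; both fusion families in Table 4-2 land here
```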
For example, if the business requirement is fixed but site space is limited, or if there is a desire to simplify the system and shorten the deployment period, a co-source architecture can still be considered as an alternative.

The selection and analysis of typical AIDC liquid cooling application scenarios
5.1 Liquid-cooled architecture of large data centers
5.2 Liquid-cooled architecture of small and medium-sized data centers

5. The selection and analysis of typical AIDC liquid cooling application scenarios

AIDC is the most important computing-power production center of the AI era. With powerful computing capability it drives AI models to process data in depth, continuously produces intelligent computing solutions, and delivers them to organizations and individuals over the network in the form of cloud services.

In Chapters 3 and 4 we analyzed the different liquid cooling technologies, liquid cooling system schemes and air-liquid hybrid architectures, and found that each solution and architecture has its own characteristics and suitable scenarios. In practice, the appropriate architecture must be chosen based on the project's environmental conditions, business requirements and operational goals. In this chapter, the selection of a liquid cooling architecture is illustrated with examples of large-scale and small-scale AIDCs.

5.1 Liquid-cooled architecture of large data centers

Characteristics of large-scale AIDCs

Large-scale AIDCs are usually equipped with thousands to tens of thousands of high-performance servers delivering PFLOPS-level (quadrillion floating-point operations per second) or higher computing power, which can meet the needs of complex intelligent computing. They usually adopt advanced computing architectures and hardware, such as HPC clusters, large-scale storage systems and high-speed networks, to ensure the efficient execution of computing tasks.

Large-scale AIDCs have very high requirements for business continuity, retain a certain flexibility in deployment, can customize services according to customer needs and scenarios, and must support flexible switching and expansion across multiple computing modes and architectures.

Large-scale AIDCs mainly serve areas such as AI, big-data analysis and deep learning that require strong intelligent computing support. In addition to the AI field,
it is also widely used in emerging fields such as the Internet of Things and the Industrial Internet, and is being deeply integrated with more industries, such as healthcare and transportation.

Liquid-cooled architecture of large-scale AIDCs

Since high-performance computing devices such as GPUs and AI accelerators generate much more heat than traditional servers, and the cooling efficiency of air-cooled systems is limited, liquid cooling becomes the best choice. As mentioned earlier, cold-plate liquid cooling is preferred for large-scale applications because of its greater compatibility and maturity.

Referring to Chapters 3 and 4, the liquid cooling structure is divided into three parts (heat capture, CDU and cold source) and into two types: air-liquid co-source and air-liquid independent. The final architecture can be determined in two steps at design time:

First, list all the known conditions, select each part according to these conditions, and combine the parts as described in 1-8 of Chapter 3 to obtain the most suitable liquid-cooled architecture.

Second, according to the type of business, determine whether the architecture is air-liquid co-source or air-liquid independent, and then choose the appropriate air-cooling scheme to match it.

Taking Southeast Asia as an example, with a primary-side supply temperature of 35°C, each part should as far as possible use a scheme of high energy efficiency, moderate cost and mature technology. When selecting the cold source, a comprehensive analysis is needed: from the viewpoint of the primary-side supply temperature alone, a cooling tower could be selected; considering the long-term use of the refrigeration equipment and the flexibility of the air-cooled end, a water-cooled chiller with free cooling is more suitable.

Step 1, Figure 5-1:

Fig. 5-1 Selection of the liquid cooling architecture for a large data center

Step 2, Figure 5-2:

Fig. 5-2 Selection of the air-liquid fusion architecture

After these two steps, the construction of the air-liquid fusion architecture is complete. The architecture models are shown in Figures 5-3 and 5-4:

Fig. 5-3 Architecture diagram of the air-liquid co-source scheme

Fig. 5-4 Architecture diagram of the air-liquid independent scheme

A fan wall is recommended for the air-cooled ends of both schemes; it can be installed in the equipment room so as to decouple it from the IT equipment in the computer room.

5.2 Liquid-cooled architecture of small and medium-sized data centers

Characteristics of small and medium-sized computing centers

Small and medium-sized intelligent computing centers are typically compact, generally equipped with tens to hundreds of servers and containing limited computing, storage and network resources. Built on high-performance computing, they integrate advanced algorithms such as deep learning and machine learning to focus on large-scale, complex data analysis and intelligent decision-making tasks.

Compared with large-scale computing centers, small and medium-sized centers are more flexible in deployment and can provide customized services according to customer needs and scenarios. In a rapidly changing market environment, they can respond more quickly to customer needs, providing timely technical support and solutions. Small and medium-sized
intelligent computing centers are more focused on meeting the computing needs of specific industries or scenarios, for example providing customized solutions for healthcare, retail and other fields. In addition, some areas of high-performance computing, such as education and scientific research, are also based on small and medium-sized AIDCs.

Liquid-cooled architecture of small and medium-sized computing centers

Cold-plate liquid cooling is also suitable for small and medium-sized AIDCs. Compared with large AIDCs, the requirements on the compatibility and maturity of the liquid cooling technology are slightly lower, while the heat generated by high-performance computing is still very large, so immersion liquid cooling is also a good choice. In line with the characteristics of small and medium-sized AIDCs, the liquid cooling architecture should meet requirements such as flexibility and efficiency, a simple system, rapid deployment and "one-button boot".

We still follow the two steps described above to build the liquid cooling architecture, again taking Southeast Asia with a primary-side supply temperature of 35°C as the example.

Step 1, Figure 5-5:

Fig. 5-5 Selection of the liquid cooling architecture for small and medium-sized data centers

Step 2, Figure 5-6:
Immersion cooling is a 100% liquid cooling solution that eliminates the need for air cooling, resulting in a relatively simple structure. A cooling tower can be used as the cold source. A one-piece tank is recommended, as it includes a built-in CDU and requires only a few simple operations at the project site.

Fig. 5-6 Immersion liquid cooling scheme

Cold-plate liquid cooling scheme, Figure 5-7: when applying cold-plate liquid cooling in small and medium-sized AIDCs, the system should prioritize integration of the air-cooled and liquid-cooled components to enable rapid deployment, simplify the system and reduce on-site engineering. Air and liquid share the same cold source, which significantly streamlines the field piping. The integrated cabinet combines air cooling and liquid cooling, with backplane air conditioning installed in the rack to handle the air-cooled part; this step requires a low water temperature
and relies on a mechanical cold source. The scheme is highly prefabricated, allowing rapid deployment, "one-button boot" and other functions, which makes it the preferred option.

Fig. 5-7 Prefabricated integrated liquid cooling scheme

Prefabricated technology of liquid cooling systems
6.1 Development trend and value of prefabricated technology in data center products
6.2 Prefabricated technology of the cold source scheme
6.3 Integrated liquid-cooling cabinet and liquid-cooling micro-module
6.4 Direct-to-chip liquid cooling container

6. Prefabricated technology of liquid cooling systems

Beyond introducing new technology, the application of AIDC and liquid cooling also poses new challenges to the construction, deployment and engineering of AIDCs. With expensive chips, new product technology and limited industry engineering experience, the contradiction grows between the shorter delivery times customers want and the high-quality operation and maintenance operators want. Against this background, prefabrication of the liquid cooling system has increasingly become a popular choice. This chapter starts from the prefabrication trend in data centers and explains how prefabrication took shape, the subsystem forms, and the product characteristics of liquid cooling systems.

6.1 Development trend and value of prefabricated technology in data center products

Traditional data centers not only have long construction periods and high initial investment costs, but their subsystems are also isolated from each other. The separation of planning from construction, along with the patchwork construction model, creates great difficulties for later operation and maintenance management. To address these drawbacks, prefabrication and modularization have gradually extended from the weak-current and environmental equipment of the data center to the entire facility: from modular components such as modular UPS, modular temperature control and modular busbars, through modular solutions such as power/hydraulic modules, micro-modules and IT modules, and finally to fully modular data centers.

Under a fully modular, prefabricated design, the individual subsystems are pre-assembled, the production process is standardized so that every module has consistent quality, multiple systems are designed in a coordinated manner, and full-system debugging and testing are completed before leaving the factory, ensuring high quality and reliability. On site, only minimal construction remains, which greatly reduces the difficulty of site management and the construction risk and effectively improves the reliability of the data center. Prefabricated data centers offer rapid deployment, scalability, simple operation and maintenance, and high energy efficiency. In general, data centers will inevitably develop toward productization, prefabrication and modularization.

In addition, with the rapid development of data centers in China, liquid cooling is becoming the trend in response to the continuously rising energy consumption that accompanies the growth of computing power. According to the 2023-2024 China Liquid Cooling Data Center Market Research Annual Report released by CCID Consulting, China's liquid cooling market reached 8.63 billion yuan in 2023, a year-on-year growth of 26.2%, 2 percentage points higher than the global rate, and has maintained growth above 20% for three consecutive years. By 2026, the market size of China's liquid-cooled data centers is expected to reach 18.01 billion yuan, with year-on-year growth of 29.1%.

For the prefabrication of liquid cooling systems, prefabricated products and solutions exist for the cold-source side, the liquid cooling cabinet, the liquid cooling micro-module and the direct-to-chip liquid cooling container.

6.2 Prefabricated technology of the cold source scheme

Prefabricated integrated cooling station

The prefabricated integrated cooling station is a highly efficient chilled-water system that organically integrates the traditional chiller plant room: the chiller, chilled-water transmission/distribution and water treatment, cooling-water transmission/distribution and water treatment, the heat-exchange station, the power system and the centralized control system. Compared with a simple prefabricated cooling station, it has a higher degree of integration, saves energy, and lowers management and maintenance costs. Its main forms are the container type and the shelter (square-cabin) type.

The container type can be installed indoors or outdoors according to the project situation and climate conditions, and can be subdivided into the unit prefabricated container-based mode and the combined prefabricated container-based mode.

The unit prefabricated container-based integrated cooling station (Figure 6-1) can be divided into three types
according to cooling capacity: type 1, unit cooling capacity below 350 RT; type 2, 400-600 RT; type 3, 700-1800 RT. Types 1 and 2 are independent units; type 3 must be spliced together, either horizontally or by vertical stacking.

Figure 6-1 Integrated cooling station, unit prefabricated container type

The combined prefabricated container-based mode (Figure 6-2) can also be divided into three types. Type 1: chiller module + plate heat exchanger module + pipeline switching module + pump module; its disadvantage is that the maintenance space is relatively small and later operation and maintenance is slightly inconvenient. Type 2: chiller module + plate heat exchanger module + pump module. Type 3: chiller module + plate heat exchanger and pump module + pipeline module + maintenance module.

Fig. 6-2 Integrated cooling station, combined prefabricated container type

The shelter type can be divided into indoor and outdoor shelters. The indoor shelter (Figure 6-3) is composed of pipe skids, single-unit equipment skids and so on; the corresponding skid blocks can be hoisted and delivered directly. The outdoor shelter (Figure 6-4) is composed of a chiller module, a pump module, a cold-storage tank and a cooling-station control room.

Figure 6-3 Indoor shelter integrated cooling station

Figure 6-4 Outdoor shelter integrated cooling station

Integrated cold source

The integrated cold source is a highly integrated product combining the cooling tower, pump, dosing device, constant-pressure water make-up device and other components; on site it only needs power and water connections to operate. Types include the closed cooling tower, the open cooling tower and the indirect evaporative cooling chiller. The integrated cold source is well suited to the direct-to-chip co-source architecture with a dynamic dual cold source (Figure 6-5). At this stage, Vertiv and Sugan have launched integrated cold source solutions for direct-to-chip liquid cooling.

Figure 6-5 Direct-to-chip liquid cooling integrated cold source (dynamic dual
cooling source) system

The indirect evaporative cooling integrated cold source (Figure 6-6) uses indirect evaporative cooling to produce cooling water below the wet-bulb temperature, enabling both the air-cooling and liquid-cooling systems to achieve year-round free cooling in most parts of the country. The cold sources of the two systems are integrated in one cold station, which greatly reduces the difficulty and cost of system construction (fewer chillers) and minimizes system energy consumption. Taking northwest China as an example, the extreme wet-bulb temperature is 20-24°C, and indirect evaporative cooling can produce cooling water below 20-22°C all year round, meeting the supply-water temperature requirements of the air cooling in the liquid-cooled room. At present, indirect evaporative cooling combined with direct-to-chip liquid cooling has achieved 100% free cooling in Guangdong (a hot and humid area), Zhejiang (East China) and Shanxi (North China).

Figure 6-6 Indirect evaporative cooling integrated cold source

The core idea of the integrated cold source is to use one set of free-cooling cold sources to address all the cooling requirements, air and liquid alike, of the whole data center. A large loop pipe network drives the precision air conditioning on the air side and the CDUs on the liquid-cooling side; in either case the heat is rejected to the same primary-side system, which greatly simplifies the management complexity of the data center.

Furthermore, during construction a set of free-cooling cold sources matching the planned capacity of the whole data center can be built, or at least the large loop pipe network can be built first, with the cooling towers and pumps added in stages. In addition, a free-cooling system must be built in order to host direct-to-chip liquid-cooled servers. In the foreseeable future the liquid-to-air ratio will keep rising while the demand for air cooling shrinks; the proportion of liquid to air in an integrated cold source system can be adjusted flexibly, enabling flexible deployment. For large data centers, construction can usually be carried out in stages, with systemic risks strictly controlled.
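The wet-bulb argument above can be turned into a quick screening check for a candidate site. This is an illustrative sketch only: the assumption that the supply water sits roughly 2 K below the ambient wet-bulb temperature is our simplification of the 20-24°C to 20-22°C figures quoted for northwest China, not a design rule.

```python
def iec_supply_water_c(wet_bulb_c, approach_k=-2.0):
    """Estimated cooling-water temperature from a sub-wet-bulb indirect
    evaporative cooler; a negative approach means below wet bulb (assumed)."""
    return wet_bulb_c + approach_k

def year_round_free_cooling(extreme_wet_bulb_c, required_supply_c):
    """True if the IEC cold source alone meets the required supply
    temperature even at the site's extreme wet-bulb condition."""
    return iec_supply_water_c(extreme_wet_bulb_c) <= required_supply_c

# Northwest China extreme wet bulb of 24 C against a 35 C primary-side requirement:
print(year_round_free_cooling(24.0, 35.0))  # True -> year-round free cooling
```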
The small-granularity design of a distributed integrated cold source can also support the phased construction and expansion of small modules, further reducing systemic risk.

6.3 Integrated liquid-cooling cabinet and liquid-cooling micro-module

Integrated liquid-cooling cabinet

Current integrated liquid-cooling cabinet products can be divided by cooling form into the direct-to-chip liquid-cooling cabinet and the single-phase immersion liquid-cooling cabinet, introduced in turn below.

The integrated direct-to-chip liquid-cooling cabinet (Figure 6-7) is based on a single cabinet and adopts a modular design concept with high integration and high standardization. It integrates independent units such as the IT cabinet, power distribution units, enclosure components, refrigeration units, cabling, and integrated operation and maintenance, and is composed of the cabinet, manifold piping, liquid-cooled servers, liquid-cooling quick-connect couplings and a plug-in CDU. All components of the cabinet are prefabricated, installed and debugged in the factory and can be flexibly disassembled and transported, which not only saves computer-room space but also allows flexible expansion and rapid batch deployment on site. The integrated cabinet serves as the carrier of the liquid-cooling equipment, and each device is connected with a dedicated liquid-cooling hose to ensure the heat-dissipation effect.

Figure 6-7 Schematic diagram of the integrated direct-to-chip liquid-cooling cabinet

The integrated single-phase immersion liquid-cooling cabinet (Figure 6-8) adopts single-phase immersion cooling: heat-generating electronic components such as the chips, motherboard, memory and hard disks are directly immersed in an insulating, chemically inert coolant, and the heat they generate is carried away by the circulating coolant. Because the cooling of the heat-generating components is more uniform, the heat-transfer efficiency is significantly improved.

A built-in monitoring module monitors the power and operating environment inside the liquid-cooling cabinet, applies real-time control according to the operating condition, and regulates the supply/return flow of each cabinet. The primary CDU adopts a centralized liquid-supply scheme, which meets the requirement of centralized heat exchange while still allowing independent operation and maintenance. Flange connections can be used between the secondary-side supply/return piping and the cabinet, with valves on the piping so that the cabinet can be removed for maintenance. The supply circuit can adopt a double-in, double-out pipeline design. The coolant circulation piping and joints must have good sealing performance and material compatibility, with no risk of corrosion or leakage over the service life of the system. The cabinet-side piping may be made of polymer materials or seamless steel pipe.

Fig. 6-8 Schematic diagram of the integrated single-phase immersion liquid-cooling cabinet

Liquid-cooling micro-module

Liquid-cooling micro-module products (Figure 6-9) integrate subsystems such as air cooling, liquid cooling, power distribution, cabinets, aisle containment, monitoring, lighting and cabling. Each subsystem is highly standardized and intelligent, capable of independent operation and joint management, turning complex liquid-cooling engineering into simple modular products. Through modular design and factory prefabrication, the design and operational costs of the data center are reduced and deployment can be accelerated by 50%. Existing liquid-cooling micro-module products adopt direct-to-chip liquid cooling to meet the requirements of high heat dissipation and high power density scenarios.

Figure 6-9 Liquid-cooling micro-module product

6.4 Direct-to-chip liquid cooling container

The direct-to-chip liquid cooling container (see Figure 6-10) is a container-based data center solution that employs a liquid-cooling refrigeration system plus an auxiliary air-cooling system. It integrates cold-plate IT equipment, and the power of a single cabinet can reach 20 kW to 50 kW. It is mainly composed of the container structural system, the power supply and distribution system, the refrigeration system (liquid cooling plus auxiliary air cooling), the liquid-cooled server cabinet system, the fire protection system, and the security and power/environment monitoring system. The direct-to-chip liquid cooling container is an edge data center product of high density, high energy efficiency and independence; it can withstand strict outdoor operating conditions and offers economic flexibility, rapid deployment and on-demand construction.

At present the standard container specifications are 20, 40 and 45 feet, and non-standard sizes can be customized per project. Taking one brand's 45-foot direct-to-chip container as an example, the product measures 13,716 x 3,000 x 3,600 mm (W x D x H), the average power of a single cabinet is 20 kW, a single module holds 6 liquid-cooling cabinets, the maximum total IT power of one module is below 120 kW, the total electrical power of one module is below 150 kW, and each cabinet holds up to 24 nodes.
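The figures quoted for this container can be cross-checked with a short power-budget calculation (a hypothetical helper; the cabinet count and power limits are taken from the example above):

```python
def container_budget_ok(cabinets=6, kw_per_cabinet=20.0,
                        it_limit_kw=120.0, total_limit_kw=150.0):
    """Check that the IT load fits the module's IT power limit and that,
    together with the stated non-IT margin (cooling, controls), it stays
    within the total electrical limit."""
    it_load_kw = cabinets * kw_per_cabinet
    overhead_kw = total_limit_kw - it_limit_kw  # 30 kW left for non-IT loads
    return it_load_kw <= it_limit_kw and it_load_kw + overhead_kw <= total_limit_kw

print(container_budget_ok())            # True: 6 x 20 kW = 120 kW fits the limits
print(container_budget_ok(cabinets=7))  # False: 140 kW exceeds the IT limit
```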
The liquid cooling configuration follows a compatibility-oriented design. The primary side supports 37°C supply water (deionized water); the primary-side cold source adopts an N+1 redundancy design with a ring pipe network, the primary-side circulation pump operates in 1+1 backup mode, and the liquid-cooling CDU likewise has 1+1 backup. The secondary side supports a maximum supply-water temperature of 40°C (deionized water) and also uses a ring pipe network, minimizing the impact of faults caused by secondary-side pipeline leakage. A dry cooler is adopted as the cold source.

Figure 6-10 Direct-to-chip liquid cooling container (45 ft)

Liquid cooling transformation of the traditional air-cooling data center
7.1 Liquid cooling transformation of the chilled water system
7.2 Liquid cooling transformation of the direct expansion air conditioning system

7. Liquid-cooling transformation of the traditional air-cooling data center

In addition to introducing liquid cooling in new AIDCs, traditional data centers will gradually take on high-power-density intelligent computing business as workloads change. Because the power density of traditional data center cabinets is low, their cooling is generally air-based. At present, many traditional data centers need a liquid-cooling transformation, for two main reasons:

1. Business adjustment: with the advent of the AI era, many data centers will gradually upgrade their business from general-purpose computing power to intelligent computing power, and this upgrade requires efficient liquid-cooling technology.

2. Energy saving and carbon reduction: governments around the world are imposing increasingly strict requirements on data center PUE. Compared with traditional air cooling, liquid cooling can save about 20%-30% of energy, effectively reduce PUE, and help achieve energy-saving and carbon-reduction goals.

For liquid-cooling transformation projects, special attention should be paid to the compatibility of the liquid cooling technology with the existing equipment, including the servers, racks, computer-room floor loading and power distribution.
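The scale of the saving can be illustrated with a facility-level PUE calculation. The PUE values below are assumed for illustration; note that the 20%-30% figure above refers to cooling-energy savings, which translate into a smaller facility-level percentage:

```python
def annual_energy_mwh(it_load_kw, pue, hours=8760):
    """Total annual facility energy from IT load and PUE."""
    return it_load_kw * pue * hours / 1000

before = annual_energy_mwh(1000, pue=1.50)  # air-cooled baseline (assumed PUE)
after = annual_energy_mwh(1000, pue=1.25)   # after liquid-cooling retrofit (assumed PUE)
saving = (before - after) / before
print(f"{saving:.1%}")  # 16.7% facility-level; the cooling overhead itself is halved
```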
If the compatibility is poor, the difficulty, engineering scope and cost of the transformation will all be very high. From this point of view, single-phase direct-to-chip liquid cooling is a very suitable transformation solution at the present stage.

The air-cooling systems of traditional data centers fall mainly into the following two categories, for each of which we formulate a corresponding transformation plan. Still taking Southeast Asia as an example, the primary-side supply temperature is 35°C.

1. Chilled water system
2. Direct expansion air conditioning system

7.1 Liquid cooling transformation of the chilled water system
When the system is retrofitted, it is first necessary to judge whether there is space on the site to add a liquid-cooling cold source system. According to this condition, the transformation can be divided into two scenarios: use the original cold source, or build a new cold source.
Use the original cold source
When there is not enough space on the site or the project budget is limited, the original cold source scheme can be used. This scheme is similar to the homologous scheme described in Chapter 4: a new set of pipelines for the liquid cooling system is added to the original chilled water system, sharing the same cold source with the original air cooling system. The scheme model is shown in Figure 7-1 below:
Figure 7-1 Use the original cold source
The advantages of this scheme are high compatibility with the original cold source (only a new set of pipelines needs to be added), low overall cost, and significantly improved energy efficiency. The disadvantage is that the two systems are coupled, which will affect the original air cooling system during the retrofit and in later operation. Because it relies on the old cold source, it is not suitable for capacity expansion projects.
Build a new cold source
This scheme is similar to the air-liquid independent scheme introduced in Chapter 4. Before deployment, it should be confirmed that there is enough space on the site to add a liquid-cooling cold source system.
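The selection logic between the two retrofit scenarios above can be sketched as follows. This is a minimal illustration of the conditions named in this section (site space, budget, and capacity expansion); the function and field names are hypothetical, not from the whitepaper.

```python
# Sketch of the Section 7.1 scenario-selection logic for a chilled-water
# retrofit. Names are illustrative; the decision criteria follow the text:
# no space or limited budget -> reuse the original cold source, but the
# original cold source cannot serve capacity expansion projects.
from dataclasses import dataclass


@dataclass
class RetrofitSite:
    has_space_for_new_cold_source: bool  # room to add a liquid-cooling cold source
    budget_limited: bool                 # project budget is constrained
    capacity_expansion: bool             # project also expands capacity


def choose_chilled_water_scheme(site: RetrofitSite) -> str:
    """Return which retrofit scenario applies to the site."""
    if not site.has_space_for_new_cold_source or site.budget_limited:
        # Shares the existing cold source: lowest cost, but the air- and
        # liquid-cooling systems stay coupled and expansion is not supported.
        if site.capacity_expansion:
            return "unsuitable: expansion needs a new cold source"
        return "use the original cold source"
    # Fully decoupled from the original system: higher cost and workload,
    # but suitable for capacity expansion.
    return "build a new cold source (cooling tower)"


print(choose_chilled_water_scheme(
    RetrofitSite(has_space_for_new_cold_source=True,
                 budget_limited=False,
                 capacity_expansion=True)))
# → build a new cold source (cooling tower)
```

In practice this screening would be one input to a site survey rather than a complete decision rule, but it captures the two-scenario split used below.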
The new liquid-cooling cold source is a cooling tower. The scheme model is shown in Figure 7-2 below:
Figure 7-2 Build a new cold source
The advantages of this scheme are that the two systems are completely decoupled, so the continuity of the original business is not affected during the retrofit; the energy efficiency is significantly improved, better than when reusing the original cold source; and it is suitable for capacity expansion projects. The disadvantage is that a new set of equipment and pipelines must be added, which involves a large retrofit workload and high cost.

7.2 Liquid cooling transformation of the Direct Expansion Air Conditioning system
The liquid cooling transformation of this system can be divided into two types: L2A and L2R.
L2A (Liquid to Air) type transformation scheme
This scheme uses the same architecture introduced in Chapter 3: the original precision air conditioning serves as the cold source, transferring the liquid cooling heat to the air in the computer room, and the original air-cooling air conditioning then discharges the heat outdoors. The CDU is of the L2A type, arranged side by side with the new liquid cooling cabinets and connected to them by pipelines. The scheme model is shown in Figure 7-3 below:
Figure 7-3 System architecture of the L2A scheme
The advantages of this scheme are high compatibility with the original cooling equipment (no modification of the original air cooling system is needed), low overall cost, and that the L2A-type CDU can be prefabricated
for quick field installation and deployment. The disadvantages are that the energy efficiency, although improved, is lower than that of the L2R scheme; redundancy is poor; and the CDU occupies more space, making the scheme unsuitable for large-scale deployment.
L2R (Liquid to Refrigerant) type transformation scheme
This scheme adopts a split-type cooler, which can be converted from the original precision air conditioning and integrated with a CDU. It is composed of a water-fluorine heat exchanger, a pump assembly, a constant-pressure water supply device, etc. The outdoor unit can directly reuse the original condenser.
The schematic diagram of the split-type cooler is shown in Figure 7-4 below. If the project is located in a cold region, it is advisable to consider adding a refrigerant-pump free-cooling module to enhance the annual energy efficiency.
Figure 7-4 Schematic diagram of the split-type cooler
The scheme model is shown in Figure 7-5 below:
Figure 7-5
System architecture of the L2R scheme
The advantages of this scheme are that the two systems are completely decoupled, so the continuity of the original business is not affected during the transformation, and the energy efficiency is significantly better than that of the L2A scheme. The disadvantage is a large transformation workload and high cost.

8. The Operational Challenges of the Liquid Cooling System in the AIDC
8.1 Reliability verification of the Direct-to-chip system
8.2 Matching verification of the Direct-to-chip server
8.3 Division of the operation and maintenance interface of the Direct-to-chip system
8.4 Operation and maintenance of the Direct-to-chip system

Compared with the traditional data center air cooling system, the AIDC liquid cooling system is very different in terms of architecture, terminal heat-capture form, operation and maintenance interface, etc. During operation, if pipeline coolant pressure drop, liquid leakage, air locking, or fouling blockage occurs, the emergency response time available to the operation and maintenance personnel is greatly shortened by the rapid accumulation of heat. At the same time, operation and maintenance experience with liquid cooling systems is still relatively limited. On the one hand, operation and maintenance personnel need to adapt to the new system architecture and equipment products, change their traditional operation and maintenance habits, and monitor both the infrastructure and the IT equipment; on the other hand, they need to res