6591 - Data Center Evolution Powering the AI Revolution Sustainably (Google).pdf (27 pages)
April 2025
Madhu Iyengar, Principal Engineer, Google

Data Center Evolution
AI vs Traditional Compute: AI is driving different requirements
Evolving Data Center Design for AI: Google + partners using OCP to meet this moment

Data Center Evolution Powering the AI Revolution Sustainably

AI Model Compute Scaling
[Figure: model compute in exaflops by year, 2017 to 2024. Labeled models: Transformer, BERT, XLNet, T5, LaMDA, PaLM, Flan-T5, PaLM2, Gemini 1.0, Gemini 1.5, Gemini 2.0]
Explosive growth in deployed ML capacity
Increasing demand for space and power
Recently released Gemini 2.5!

[Figure: panels labeled Non-AI Workloads and AI Workloads]
AI is synchronous, creating quality & reliability challenges from chip to grid

kW per Rack
[Figure: kW per rack, 2018 to 2028, for two series: Compute & Storage and AI/ML]
AI power demand requires new power delivery & cooling approaches

Solving problems together: Sustainability, Quality, Reliability
Green Concrete: new metrics and materials; next focus is driving adoption
Clean Backup: clean energy alternatives to diesel generators
Collaboration Spaces: founded OCP DCF Sustainability, founder of NZIH
Better reliability by codesign of hardware & software
Open innovation needed from chip to grid

Power Delivery
48V widely adopted over the last decade
[Figure: rack diagrams labeled AC 48V, BBU, PDU and ML System, at 50kW, 120kW and 150kW]
ML power per rack increasing rapidly
48V with batteries in ML rack does not scale
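
A rough worked example of why 48V distribution stops scaling: for an ideal DC bus, current is I = P / V, so the per-rack power figures on the slide translate into thousands of amps at 48V. The sketch below is illustrative only. The function name, the lossless-bus assumption, and the 400V comparison point (taken here as one pole of a +/-400Vdc bus) are assumptions, not details from the deck.

```python
def bus_current_amps(rack_power_kw: float, bus_voltage_v: float) -> float:
    """Ideal DC bus current I = P / V (assumption: no conversion losses)."""
    return rack_power_kw * 1000.0 / bus_voltage_v


# Per-rack power levels shown on the slide, in kW.
RACK_POWERS_KW = [50, 120, 150]

for power_kw in RACK_POWERS_KW:
    amps_48v = bus_current_amps(power_kw, 48.0)
    # 400 V is an assumed comparison point, not a figure from this slide.
    amps_400v = bus_current_amps(power_kw, 400.0)
    print(f"{power_kw:>4} kW: {amps_48v:7.0f} A at 48 V, {amps_400v:6.0f} A at 400 V")
```

Under these assumptions a 150kW rack draws 3125 A at 48V but 375 A at 400V, which is the kind of gap that motivates a higher-voltage bus.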

Higher power disaggregated approach required
[Figure: diagram labeled ML System 500kW, AC DC, Power Shelf (x4), 416 Vac, +/-400Vdc]
Move power components into sidecar rack
Scale power 10x to +/-400Vdc from 48Vdc
Long term, eliminate sidecar with building infra

Disaggregated Power Rack
0.5 spec coming in May!

Liquid Cooling
TPU v3 (2018), TPU v4 (2021), TPU v5 (2023), Ironwood (2025)
ML performance & density requires liquid cooling
Expanded infrastructure footprint to design for fungibility for next gen AI
Flow, pressure, temperature across IT suppliers
Quality & reliability with wetted materials

Google's 4th generation CDU
Google has 1 GW of liquid cooling infra
20 data centers with 99.999% rack uptime
4th gen CDU embodies learnings, deploying now

5th gen CDU
1 MW racks
Project Deschutes CDU coming to OCP
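
To put the 99.999 rack uptime figure in concrete terms, the short sketch below converts an availability percentage into an annual downtime budget. It reads the slide's figure as a percentage and assumes a 365-day year; the function name is hypothetical and the calculation is generic arithmetic, not Google's own uptime methodology.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Annual downtime implied by an availability percentage."""
    return (1.0 - uptime_percent / 100.0) * MINUTES_PER_YEAR


# Assumption: the slide's "99.999" is a percentage ("five nines").
print(f"{downtime_minutes_per_year(99.999):.2f} minutes per year")
```

On that reading, five nines leaves a little over five minutes of downtime per rack per year.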