
a-story-of-managing-kubernetes-watch-events-end-to-end-flow-in-extremely-large-clusters-nanomao-zhong-shi-fa-lia-kubernetes-watchguo-guo-xiao-zha-bo-tang-ant-group.pdf

Uploaded by: 山海 | ID: 627236 | 2025-04-21 | PDF | 26 pages | 5.69 MB

Part of the collection: KubeCon China 2024 Guest Speaker Slides Collection



A Story of Managing K8s Watch Events End-to-end Flow in Extremely Large Clusters
Bo Tang, Ant Group
Aug, 2024

Outline
* The Watch Mechanism and the Importance of Watch in Kubernetes
* Definition of Watch SLO and What It Brings to Us
* What We Did to Optimize Watch Flow and the Benefits We Got
* Summary and Future Plans

Watch Mechanism / Watch SLO / Optimization / Summary

K8s Overview
Kubernetes is getting boring? Let us investigate K8s more. You will get what you want.

Design Philosophy
* Stable skeleton
* Comprehensive extensibility
[Figure: Before Kubernetes vs. After Kubernetes]

K8s Bird's-eye View
[Diagram labels: Scheduler, Kubelet, Controllers]

When Cluster Gets Large
[Diagram labels: KCM, Scheduler, Kubelet, Custom Controllers, Kube-proxy]

When Cluster Gets Large
Problems arise when a cluster gets large:
* Large number of nodes and pods.
* Large number of CRDs.
* Heavy traffic and a high churn rate of resources.
* The provisioning path is long.
* Different controllers use the API in different ways.
* User requirements are diverse and vary a lot.
Almost every problem is connected to the Watch mechanism. We've optimized each component of the Kubernetes system, but the link/connection part seems to be missing.

Watch Procedure
1. Apiserver obtains data from etcd, decodes it, formats it, and sends it to apiserver's internal cache.
2. Based on the watched data, apiserver builds its cache internally, which is called the WatchCache object. It includes a full set of data and a circular array.
* The full set of data includes indexing and is used for various client list requests, list-by-label requests, and list-by-field requests.
* The circular array contains the latest watchCache events and is used for various client watch requests.
3. The apiserver internally iterates through all of the cacheWatchers for a resource and sends each cacheWatcher the change events it is interested in, one by one.
4. When the client receives data from apiserver, it decodes it and puts it into the client's cache. At the same time, it generates the corresponding add/update/delete events and puts them into the workqueue.
5. The client's user code reads the workqueue and performs the corresponding reconciliation.

K8s SLO Definition
https:/

SLO
SLI: The duration from the time an event enters the apiserver until it leaves the apiserver, measured as ApiserverTimeSpan. Measured as P99 over the last 1 min.
SLO: Covers the latency of all events from entering the API server until they exit the API server. Within a rolling window of 1 min, the 99th percentile of this latency should be less than 1 second, which is mathematically represented as P99(ApiserverTimeSpan) < 1s.

[…]

14.05s, a 59.2% enhancement. The apiserver network bandwidth decreases from 20GB to an average of 7.5GB.

Summary
2x to 10-90x reduction in the following metrics:
* P99/P999 RT from etcd to apiserver watchcache
* P99/P999 RT from apiserver to client side
* Client-side relist/rewatch count, and the delay caused by a full relist
* The end-to-end pod provisioning time, from pod creation to running
* Apiserver load balancer bandwidth
* The scalability/stability of the whole system
Positively affects billions of watch events per day.
Positive feedback from internal users (Alipay ads/search, risk management, hpa, etc.), and supports Alibaba/Alipay 11.11 in recent years.
[Figure labels: Bandwidth, Relist, Rewatch, RT, Pod Provision, Ads, Search, Risk, 11.11]
You will get what you want very soon.

Summary and Road Ahead
Are we done yet? No. There are still problems going on:
* The reduction of client-side CPU
* The reduction of lock contention for arbitrary client code
* The proper adjustment of the apiserver watch cache
* The proper usage of the client for removing 409
* The etcd watch guarantee
* Traffic measurement and analysis
* The inclusion of higher-version K8s features

Thank you

Reference
https:/ 5
ImageSrc: https:/
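The Watch Procedure in the slides can be pictured with a small model. The following is a minimal Python sketch, not apiserver or client-go code: the class names `WatchCache`, `CacheWatcher`, and `Informer`, the `Event` fields, and the contiguous integer resource versions are all assumptions made for illustration. Step 1 (reading and decoding from etcd) is left out, so events arrive already decoded.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class Event:
    kind: str              # "ADDED", "MODIFIED" or "DELETED"
    key: str               # e.g. "default/pod-a"
    obj: dict              # already-decoded object (step 1 is omitted)
    resource_version: int  # toy assumption: contiguous integers


@dataclass
class CacheWatcher:
    """Hypothetical stand-in for a cacheWatcher: a filter plus an outbound buffer."""
    interested: Callable[[Event], bool]
    outbox: List[Event] = field(default_factory=list)


class WatchCache:
    """Toy watch cache: a full store for list requests and a bounded ring of events."""

    def __init__(self, capacity: int) -> None:
        self.store: Dict[str, dict] = {}
        self.ring: Deque[Event] = deque(maxlen=capacity)  # circular array
        self.watchers: List[CacheWatcher] = []

    def process(self, ev: Event) -> None:
        # Step 2: update the full data set and append to the circular array.
        if ev.kind == "DELETED":
            self.store.pop(ev.key, None)
        else:
            self.store[ev.key] = ev.obj
        self.ring.append(ev)
        # Step 3: walk every watcher and hand over only the events it wants.
        for watcher in self.watchers:
            if watcher.interested(ev):
                watcher.outbox.append(ev)

    def list_by_label(self, label: str, value: str) -> List[str]:
        # List requests are served from the full data set, not from the ring.
        return sorted(k for k, o in self.store.items()
                      if o.get("labels", {}).get(label) == value)

    def events_since(self, resource_version: int) -> List[Event]:
        # Watch requests are served from the ring. If the next needed event
        # has already been evicted, the client has to fall back to a relist.
        oldest = self.ring[0].resource_version if self.ring else 0
        if resource_version + 1 < oldest:
            raise LookupError("resource version too old; relist required")
        return [e for e in self.ring if e.resource_version > resource_version]


class Informer:
    """Toy client: a local cache plus a workqueue that user code drains."""

    def __init__(self) -> None:
        self.cache: Dict[str, dict] = {}
        self.workqueue: Deque[Tuple[str, str]] = deque()

    def on_event(self, ev: Event) -> None:
        # Step 4: update the client cache and enqueue an add/update/delete item.
        if ev.kind == "DELETED":
            self.cache.pop(ev.key, None)
        else:
            self.cache[ev.key] = ev.obj
        self.workqueue.append((ev.kind, ev.key))


def demo() -> Tuple[WatchCache, CacheWatcher, Informer]:
    cache = WatchCache(capacity=2)
    web = CacheWatcher(lambda e: e.obj.get("labels", {}).get("app") == "web")
    cache.watchers.append(web)
    cache.process(Event("ADDED", "default/a", {"labels": {"app": "web"}}, 1))
    cache.process(Event("ADDED", "default/b", {"labels": {"app": "db"}}, 2))
    cache.process(Event("MODIFIED", "default/a",
                        {"labels": {"app": "web"}, "phase": "Running"}, 3))
    informer = Informer()
    for ev in web.outbox:  # step 5 would then drain informer.workqueue
        informer.on_event(ev)
    return cache, web, informer
```

In this model the ring capacity decides how far behind a watcher may fall before it has to relist, which is one way to read the slides' concern with relist/rewatch counts and watch cache sizing.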


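This talk discusses optimizing the end-to-end flow of watch events in large Kubernetes clusters. The author, Bo Tang of Ant Group, outlines the Kubernetes watch mechanism and its importance in a cluster, defines a Watch service level objective (Watch SLO), and shares the measures his team took to optimize the watch flow and the benefits they obtained. Key figures: the latency from etcd to the apiserver watch cache dropped from 3 seconds to the 100-millisecond range, and the time from a watch event entering apiserver to leaving it dropped from 5 seconds to the 500-millisecond range. After optimization, the P95 time to deploy 1000 Pods fell from 34.43 seconds to 14.05 seconds, and network bandwidth fell from 20GB to an average of 7.5GB. The talk covers the Watch mechanism, the Watch SLO, an overview of Kubernetes, and the specific optimizations: data structure improvements, reduced lock contention, reduced computation and data, asynchronous computation, bandwidth reduction, and appropriate caching. It also covers optimizations for custom controller runtimes, reducing apiserver traffic, and tuning the watch cache size. Finally, the author notes problems that remain open (reducing client-side CPU, reducing lock contention in client code, use of the apiserver watch cache, correct use of the client, the etcd watch guarantee, traffic measurement and analysis, and adopting features from newer Kubernetes versions) and looks ahead to future work.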
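The Watch SLO from the slides (P99 of ApiserverTimeSpan under 1 second over a rolling 1-minute window) and the reported improvements can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names, the `(exit_timestamp, seconds_in_apiserver)` sample layout, and the nearest-rank percentile method are illustrative choices, not the way the talk's metrics pipeline is known to work.

```python
from typing import Iterable, List, Tuple


def p99(samples: List[float]) -> float:
    """Nearest-rank 99th percentile (one of several common definitions)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def slo_met(events: Iterable[Tuple[float, float]], now: float,
            window_s: float = 60.0, threshold_s: float = 1.0) -> bool:
    """events holds (exit_timestamp, seconds_spent_in_apiserver) pairs.

    Returns True when P99 of the spans inside the rolling window is below
    the threshold. An empty window counts as meeting the SLO (an assumption).
    """
    recent = [span for ts, span in events if now - window_s < ts <= now]
    return not recent or p99(recent) < threshold_s


def improvement_pct(before: float, after: float) -> float:
    """Relative reduction, in percent."""
    return (before - after) / before * 100.0
```

With the figures quoted above, `improvement_pct(34.43, 14.05)` rounds to 59.2, matching the 59.2% stated in the slides, and `improvement_pct(20, 7.5)` gives 62.5 for the bandwidth reduction.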

