He Tao - Vineyard: An Open-Source Distributed In-Memory Data Management Engine (GOTC Shenzhen session) (23 pages)
General Technology Track
He Tao, August 1, 2021
Alibaba: An Open-Source Distributed Data Management Engine

Why bother
1. Sharing data efficiently (with "0-copy") between libraries is easy within a single Python process.
2. It is not as easy to do so across processes/runtimes on a single machine.
   Possible with plasma from Apache Arrow, a local object store using shared memory.
3. What about processing big data that cannot fit into a single machine, and involving different workloads?
   Use vineyard + K8s!

PyData is the de-facto standard for data analysis
There are lots of libraries for different workloads.
(image credit: https://coiled.io/blog/pydata-dask/)

Big data analytical pipelines
An anti-fraud pipeline:
- Load data, ETL: SQL
- Label Propagation: Graph Computation (LPA) / GRAPE
- Fraud detection with DNN: ML / Tensorflow
- Post-ETL processing: SQL

Observation:
- A typical big data application involves various kinds of workloads, and thus involves multiple dedicated systems, one for each workload.
- These dedicated systems typically share intermediate data through external file systems.
- The workflow is often organized as a chain/DAG, and each individual task is only invoked after its prerequisite tasks have completed.

An anti-fraud pipeline (diagram):
- Data Extraction and Preprocessing: Data Warehouse (Hive), Distributed FS (HDFS), Object Store (S3)
- Postprocessing and ETL: ETL (SQL), SQL
- Graph Analytical Algorithms: GRAPE
- Train/Infer with a NN model: Tensorflow/Pytorch

Problem:
- Production-ready systems (Hive, Tensorflow, …) are hard to develop.
- Sharing data through external file systems has a huge I/O cost.
- Applying cross-task optimization (pipelining) to tasks is challenging.
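To make the I/O-cost problem concrete, here is a minimal single-machine sketch that uses only Python's standard library; it is not Vineyard's API. It hands a list of doubles from a producer to a consumer in two ways. The first goes through an external file, so the bytes are copied out to the file and then copied back into a new buffer. The second uses a named segment from `multiprocessing.shared_memory`, which the consumer attaches to by name and reads in place. That shared-memory mechanism is the one the plasma store mentioned above relies on. The function names `exchange_via_file` and `exchange_via_shared_memory` are hypothetical, and both ends run in one process only for brevity.

```python
from __future__ import annotations

import os
import tempfile
from array import array
from multiprocessing import shared_memory


def exchange_via_file(values: list[float]) -> list[float]:
    """Hypothetical helper: the producer writes the data to an external file and
    the consumer reads it back. The bytes are copied twice."""
    data = array("d", values)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        data.tofile(f)                        # copy 1: producer memory -> file
        path = f.name
    try:
        received = array("d")
        with open(path, "rb") as f:
            received.fromfile(f, len(data))   # copy 2: file -> new consumer buffer
        return received.tolist()
    finally:
        os.remove(path)


def exchange_via_shared_memory(values: list[float]) -> tuple[list[float], bool]:
    """Hypothetical helper: the producer writes the data once into a named
    shared-memory segment; the consumer attaches by name and reads it in place.

    Returns what the consumer saw, and whether a later write by the producer is
    visible through the consumer's view (i.e. the consumer holds no private copy).
    """
    data = array("d", values)
    nbytes = len(data) * data.itemsize
    producer = shared_memory.SharedMemory(create=True, size=nbytes)
    try:
        producer.buf[:nbytes] = memoryview(data).cast("B")  # the only copy
        # In a real pipeline the consumer is another process that only knows the name.
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            with consumer.buf[:nbytes] as raw, raw.cast("d") as view:
                seen = view.tolist()
                # Overwrite the first element through the producer's mapping...
                producer.buf[:data.itemsize] = memoryview(array("d", [-1.0])).cast("B")
                # ...and the consumer's view reflects it: both map the same memory.
                same_memory = view[0] == -1.0
        finally:
            consumer.close()   # drop this mapping; the segment itself still exists
        return seen, same_memory
    finally:
        producer.close()
        producer.unlink()      # destroy the segment once nobody needs it
```

Both paths deliver the same values; what differs is how many times the bytes are copied. The producer mutates the buffer here only to prove that the consumer is not holding a private copy. Vineyard itself, per the slides, shares immutable objects whose payload lives in shared memory managed by the vineyard daemon.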
Hardness in developing production-ready systems
Problem:
- Many dedicated systems (e.g., for graph computing) have been developed in recent years, but only a few are production-ready.
- Huge effort is required just to implement:
  - I/O adaptors
  - Data partition/chunking strategies
  - Fault-tolerance mechanisms
  - Scale in/out
  - Data sink/source

Huge I/O cost in workflows
Problem: data could be polymorphic.
- Non-relational data, such as tensors, dataframes, and graphs/networks, are becoming increasingly prevalent.
- Tables and SQL may not be the best way to store, exchange, or process them.
- Transforming the data to/from tables back and forth between different systems could be a huge overhead.
- Saving/loading the data to/from external storage requires lots of memory copies and I/O costs.

Hardness of cross-job optimization
Problem: tasks in workflows have no information about other tasks.
- The intermediate data cannot be placed in an optimized fashion for the dependent tasks.
- The data transfer from one task to another is a barrier:
  - It usually requires transformation of format and schema.
  - It is hard to do cross-task pipelining.
Tasks in a typical workflow (diagram labels: "Run this first", "Join", "Branch 1", "Follow Branch 1")

Vineyard
Motivation
- Big data systems at production-ready quality are hard to develop.
  Vineyard has an extensible design that supports pluggable routines for I/O, data partitioning, scaling, and fault tolerance.
- I/O cost in workflows is usually high.
  Vineyard enables sharing in-memory immutable data in a zero-copy fashion. I/O flows between tasks in a workflow don't require extra copies, and data can be accessed as an in-memory data object.
- Cross-task optimization is challenging.
  Data in memory can be directly shared between different systems. Vineyard supports streams in shared memory, which provides opportunities for pipelining between dedicated systems.

What is Vineyard
- A distributed in-memory object store for immutable data
- Zero-copy in-memory data sharing between different systems
- Out-of-the-box high-level abstractions for developing big data applications
- Local data access as native objects
- Drivers for data partitioning, I/O, checkpointing, migration, …

Architecture
- A vineyard object consists of a data payload and metadata.
- The data payload is stored in shared memory.
- Metadata is synced across the cluster with etcd.
- Vineyard daemon instances are accessed via IPC/RPC connections.
- The data payload can only be accessed through IPC connections.
- Pluggable drivers can provide certain functionality for certain data formats.

Efficient Object Sharing across Engines
- Object = Metadata + Blob
  Decouples the payload from its semantics.
- Share by memory mapping
  Zero-copy.
- Share with the data structure abstractions
  Shares the data structure directly, e.g., Tensors, DataFrames, Graphs.
- Builders + Resolvers
  Interpret vineyard objects as an engine's native value types.

Distributed Objects Sharing
- Vineyard supports distributed objects: a global object consists of a set of chunks.
- A client can access the payload of local chunks, and the metadata (only) of remote chunks.
- Metadata is synced using etcd.
- Performance: only the metadata of objects that are referred to by a global object is synced to other instances.

Pipelining between tasks in a workflow
- Zero-copy sharing unlocks new opportunities: intermediate data sharing is no longer a barrier.
- Stream in Vineyard: a stream over chunks of a data structure, e.g., a tensor stream or a dataframe stream.
- Tasks can be pipelined using vineyard streams!

Pluggable drivers
- Engines are usually hard to connect to production systems:
  - Integration with internal I/O
  - Integration with other internal engines
- Vineyard serves as a bridge:
  - I/O is delegated to vineyard.
  - Engines consume data structures in vineyard directly.
  - Engines talk to other engines via shared intermediate objects in vineyard.

Vineyard on Kubernetes
- The end-to-end big data task is deployed on Kubernetes.
- Intermediate data is abstracted as a Kubernetes resource (CRDs), and is shared with vineyard through memory mapping.
- "Data" lives in memory, and the scheduler optimizes the data flow among cluster nodes.
Vision: a new cloud-native paradigm for big data tasks.
(diagram labels: Data Extraction and Preprocessing; Vineyard DaemonSet (on Kubernetes); Postprocessing and ETL: ETL (SQL), SQL; Graph Analytical Algorithms: GRAPE; Train/Infer with a NN model: Tensorflow/Pytorch)

Memory Sharing on Kubernetes
- Vineyard requires IPC communication between vineyard server pods and application pods for memory sharing.
- The domain socket of the vineyard server could be mounted on a hostPath or a PersistentVolumeClaim.
- When users bundle vineyard and the workload into the same pod, the domain socket could be shared using an emptyDir.

Vineyard objects as Kubernetes resources (CRDs)
- Vineyard objects are abstracted as Kubernetes resources (i.e., CRDs).
- Each CRD contains the metadata of the represented vineyard object.
- Location specs, which describe the node on which an object is located, are added to the CRDs of local objects.

Scheduling on Kubernetes
- A job and its required data cannot always be aligned:
  - The cluster environment is dynamic and constrained.
  - The requirements of different workloads differ.
- The location information can be used to guide the scheduling process: a vineyard scheduler plugin!
- It can still be unaligned: auto migration in an initContainer.

Deployment
- Vineyard is deployed as a DaemonSet in a Kubernetes cluster.
- Deploy using Helm.
Vineyard can easily be deployed in a Kubernetes cluster using Helm.
Deploying Vineyard on Kubernetes:
    helm repo add vineyard https:/…
    helm install --namespace vineyard --name vineyard stable/vineyard

Roadmap
Ongoing
- Connecting to machine learning frameworks
  Integration with Tensorflow/Pytorch to share objects in vineyard with machine learning frameworks.
- SDKs in more languages
  Python, Java, Rust, Go
- Integration with workflow engines
  Integration with Airflow: brings a better intermediate data sharing solution for workflows orchestrated by Airflow.

Further ahead
- Vineyard Operator for Kubernetes
  Better cluster management and monitoring on Kubernetes clusters.
  Better data-aware scheduler policy within the scheduler plugin.
- Application-aware far memory
  Vineyard supports global object abstractions, e.g., GlobalDataFrame.
  Support for application-aware far memory will enable single-machine applications to leverage remote memory resources.
  Better performance than raw RPC.
- Storage hierarchy
  In-memory objects can be swapped out in certain cases.
  Snapshotting objects and restoring them to memory later benefits end-to-end performance.

Vineyard Community
- Vineyard is open source under the Apache-2.0 License.
- Any contributions from the community are welcome:
  - Issues about bugs and feature requests
  - Pull requests for bug fixes, enhancements, feature implementations, and extensions
  - Discussions about the installation, deployment, and usage of vineyard
- We have comprehensive documentation on the underlying design and on how to build applications on vineyard: https://v6d.io/