TalkingData: Alluxio - Open-Source Storage Orchestration Platform for AI and Big Data (36 pages).pdf

Alluxio - Open-Source Storage Orchestration Platform for AI and Big Data
Rong Gu (顧榮), Alluxio PMC & Maintainer; Associate Researcher, Ph.D., Department of Computer Science, Nanjing University

Outline
1. Introduction to the Alluxio project and system
2. Overview of new features in Alluxio 2.0
3. A quick look at Alluxio's future directions
4. Summary

Four trends in data processing are driving demand for a new kind of infrastructure
- Separation of compute & storage
- Hybrid multi-cloud environments
- Self-service data across the enterprise
- Rise of the object store

The road of big data and the choices for enterprise innovation
[Figure: "Data Ecosystem - Beta" and "Data Ecosystem 1.0", each with compute and storage layers; MR / Hive with HDFS, and Hive with HDFS]
- Co-located: co-located compute & HDFS on the same cluster
- Disaggregated: disaggregated compute & HDFS on the same cluster
- Three shifts: hybrid-cloud deployment of HDFS, support for more compute frameworks, and the transition to object storage
- Burst HDFS data in the cloud, public or private
- Support Presto, Spark and other computes without app changes
- Enable & accelerate big data on object stores

Challenges in the technology transition
Hybrid-cloud deployment of HDFS
- Accessing data over WAN is too slow
- Copying data to the compute cloud is time consuming and complex
- Using another storage system like S3 means expensive application changes
- Using S3 via the HDFS connector leads to extremely low performance
Support for more compute frameworks
- Copying data to multiple compute clouds is time consuming and error prone
- Migrating applications to new storage systems is complex & time consuming
- Storing and managing multiple copies of the data becomes expensive
Transition to object storage
- Object store performance for big data workloads can be very poor
- No native support for popular frameworks
- Expensive metadata operations reduce performance even more
- No direct support for hybrid environments

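Independent scalability of compute and storage
Alluxio: a Virtual Distributed File System (VDFS), unifying data at memory speed
- Interfaces toward applications: FUSE-compatible file system, Hadoop-compatible file system, native key-value interface, native file system
- Interfaces toward storage: GlusterFS interface, Amazon S3 interface, Swift interface, HDFS interface
- Client-side APIs: Java File API, HDFS interface, S3 interface, REST API, FUSE interface
- Under-storage drivers: HDFS driver, S3 driver, Swift driver, NFS driver

Master-Worker architecture
- Master: manages all metadata and monitors the status of every Worker.
- Worker: manages local MEM, SSD, and HDD.
- Client: provides access interfaces to users and applications, and sends requests to the Master and Workers.
- Under File System: generally used for backup.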
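The Worker role above (local MEM, SSD, and HDD in front of an Under File System) can be pictured with a small sketch. This is not Alluxio code: the `TieredWorker` class, the plain dicts standing in for each tier and for the under store, and the "cache in MEM on a miss" policy are all assumptions made for illustration, and eviction between tiers is left out.

```python
class TieredWorker:
    """Hypothetical worker: searches local tiers fastest first, then the under store."""

    TIERS = ("MEM", "SSD", "HDD")  # ordered from fastest to slowest

    def __init__(self, under_store):
        # A plain dict stands in for the Under File System (assumption).
        self.under_store = under_store
        self.tiers = {name: {} for name in self.TIERS}

    def read(self, block_id):
        """Return (data, source), where source names the tier that served the read."""
        for name in self.TIERS:
            if block_id in self.tiers[name]:
                return self.tiers[name][block_id], name
        # Miss in every local tier: fetch from the under store.
        data = self.under_store[block_id]
        # Assumed policy: keep the freshly read block in the fastest tier.
        self.tiers["MEM"][block_id] = data
        return data, "UFS"


if __name__ == "__main__":
    worker = TieredWorker({"block-1": b"payload"})
    print(worker.read("block-1"))  # first read is served by the under store
    print(worker.read("block-1"))  # repeat read is served from MEM
```

The point of the sketch is only the lookup order: a repeat read of remote data is served at local speed once the block sits in a local tier.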

[Figure: overall internal architecture of the Alluxio system. Components shown: Master, Client, and Worker1 to Worker3 on nodes 1 to 3, each Worker with MEM, SSD, and HDD, plus the Under File System]

Scenarios enabled by Alluxio data orchestration
- Burst big data workloads in hybrid cloud environments
- Accelerate big data frameworks on the public cloud
- Dramatically speed up big data on object stores on premise
[Diagram labels: on premise; same instance / container]

Advanced use cases
- Enable big data on object stores across single or multiple clouds
- Orchestrate data frameworks on the public cloud

Alluxio's core innovations
- Data elasticity with a unified namespace: abstract data silos & storage systems to independently scale data on demand with compute
- Data accessibility for popular APIs & API translation: run Spark, Hive, Presto, and ML workloads on your data located anywhere
- Data locality with intelligent multi-tiering: accelerate big data workloads with transparent tiered local data

Data locality through intelligent multi-tier caching
- Local performance from remote data using multi-tier storage

Data accessibility through popular APIs and API translation
- Convert from the client-side interface to the native storage interface

Data elasticity through a unified namespace
- Enables effective data management across different under stores
- Uses mounting with transparent naming

Unified Namespace
- Transparent access to under storage makes all enterprise data available locally
- Supports: HDFS, NFS, OpenStack, Ceph, Amazon S3, Azure, Google Cloud
- IT ops friendly: storage mounted into Alluxio by central IT; security in Alluxio mirrors source data; authentication through LDAP/AD; wireline encryption
[Figure labels: HDFS #1, Object Store, NFS, HDFS #2]

100+ known production deployments
- Consumer; Travel & Transportation; Telco & Media; Technology; Financial Services; Retail & Entertainment; Data & Analytics Services

Incredible open source momentum with a growing community
- 1000+ contributors & growing
- 4278+ Git stars
- Apache 2.0 licensed
- Hundreds of thousands of downloads
- On GitHub; join the conversation on Slack at alluxio.org/slack

Finding high-fit use cases: example first projects
- Enterprise storage & big data teams: virtual data lakes; gradual transition to low-cost storage; unify hybrid-cloud storage
- Machine learning & data science teams: accelerate training; improve productivity

[Figure: deployment layout. Compute zone: standalone or managed with Mesos or YARN. Storage in a different availability zone: either on-prem or cloud. Labels: TensorFlow, Presto, HDFS, Spark]
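The unified namespace behind use cases such as "virtual data lakes" and "unify hybrid-cloud storage" relies on mounting several under stores into one tree with transparent naming. A minimal sketch of that idea follows. The `MountTable` class, its method names, the longest-prefix rule, and the example URIs are hypothetical and chosen for illustration; they are not taken from the Alluxio code base.

```python
class MountTable:
    """Hypothetical mount table: maps namespace paths to under-store URIs."""

    def __init__(self):
        self._mounts = {}

    def mount(self, namespace_path, ufs_uri):
        """Attach an under-store URI at a path in the unified namespace."""
        key = namespace_path.rstrip("/") or "/"
        self._mounts[key] = ufs_uri.rstrip("/")

    def resolve(self, path):
        """Translate a namespace path to an under-store URI (longest mount wins)."""
        best = None
        for mount_point in self._mounts:
            prefix = "/" if mount_point == "/" else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        rest = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return f"{base}/{rest}" if rest else base


if __name__ == "__main__":
    table = MountTable()
    table.mount("/", "hdfs://namenode-1/data")        # example URI, not from the slides
    table.mount("/archive", "s3://example-bucket/archive")
    print(table.resolve("/archive/2019/logs.csv"))
    print(table.resolve("/tables/orders"))
```

An application only ever sees paths such as `/archive/2019/logs.csv`; which storage system actually holds the bytes is decided by the mount table, which is what lets storage be swapped or added without application changes.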

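Alluxio is installed with or near compute to unify data stores, stage remote data, and improve system performance.

[Slides: analysis of the scenarios where Alluxio applies]

New features in Alluxio 2.x: support for hyperscale data workloads
- Support for more than 1 billion files: 2.0 introduces tiered metadata management as a new option, to support single-cluster deployments containing more than 1 billion files. RocksDB is now used by default for off-heap storage. Metadata for hot data continues to be stored on-heap in process memory, while the rest of the metadata is managed by Alluxio outside process memory. alluxio.master.metastore can be configured to use on-heap storage only.
- Highly distributed data services: 2.0 introduces the Alluxio Job Service, a distributed cluster service that carries out data operations such as replication, persistence, cross-storage moves, and distributed loading, providing high performance and large-scale scalability.
- Adaptive replication to improve data locality: this feature lets Alluxio be configured with a range for the number of automatically managed replicas of stored data. alluxio.user.file.replication.max and alluxio.user.file.replication.min specify that range.
- Embedded journal for high availability: 2.0 introduces a new fault-tolerance and high-availability mode for file and object metadata called the embedded journal. The embedded journal uses the Raft consensus algorithm, and its implementation is independent of any other external storage system. This is especially useful for abstracting object stores.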
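The replication range set by alluxio.user.file.replication.min and alluxio.user.file.replication.max can be pictured as a simple bounds check on how many copies of a block exist. The sketch below is an assumption-laden illustration, not Alluxio's implementation: the function name `replica_adjustment` is hypothetical, and it only computes how many copies would need to be added or removed to bring a block back inside the range.

```python
def replica_adjustment(current, replication_min, replication_max):
    """Return how many replicas to add (positive) or remove (negative).

    Hypothetical helper: `current` is the number of copies a block has now;
    the range corresponds to the configured minimum and maximum.
    """
    if not 0 <= replication_min <= replication_max:
        raise ValueError("expected 0 <= replication_min <= replication_max")
    if current < replication_min:
        return replication_min - current   # under-replicated: add copies
    if current > replication_max:
        return replication_max - current   # over-replicated: remove copies
    return 0                               # already inside the range


if __name__ == "__main__":
    # Three copies of a block with a maximum of 2: one copy should go.
    print(replica_adjustment(current=3, replication_min=0, replication_max=2))
```

In the slides, work of this kind (replicating or removing copies) is the sort of data operation handed to the Job Service; the sketch covers only the decision, not the execution.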

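[Figure: adaptive replication. An Alluxio Master, Alluxio Workers holding copies of Block-1, applications with Alluxio Clients, and an Under Store; SetReplicaMax(2)]

Embedded journal for high availability: Alluxio 1.x
Alluxio 1.x HA depends on ZooKeeper (ZK) and HDFS components.
- HA operating mode: ZooKeeper elects the leader master; HDFS stores the journal files and shares them among the masters.
- Problems: the choice of journal storage is limited; the dependence on third-party components makes the service hard to debug and recover; instability of the HDFS cluster itself raises the maintenance cost of the Alluxio cluster.
[Figure: a Leading Master writes the journal to Shared Storage and two Standby Masters read it; "Hello, leader"]

Embedded journal for high availability: Alluxio 2.x
Alluxio 2.x removes the ZK/HDFS dependency.
- The three Alluxio masters use the Raft algorithm internally to reach a consensus state.
- Only the leading master commits state changes; the standby masters stay in sync.
- Advantage: the journal can be stored on local disk (replicated across the master nodes).
- Challenge: performance tuning.
[Figure: a Leading Master propagating state changes to two Standby Masters through Raft]

New features in Alluxio 2.x: better storage abstraction for fully independent and elastic compute
- Support for HDFS clusters across different versions: explosive data growth means that enterprises often have many data warehouses, including multiple Hadoop clusters running different versions. Unified access across these clusters is currently very difficult. With Alluxio 2.0, users can connect Alluxio to multiple HDFS clusters of different versions and get unified data access.
- Active sync with Hadoop: this new feature integrates with HDFS iNotify to pick up any data and metadata changes to files stored in Hadoop, so that applications accessing the data through Alluxio proactively receive the latest updates.

New features in Alluxio 2.x: stronger support for machine learning, data query, and similar systems
- Run machine learning and deep learning workloads on any storage: machine learning and deep learning frameworks often need to pull large volumes of data from Hadoop or object stores, which is usually a manual and very time-consuming process. Alluxio's FUSE feature supports a POSIX-compatible API, so through Alluxio, frameworks such as TensorFlow and Caffe, as well as other Python-based models, can access data in any storage system directly, using conventional file system access.
- Deep integration with structured data management and query systems: a Catalog Service at the Alluxio layer provides an abstraction over structured data; adding a Hive MetaStore to Alluxio is like mounting a file system. Alluxio is aware of the storage structure and schema of files and objects and can therefore serve them better. It also provides the Alluxio Data Transformation service, for example automatically converting CSV files to Parquet, or consolidating many small table files into large files to reduce query time.

Direction: Structured Data Service
- Alluxio Catalog Service (target 2.1): serve metadata of tables (like Hive Meta Store); highly efficient by using Apache Iceberg (e.g., no slow dir listing); speed up query planning, independent of the speed-up from caching files in the Alluxio file system
- Alluxio Connector for Presto (target 2.1): Presto connects to Alluxio directly without the Hive connector; enable push-downs to the Alluxio layer

Direction: Alluxio on Kubernetes (call for community contribution!)
- Productionize Helm chart
- K8S csi-driver/provisioner
- Alluxio K8S Operator

Direction: File System and Cloud Integration
- Automatic & transparent caching (target 2.1): use Alluxio as a caching layer for Presto, Spark or Hive without modifying HMS
- AWS/GCP integration: improve the EMR bootstrap script; images on the AWS / GCP marketplace

Newly published Alluxio book in Chinese
Alluxio：大數據統一存儲原理與實踐 (Alluxio: Principles and Practice of Unified Big Data Storage), by Bin Fan (范斌) and Rong Gu (顧榮). Publisher: 電子工業出版社 (Publishing House of Electronics Industry). Published: August 2019. ISBN: 978-7-121-36782-3. Length: 242 thousand characters. The first book in China on the big data storage system Alluxio.

Welcome to the Alluxio open source community! www.alluxio.org
Scan to follow for a wealth of Chinese-language Alluxio technical material and case studies.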
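As a closing illustration of the embedded journal described in the 2.x feature overview (only the leading master commits state changes, and the standby masters stay in sync), the sketch below shows the majority-acknowledgement rule that Raft-style replication relies on. It is a deliberately reduced model under stated assumptions: the `Master` class and `commit_state_change` function are hypothetical, and leader election, terms, and log repair, all of which real Raft requires, are omitted.

```python
class Master:
    """Hypothetical master node holding a local copy of the journal."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.journal = []  # journal entries stored on this node's local disk

    def append(self, entry):
        """Store an entry locally; returns False if the node cannot be reached."""
        if not self.reachable:
            return False
        self.journal.append(entry)
        return True


def commit_state_change(leader, standbys, entry):
    """Leader appends locally, replicates, and commits only on a majority of acks."""
    acks = 1 if leader.append(entry) else 0
    acks += sum(1 for standby in standbys if standby.append(entry))
    cluster_size = 1 + len(standbys)
    return acks > cluster_size // 2


if __name__ == "__main__":
    leader = Master("leading-master")
    standbys = [Master("standby-1"), Master("standby-2", reachable=False)]
    # Two of three masters hold the entry, so the change counts as committed.
    print(commit_state_change(leader, standbys, "create /data/file"))
```

With three masters, one unreachable standby does not block progress, which is why the journal can live on the masters' local disks instead of on shared storage such as HDFS.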
