The Evolving Landscape of Iceberg REST Catalog
Yufei G
© 2024 Snowflake Inc. All Rights Reserved

    Caused by: org.apache.iceberg.exceptions.NotFoundException: Failed to open input stream for file: s3://some/path/to/table/metadata/13648-45c53fb2-5124-4541-ace3-c63ed91e1d26.metadata.json

Agenda
- Why REST Catalog?
- Recent Features & Future Direction
- Growing Ecosystem
- A Server Implementation: Apache Polaris (Incubating)

Storage Only
- The table still works
- Poor addressing: paths stand in for namespace and table names
- Lack of data governance, e.g., data lineage
- Performance concerns: no caching, repeated file reads for queries and metadata
- Compatibility concerns: old clients vs. new clients
[Diagram: two engines reading and writing storage directly]

Traditional Iceberg Catalog
- Addresses tables nicely
- Provides basic data governance features
- The engine writes the metadata.json files
- The engine retries for conflict resolution and has to lock the table
- Performance concern: no caching
- Compatibility concern: old clients vs. new clients
[Diagram: two engines in front of a catalog; examples shown are Hive Catalog and JDBC Catalog]

Iceberg Optimistic Concurrency: Happy Path
[Diagram: Client1 and Client2 commit to one Iceberg table, moving it through v1, v2, v3]

Iceberg Optimistic Concurrency: Conflict Resolution
[Diagram: same table history v1, v2, v3, with one client attempting a second v2 and hitting a conflict]

Iceberg REST Catalog
Shifting client responsibilities to the server: a simple idea that completely changes the way we use Iceberg tables.
- Reliable and low-latency commits
- Better conflict resolution
- Easy client implementation
- Data governance
[Diagram: engine, REST Catalog, catalog]

REST Catalog
Shift client responsibilities to the server
- The server handles commit and retry. No table lock. Faster and more reliable.
- The server writes the metadata.json file, so there is no client compatibility issue.
Improved decision making at commit time
- Better conflict resolution
  - DDL vs. DML: don't fail an append when it conflicts with an unrelated DDL
  - Compaction vs. ingestion
- Multi-table transactions
Easier client implementation
- Fewer responsibilities mean simpler clients, such as compute engines or any other system
- Better support for clients in multiple languages, such as Python, Rust, and Go

Recent Features
- View support
- Server-side planning
- Server capability

Future Direction
- Server-side metadata tables
- Fine-grained commit
- Credential refresh, credential vending
- Simplify LoadTable

Growing Ecosystem
[Diagram: engines and catalogs connected through the ICEBERG REST SPEC; client libraries shown include Iceberg and Iceberg Rust]

APACHE POLARIS
An interoperable, open source catalog for Apache Iceberg
- Centralized, cross-engine security and access
- Cross-engine read and write interoperability
- Run anywhere, no lock-in
Apache Polaris is currently undergoing Incubation at the Apache Software Foundation.

Cross-engine read and write
Apache Iceberg's open source REST protocol lets multiple engines read and write: Apache Flink, Apache Spark, Trino, and many more.
- Table/View APIs
- OAuth2
- Multi-table transactions
[Diagram: ENGINES (Snowflake, Trino, PyIceberg, Apache Spark, Apache Flink, StarRocks) call the REST API of the CATALOG (Apache Polaris), which sits in front of STORAGE]

Centralized security and access
- Define principals/users and roles in Apache Polaris
- Manage RBAC on Iceberg tables for users or roles
- Manage security at the storage layer by vending scoped credentials to engines during query execution
[Diagram: same ENGINES / CATALOG / STORAGE layout, annotated with Role Based Access Controls and Credential Vending]

Run anywhere, no lock-in
- Deploy in your own infrastructure in a container (e.g., Docker, Kubernetes)
- Flexibility to switch infrastructure and retain RBAC, namespaces, and table definitions
[Diagram: Apache Polaris deployment targets: Snowflake Managed Service; EC2, EKS, Docker; VMs, AKS, Docker; GCE, GKE, Docker]

Entity Hierarchy
- Catalogs are INTERNAL (read/write) or EXTERNAL (read-only for now)
- Namespaces can be nested arbitrarily deep

Permission model
- Principal: identity for applications or users
- Catalog Role: grouping of permissions on entities
- Principal Role: connection between Principals and Catalog Roles
[Diagram: principals (Alice, ETL/Streaming engines, AI/ML App) and roles (Service Admin, Data engineer, Data scientist) mapped onto Catalog1 and Catalog 2. Catalog1 contains namespace1 (table1, table2) and namespace2 (table3, table4). Catalog roles shown: Catalog Admin, Data Admin, Catalog reader. Privilege sets shown: read data, write data, write properties, read properties, list tables, list namespaces; read data, write data, list tables, list namespaces; read data, list tables, list namespaces]

Try Apache Polaris in Just 5 Minutes

DEMO: Try Polaris with Spark SQL
Git clone and run Polaris locally:

    git clone :apache/polaris.git
    ./gradlew runApp

Connect Polaris with Spark SQL in another terminal:

    ./regtests/run_spark_sql.sh

Run Spark SQL commands:

    create database db1;
    show databases;
    create table db1.table1 (id int, name string);
    insert into db1.table1 values (1, 'a');
    select * from db1.table1;
    insert into db1.table1 values (2, 'b');
    call polaris.system.expire_snapshots('db1.table1', timestamp '2024-10-10');

DEMO: Try Polaris with PyIceberg
Run the PyIceberg CLI:

    pyiceberg list
    pyiceberg list db1
    pyiceberg describe db1.t1
    pyiceberg files db1.t1

Configuration used by the CLI:

    cat .pyiceberg.yaml
    catalog:
      default:
        uri: http://localhost:8181/api/catalog
        warehouse: manual_spark
        token: principal:root;realm:default-realm

DEMO: Try Polaris with StarRocks

    CREATE EXTERNAL CATALOG polaris
    PROPERTIES (
        "type" = "iceberg",
        "iceberg.catalog.type" = "rest",
        "iceberg.catalog.uri" = "http://polaris.metastore.svc.cluster.local:8181/api/catalog",
        "iceberg.catalog.credential" = "xxxx:xxxxxxxx",
        "iceberg.catalog.scope" = "PRINCIPAL_ROLE:ALL",
        "iceberg.catalog.warehouse" = "quickstart_catalog"
    );
    set catalog polaris;
    show databases;
    use db1;
    select count(*) from db1.table1;

What's next? A Community-Driven Roadmap
- Catalog synchronization powered by Notification APIs
  - External catalogs to Polaris
  - Polaris to external catalogs
- More storage options, e.g., HDFS (issue #85), S3-compatible storage
- Support for table types other than Iceberg, e.g., Hive tables via federation
- More governance features, e.g., column masking, encryption, data lineage
- Advanced Iceberg catalog features
  - Commit conflict resolution
  - Server-side planning
- Migration tool for HMS users

Join the Community!
Perfect time to join
- Diverse expertise: the PPMC includes members from leading organizations like Snowflake, Dremio, Google, Microsoft, AWS, and Confluent.
- Be part of innovation: contribute to a cutting-edge project with a strong community.
Get involved
- Join the community: engage with us on the dev list and chat channel.
- Contribute to the code base: new features, bug fixes, documentation enhancements.
- Share feedback: your experiences and suggestions are valuable to us.

Resources
- Polaris Git Repo
- Iceberg REST Spec

Thank you!
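
Appendix sketch: optimistic concurrency and client-side retry
The "Iceberg Optimistic Concurrency" slides (happy path and conflict resolution) describe two clients racing to advance one table from v1 to v2 to v3. The toy Python model below is only an illustration of that idea, not Iceberg's or Polaris's implementation: `ToyCatalog`, `CommitConflict`, and `commit_with_retry` are hypothetical names, and an integer version stands in for the metadata.json pointer. It shows the compare-and-swap check that rejects a commit built on a stale version, and the refresh-and-retry loop that a traditional catalog leaves to each engine.

```python
import threading
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """The table moved past the version this commit was based on."""


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog tracking one table's current version."""
    version: int = 1
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def load(self) -> int:
        return self.version

    def swap(self, expected: int, new: int) -> None:
        # Atomic compare-and-swap of the table's metadata pointer.
        with self._lock:
            if self.version != expected:
                raise CommitConflict(
                    f"expected v{expected}, found v{self.version}")
            self.version = new


def commit_with_retry(catalog: ToyCatalog, max_attempts: int = 3):
    """Client-side loop: reload the latest version, rebase, try again."""
    for attempt in range(1, max_attempts + 1):
        base = catalog.load()
        try:
            catalog.swap(expected=base, new=base + 1)
            return base + 1, attempt
        except CommitConflict:
            continue  # someone else committed first; refresh and retry
    raise CommitConflict(f"gave up after {max_attempts} attempts")


catalog = ToyCatalog()
base1 = catalog.load()  # Client1 reads v1
base2 = catalog.load()  # Client2 also reads v1

catalog.swap(base1, base1 + 1)  # Client1 commits v1 -> v2 (happy path)

try:
    catalog.swap(base2, base2 + 1)  # Client2 tries a second v2 from stale v1
    conflict = False
except CommitConflict:
    conflict = True

# Client2 refreshes and retries on top of v2, producing v3.
new_version, attempts = commit_with_retry(catalog)
print(conflict, new_version, attempts)
```

With a REST catalog, the deck's point is that this retry loop, and the decision about whether two changes truly conflict, moves from every client into the server.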