Data Platform at Airbnb
Hongbo Zeng, Airbnb

Agenda
- Cluster Evolution
- Incremental Data Replication - ReAir
- Unified Streaming and Batch Processing - AirStream

Scale of Data Infrastructure at Airbnb
- 13B events collected
- 35PB warehouse size
- 1400+ machines
- Hadoop + Presto + Spark
- 5x YoY data growth

Data Platform
- Ingestion: event logs via Kafka, MySQL dumps via Sqoop
- Gold Cluster (HDFS, Hive, Yarn) replicated by ReAir to the Silver Cluster (HDFS, Hive, Yarn)
- Spark Cluster (Spark, AirStream) and Presto Cluster, with AirPal, SuperSet, and Tableau on top
- Airflow scheduling; S3 for storage

Cluster Evolution

Original Cluster
- Single HDFS, MR, and Hive installation
- c3.8xlarge (32 cores / 60GB mem / 640GB disk) + 3TB of EBS volume
- 800 nodes
- Tested DataNodes on different AZs
- All data managed by Hive

Challenges
- Limited isolation between production and ad hoc workloads: difficult to meet SLAs, harder to plan capacity
- Disaster recovery
- Difficult rollouts

Two Clusters
- Two independent HDFS, MR, and Hive metastore installations
- d2.8xlarge with 48TB local disk; 250 instances in the final setup
- Replication of common/critical data: Silver is a superset of Gold
- Separate AZs for disaster recovery
- Gold Cluster (HDFS, Hive, Yarn) -> replication -> Silver Cluster (HDFS, Hive, Yarn)

Multi-Cluster Trade-Offs
- Advantages: failure isolation of user jobs, easy capacity planning, guaranteed SLAs, ability to test new versions, disaster recovery
- Disadvantages: data synchronization, user confusion, operational overhead

Incremental Data Replication - ReAir

Warehouse Replication Approaches
- Batch: scan HDFS and the metastore, copy relevant entries; simple and stateless, but high latency
- Incremental: record changes in the source and copy/re-run the operations on the destination; more complex, more state, but low latency (seconds)

Incremental Replication
- Record changes on the source
- Convert changes to replication primitives
- Run the primitives on the destination

Record Changes on Source
- Hive provides a hooks API that fires at specific points: pre-execute, post-execute, and failure
- Use the post-execute hook to log created objects into an audit log
- The hook runs in the critical path for queries

Example Audit Log Entry

Convert Changes to Primitive Operations
- 3 types of objects: DB, table, partition
- 3 types of operations: copy, rename, drop
- 9 different primitive operations, all idempotent

Primitive Example
- CREATE TABLE srcpart (key STRING) PARTITIONED BY (ds STRING) => copy table
- INSERT OVERWRITE TABLE srcpart PARTITION (ds=1) SELECT key FROM src => copy partition
- ALTER TABLE srcpart SET FILEFORMAT TEXTFILE => copy table
- ALTER TABLE srcpart RENAME TO srcpart_old => rename table

Copy Table Flow
- If the source does not exist, stop
- If the destination exists and is identical, done
- Otherwise: copy to a temp location, verify the copy, move tmp -> dest, add metadata, done

Unified Streaming and Batch Processing - AirStream

Batch Infrastructure
- The batch stack shown above: Kafka and Sqoop ingestion into the Gold/Silver Hive clusters, Spark, Presto, Airflow scheduling, and S3

AirStream
- Source -> Process -> Sink

Streaming at Airbnb - AirStream
- Spark Streaming jobs on the cluster, scheduled by Airflow, with state in HBase and HDFS
- Sources: Kafka, S3, HDFS
- Sinks: Datadog, Kafka, DynamoDB, Elasticsearch

Lambda Architecture
- Batch: AirStream over Hive and Spark SQL
- Streaming: AirStream over Kafka and Spark Streaming, with state storage

Sources
- Streaming:
    source:
      name: source_example,
      type: kafka,
      config:
        topic: example_topic
- Batch:
    source:
      name: source_example,
      type: hive,
      sql: select * from db.table where ds = '2017-06-05'

Computation
- Streaming/Batch:
    process:
      name = process_example,
      type = sql,
      sql = SELECT listing_id, checkin_date, context.source AS source
            FROM source_example WHERE user_id IS NOT NULL

Sinks
- Streaming:
    sink:
      name = sink_example,
      input = process_example,
      type = hbase_update,
      hbase_table_name = test_table,
      bulk_upload = false
- Batch: identical, except bulk_upload = true

Computation Flow
- Source -> Process_A -> Process_A1 -> Sink_A2, and Source -> Process_B -> Sink_B2
- The same flow runs in streaming mode and in batch mode

Unified API through AirStream
- Declarative job configuration
- Streaming source vs. static source
- Computation operators and sinks can be shared by streaming and batch jobs
- The computation flow is shared by streaming and batch
- A single driver executes the job in both streaming and batch mode

Shared State Storage

AirStream Shared Global State Store
- HBase tables read and written by many Spark Streaming and Spark batch jobs

Why HBase
- Well integrated with the Hadoop ecosystem
- Efficient API for streaming writes and bulk uploads
- Rich API for sequential scans and point lookups
- Merged view based on version

Unified Write API
- DataFrame -> re-partition by region -> Puts (streaming) or HFile BulkLoad (batch) into HBase regions 1..N

Rich Read API
- Multi-gets, prefix scans, and time-range scans over the HBase tables from Spark streaming/batch jobs

Merged Views
- A row key accumulates versions over time, e.g. R1: V01@TS01, V150@TS150, V200@TS200 from streaming writes
- A batch bulk upload can later add V100@TS100; reads merge versions by timestamp, so the newest version still wins

Our Foundations
- Unify streaming and batch processing
- Shared global state store

Database Snapshot: MySQL DB Snapshot Using Binlog Replay
- Large amount of data: multiple large MySQL DBs
- Realtime-ness: minutes of delay rather than hours
- Transactions: need to keep transactions consistent across different tables
- Schema change: table schemas evolve

Move Elephant
- Binlog replay on Spark: 20+ hr -> 4+ hr
- AirStream job: 5 mins - 1 hr (SpinalTap, seed)

Streaming and Batch Share Logic
- Shared components: binlog file reader, DDL processor, transaction processor, DML processor
- Merged by binlog position
- Idempotent: the log can be replayed multiple times
- Schema changes: the full schema change history is kept

Binlog Replay Architecture
- MySQL instance -> binlog (realtime/history) -> log parser
- XID events -> transaction processor; DML -> change processor; DDL -> schema processor; results land in HBase
- A lambda architecture: the same pipeline handles the realtime and historical binlog
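The merge-by-binlog-position and idempotency points above can be sketched with a small model of a versioned state store. This is a hypothetical stand-in, not AirStream's API: `VersionedStore` and `apply_binlog` are illustrative names, and the only assumptions are the ones stated in the deck, namely that each cell keeps multiple versions and that reads return the value at the highest version (the merged view).

```python
# Hypothetical model of an HBase-like versioned state store (illustrative,
# not AirStream's actual API).
from collections import defaultdict

class VersionedStore:
    def __init__(self):
        self.cells = defaultdict(dict)  # (row, column) -> {version: value}

    def put(self, row, column, value, version):
        # Writing the same (version, value) pair twice is a no-op,
        # so replaying the log is safe.
        self.cells[(row, column)][version] = value

    def merged_get(self, row, column):
        # Merged view: the value at the highest version wins.
        versions = self.cells[(row, column)]
        return versions[max(versions)] if versions else None

def apply_binlog(store, events):
    # Use the binlog position as the cell version, so ordering is preserved
    # regardless of when (or how often) an event is applied.
    for pos, row, column, value in events:
        store.put(row, column, value, version=pos)

events = [
    (100, "user:1", "email", "a@example.com"),
    (150, "user:1", "email", "b@example.com"),
]
store = VersionedStore()
apply_binlog(store, events)
apply_binlog(store, events)  # replay the log: idempotent
# A batch bulk upload carrying an older snapshot (lower position) does not win:
store.put("user:1", "email", "snapshot@example.com", version=120)
print(store.merged_get("user:1", "email"))  # -> b@example.com
```

Because the version is the binlog position, a replayed log and a backfilled snapshot both converge on the same merged view as the live stream.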
Realtime Ingestion and Interactive Query
- Kafka -> AirStream (Spark Streaming) -> HBase
- Query engines over HBase: Spark SQL, Hive SQL, Presto SQL, fronted by DataPortal
- Interactive query in SqlLab

Thanks

Realtime OLAP with Druid

Realtime Ingestion for Druid
- Kafka -> AirStream (Spark Streaming) -> Druid Beam -> Druid (dimensions, metrics)
- Superset powered by Druid
- Realtime indexing combined with batch indexing from Hive

Realtime Indexing into Elasticsearch
- AirStream runs Spark Streaming over Kafka events and Spark batch over tables A, B, C
- es _version = mutation id

Backup Slides: Tips

Moving Window Computation

Long Window Computation
- What if the window is weeks, months, or even years?

Distinct in a Large Window
- I don't want approximation. What should I do?

Distinct Count
- One row key per (listing, visitor), with the visit timestamp:
    Listing 1 | Visitor 01 | TS100
    Listing 1 | Visitor 02 | TS100
    Listing 1 | Visitor 03 | TS99
    Listing 1 | Visitor 04 | TS98
- A prefix scan on the listing with a time range returns the exact distinct visitors in the window

Moving Average
- One row key per listing, storing a running total:
    Listing 1 | Total Review Cnt: 01  | TS01
    Listing 1 | Total Review Cnt: 50  | TS50
    Listing 1 | Total Review Cnt: 98  | TS99
    Listing 1 | Total Review Cnt: 100 | TS100
- For any window (e.g. window 1 ending at TS99, window 2 at TS100), the moving average is the count difference divided by the time elapsed

Schema Enforcement
- Streaming events: Thrift -> DataFrame
- Thrift class + FieldMetaData -> Spark struct type; Thrift object + field values -> Row -> DataFrame

Summary
- Unify batch and streaming computation
- Global state store using HBase

Run Primitives on Destination
- Serial execution: easy to reason about operations, but very slow
- Parallel execution: fast and scalable, but ordering matters (e.g. create a table before copying a partition)
- Model the work as a DAG of primitive operations
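The serial-vs-parallel point above can be made concrete with a topological execution sketch. This is an illustrative example, not ReAir's actual scheduler: the primitive names and dependencies are invented, and it uses Python's standard `graphlib` (Python 3.9+). Dependencies encode ordering rules such as "create the table before copying its partitions"; primitives in the same ready batch could run in parallel.

```python
# Sketch: executing replication primitives as a DAG (hypothetical names).
from graphlib import TopologicalSorter

# primitive -> set of primitives it depends on
dag = {
    "copy_table:srcpart": set(),
    "copy_partition:srcpart/ds=1": {"copy_table:srcpart"},
    "copy_partition:srcpart/ds=2": {"copy_table:srcpart"},
    "rename_table:srcpart->srcpart_old": {
        "copy_partition:srcpart/ds=1",
        "copy_partition:srcpart/ds=2",
    },
}

ts = TopologicalSorter(dag)
ts.prepare()
order = []
while ts.is_active():
    ready = ts.get_ready()        # primitives whose dependencies are satisfied
    order.append(sorted(ready))   # everything in this batch can run in parallel
    ts.done(*ready)

for batch in order:
    print(batch)
# -> ['copy_table:srcpart']
# -> ['copy_partition:srcpart/ds=1', 'copy_partition:srcpart/ds=2']
# -> ['rename_table:srcpart->srcpart_old']
```

This captures the trade-off from the slide: a plain serial walk of the same DAG is trivially correct but slow, while batching the ready set recovers parallelism without violating the ordering constraints.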