Time Series Database on Kubernetes: Efficient Management of Massive Internet of Vehicles Data
Vicky Lee, Huawei Cloud Computing Technology Co., Ltd.

About Me
I'm a time-series database expert in the HUAWEI CLOUD Database Innovation Lab and the co-founder of the openGemini community. I have worked on distributed databases and NoSQL databases offered as cloud services for many years, and I am currently dedicated mainly to openGemini development.
Vicky Lee

Contents
IoV Technology Architecture Evolution and Industry Challenges
IoV Solution Based on openGemini
openGemini Key Technologies

IoV Technology Architecture Evolution and Industry Challenges

Technical Challenges: Performance, Real-time Analysis and Costs

More and more vehicles are connected to the internet
More than 70% of new vehicles have network connectivity, and annual IoV access grows by more than 30%.

Data collection frequency keeps increasing
GB (Chinese national) Standard: 100+ columns of data, collected every 30 s
Enterprise Standard: 1000+ columns of data, collected every 10 s / 1 s / 100 ms
Enterprise-standard data is more than 10x the volume of GB-standard data, so it must be written into the database quickly. If millions of vehicles report data, write traffic reaches 10 GB/s.

Increasing storage costs
Data retention:
Real-time analysis: 6 months | 1 month | 1 month
Offline analysis: 3 years / 10 years / permanent | 1 year | 1 year
Massive data and long retention: 1 million vehicles generate 3 PB of data per year.

Increasing requirements for real-time analytics
Intelligent operations, seller competence enablement, decision making and insight, experience optimization
Scenarios such as alarms, vehicle status queries, and fault analysis place higher requirements on real-time services.

Early IoV Technical Architecture
Components: gateway, files, Kafka, format conversion, data files stored in HDFS, MapReduce and HQL for offline analysis
Benefits
1. Big data storage
2. Statistical analyses
Challenges
1. Files are batch-processed, with low efficiency
2. Does not meet the requirements of real-time data analysis

Modern IoV Technical Architecture
Components: MQTT, Kafka, Spark/Flink, a time series database (InfluxDB/OpenTSDB/...) as the real-time database, HBase/OBS; offline and online analysis serving data apps, real-time inspection, detail querying, and historical analysis
Benefits
1. K8s improves the resource utilization of computing frameworks and analysis tasks.
2. Big data technologies have greatly improved data processing efficiency.
3. Time series databases enable real-time analysis and querying for IoV.
Challenges
1. Traditional time series databases cannot meet the performance requirements of massive data writes and real-time data analysis.
2. HBase lacks effective data compression for long-term data retention, resulting in high storage costs.

IoV Solution Based on openGemini

What is openGemini
Features: distributed, high performance, low storage cost
openGemini is a CNCF time series database project focusing on the storage and analysis of massive observability data.
Applicable scenarios: IoT, IoV, DevOps

openGemini: New Technical Architecture for Massive IoV Data
Benefits
Fast write: supports writes from millions of vehicles at 10 GB/s
Fast query: millisecond-level query response
Low-cost storage: reduces data storage costs by 30%+

openGemini Key Technologies

openGemini + Kubernetes Layout
Storage engine: compression for numerical values, dictionary encoding; indexes (inverted index for time series, skip index, bloom filter, full-text index); formats (Parquet, private format)
Query engine: execution framework, replication, cache, approximate calculation, streaming, general operators (scan, tokenizer, aggregation, UDF)
Ecosystem operators: time series operator, Prometheus operators, SQL + pipeline, log operator
Interfaces: unified IR, InfluxQL, OpenTelemetry, PromQL, SQL
MPP (Massively Parallel Processing) architecture: improves data processing performance by spreading workloads across multiple nodes.
Horizontal scaling: add nodes to absorb bursts of concurrent traffic.
Flexibility: ts-sql and ts-store can scale out independently.

What is a Time Series Table
vehicle,vin=1G1BL52P7TR115520,type=x3 pressure=56,voltage=12 1680195172000
vehicle,vin=1G1BL52P7TR115521,type=x3 pressure=58,voltage=11 1680195172000
vehicle,vin=1G1BL52P7TR115520,type=x3 pressure=56,voltage=12 1680195172001
vehicle,vin=1G1BL52P7TR115521,type=x3 pressure=57,voltage=13 1680195172001
Each record consists of a measurement (vehicle), tags (vin, type), fields (pressure, voltage), and a timestamp. Each distinct tag combination forms one series (data source):

Data source A (vin=1G1BL52P7TR115520)
Timestamp | pressure | voltage
1680195172000 | 56 | 12
1680195172001 | 56 | 12

Data source B (vin=1G1BL52P7TR115521)
Timestamp | pressure | voltage
1680195172000 | 58 | 11
1680195172001 | 57 | 13

Data layout: the inverted index maps tags to a series ID (TSID), and each series stores its timestamps and field values (value1, value2, ...) as columns.

Why Write Data So Fast
1. Each write goes to the WAL on disk and to the active memtable in memory, then returns.
2. A full memtable becomes immutable and is split into an ordered immutable memtable and an out-of-order immutable memtable.
3. Ordered data is flushed to TSSP files at Level 0 and then compacted into Level i and Level i+1 (level compaction), with full compaction across levels.
4. Merging of out-of-order immutable memtables is delayed. They are merged with the ordered data later.

Why Write Index So Fast
Writing an index entry first checks the global index cache and the local index caches. On a miss, the in-memory bloom filter is checked. When it reports that the series does not exist, the new index entry is written to the index file and the bloom filter file is updated, without reading the index.
Benefits
1. The global-local cache architecture improves the cache hit ratio and reduces memory use.
2. The bloom filter reduces the cost of index existence checks.

Query Data: Vectorized, Parallel Computing, Data Preprocessing
ts-sql receives the query and the optimizer turns it into a DAG, using metadata from ts-meta. The work is then dispatched to multiple ts-store nodes. Each node executes its part of the DAG with caches, inverted indexes, vectorized operators, and data preprocessing.
Parallelism: the architecture's parallel computing capacity gives faster responses.
Example query:
SELECT vin, pressure, voltage FROM vehicle WHERE vin = '0' AND time > now() - 1h AND (pressure > 50 OR voltage > 10) GROUP BY *

Why Query So Fast
Multi-level caching:
Coordinator: query result cache, middle cache, plan cache, schedule cache
Node: middle cache, filter cache, index cache, metadata cache, data cache
Storage: disk, OBS
The diagram traces 100 incoming queries through these layers, with 20, 70, and 10 queries marked at successive levels.

Data Compression: 10x Higher Compression Efficiency
Column-based storage: data from the same series (timeline) is stored together, keyed by series ID with columns C1 to C6.
Compression is chosen by data type and data distribution. The algorithms include Simple8b, Delta, Delta-of-Delta, RLE, ZigZag, Zstd, Snappy, bit-packing, and LZ4.

openGemini vs OpenTSDB (HBase)
HBase data model example:
Key | Column Family | Qualifier | Timestamp | Type | Value
rowkey1 | Personal_info | Name | T1 | put | Peter
rowkey1 | Personal_info | City | T2 | put | Chicago
rowkey1 | Personal_info | Phone | T3 | put | 132xxxxx
rowkey2 | Company_info | Name | T4 | put | Qtime
rowkey2 | Company_info | City | T5 | put | Hong Kong
Data redundancy: the row key, column family, and qualifier are stored again with every cell.
Data compression algorithms: only four (GZ, SNAPPY, LZO, LZ4).
IoV data: compared with HBase, openGemini achieves 10x higher data compression efficiency.

Metric Store Evaluation

Query performance (data model: DevOps; specifications: single node, 32 vCPUs, 128 GB; time series: 300,000)
Query | Concurrency | openGemini avg latency (ms) | InfluxDB avg latency (ms) | openGemini speedup (InfluxDB latency / openGemini latency)
single-groupby-1-1-12 | 32 | 5.61 | 22.93 | 409%
single-groupby-1-1-1 | 32 | 2.02 | 4.20 | 208%
single-groupby-1-8-1 | 32 | 4.10 | 11.72 | 286%
single-groupby-5-1-12 | 32 | 9.72 | 95.04 | 978%
single-groupby-5-1-1 | 32 | 2.79 | 11.66 | 418%
single-groupby-5-8-1 | 32 | 5.91 | 46.60 | 788%
cpu-max-all-1 | 32 | 3.74 | 13.55 | 362%
cpu-max-all-8 | 32 | 9.34 | 88.19 | 944%
double-groupby-1 | 8 | 21558.69 | 243356.06 | 1129%
double-groupby-5 | 8 | 51607.11 | OOM | -
double-groupby-all | 8 | 88777.61 | OOM | -
lastpoint | 8 | 5501.98 | OOM | -
groupby-orderby-limit | 2 | 9014.86 | OOM | -
high-cpu-1 | 32 | 11.30 | 29.08 | 257%
high-cpu-all | 1 | 32622.80 | OOM | -

Write performance and disk usage (data model: DevOps; specifications: single node, 32 vCPUs, 128 GB; time series: 300,000; concurrency: 32; raw data: 259 million rows, 842 GB text file)
Database | Write performance (rows/s) | Disk usage
openGemini | 469,881 | 11 GB
InfluxDB | 73,529 | 14 GB

Compared with InfluxDB, openGemini delivers better read and write performance and better data compression.
Reference: https://docs.opengemini.org/zh/guide/introduction/performance.html
https:/

To Try And Give Feedback
docker run -d -p 8086:8086 --name openGemini-dev opengeminidb/opengemini-server
Slack: https:/
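The "What is a Time Series Table" slide turns line-protocol records into per-series columns. Here is a minimal Python sketch of that grouping. It assumes a simplified line protocol with no escaped characters and integer fields only. `parse_line` and `group_by_series` are hypothetical helpers, not openGemini APIs.

```python
from collections import defaultdict

# Sample records from the slide, in simplified line-protocol form.
LINES = [
    "vehicle,vin=1G1BL52P7TR115520,type=x3 pressure=56,voltage=12 1680195172000",
    "vehicle,vin=1G1BL52P7TR115521,type=x3 pressure=58,voltage=11 1680195172000",
    "vehicle,vin=1G1BL52P7TR115520,type=x3 pressure=56,voltage=12 1680195172001",
    "vehicle,vin=1G1BL52P7TR115521,type=x3 pressure=57,voltage=13 1680195172001",
]


def parse_line(line):
    """Split one record into (measurement, tags, fields, timestamp).

    Assumption: no escaped spaces/commas/equals signs, integer-valued fields only.
    """
    series_part, field_part, ts_part = line.split(" ")
    measurement, *tag_pairs = series_part.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {k: int(v) for k, v in (pair.split("=", 1) for pair in field_part.split(","))}
    return measurement, tags, fields, int(ts_part)


def group_by_series(lines):
    """Group points by series key (measurement + sorted tags) into per-series columns."""
    series = defaultdict(lambda: defaultdict(list))
    for line in lines:
        measurement, tags, fields, ts = parse_line(line)
        key = (measurement,) + tuple(sorted(tags.items()))
        columns = series[key]
        columns["time"].append(ts)
        for name, value in fields.items():
            columns[name].append(value)
    return series


for key, columns in group_by_series(LINES).items():
    print(key, dict(columns))
```

Each series key (one vin) ends up holding its own time, pressure, and voltage columns, which reproduces the "Data source A" and "Data source B" tables.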
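The "Why Write Data So Fast" slide describes four write-path steps:

- Acknowledge a write after WAL + memtable.
- Split immutable memtables into ordered and out-of-order parts.
- Flush ordered data to Level 0.
- Delay the out-of-order merge.

The toy Python model below illustrates those steps under loudly stated assumptions: one value per point, in-memory lists standing in for TSSP files, and level compaction reduced to one full compaction per series. `ToyShard` is hypothetical and is not openGemini's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of the slide's write path (hypothetical, not openGemini code)."""
    memtable_limit: int = 4
    wal: list = field(default_factory=list)             # append-only log for recovery
    active: dict = field(default_factory=dict)          # series -> [(ts, value), ...]
    level0: list = field(default_factory=list)          # stands in for flushed, ordered TSSP files
    out_of_order: list = field(default_factory=list)    # stands in for late data awaiting a delayed merge
    max_flushed_ts: dict = field(default_factory=dict)  # series -> newest flushed timestamp

    def write(self, series, ts, value):
        """Append to the WAL and the active memtable, then return whether the memtable is full."""
        self.wal.append((series, ts, value))
        self.active.setdefault(series, []).append((ts, value))
        return sum(len(p) for p in self.active.values()) >= self.memtable_limit

    def flush(self):
        """Freeze the memtable, split ordered vs out-of-order points, write ordered ones to level 0."""
        immutable, self.active = self.active, {}
        for series, points in immutable.items():
            points.sort()
            last = self.max_flushed_ts.get(series)
            ordered = [p for p in points if last is None or p[0] > last]
            late = [p for p in points if last is not None and p[0] <= last]
            if ordered:
                self.level0.append((series, ordered))
                self.max_flushed_ts[series] = ordered[-1][0]
            self.out_of_order.extend((series, ts, v) for ts, v in late)
        self.wal.clear()  # everything in the log has now been persisted

    def compact(self, series):
        """Full compaction for one series: merge ordered files with the delayed late points."""
        merged = {}
        for s, points in self.level0:
            if s == series:
                merged.update(points)  # files are in flush order, so newer values overwrite
        for s, ts, value in self.out_of_order:
            if s == series:
                merged[ts] = value
        self.level0 = [f for f in self.level0 if f[0] != series]
        self.out_of_order = [p for p in self.out_of_order if p[0] != series]
        result = sorted(merged.items())
        self.level0.append((series, result))
        return result
```

The design point it shows: late points never force a rewrite of already-sorted files at write time. They are set aside and folded in during compaction, so the hot write path stays append-only.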
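The "Why Write Index So Fast" slide puts a bloom filter between the index caches and the index file, so a brand-new series can skip the index read. Below is a minimal Python sketch of that check order. The assumptions are:

- One dict stands in for both the global and the local caches.
- Another dict stands in for the on-disk index.
- The bloom filter uses double hashing over SHA-256.

`BloomFilter` and `SeriesIndex` are hypothetical names.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one digest via double hashing: h1 + i * h2.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class SeriesIndex:
    """Hypothetical index write path: cache -> bloom filter -> index lookup only if needed."""

    def __init__(self):
        self.cache = {}        # series key -> TSID (stands in for global/local caches)
        self.bloom = BloomFilter()
        self.index_file = {}   # stands in for the on-disk index
        self.disk_lookups = 0
        self.next_tsid = 1

    def get_or_create(self, key):
        if key in self.cache:
            return self.cache[key]
        if self.bloom.might_contain(key):
            self.disk_lookups += 1  # only "maybe present" keys pay for an index read
            if key in self.index_file:
                self.cache[key] = self.index_file[key]
                return self.cache[key]
        tsid = self.next_tsid       # definitely new (or a false positive): create it
        self.next_tsid += 1
        self.index_file[key] = tsid
        self.bloom.add(key)
        self.cache[key] = tsid
        return tsid
```

Because a bloom filter never reports a stored key as absent, a "not present" answer is safe to act on directly. Only possible positives need the slower lookup.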
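The "Query Data: Vectorized, Parallel Computing" slide combines two ideas:

- Column-at-a-time predicate evaluation.
- Fan-out of the work to several ts-store nodes, whose results the coordinator merges.

The Python sketch below shows both ideas with hypothetical data. `STORE_BATCHES`, `filter_batch`, and `scatter_gather` are illustrative names. Several things are assumptions:

- The predicate shape follows the example query, with `>` as the comparison operator and `t_min` standing in for `now() - 1h`.
- Threads merely model the fan-out. Real stores are separate processes on separate nodes.
- Real engines vectorize in compiled code rather than with Python lists.

```python
from concurrent.futures import ThreadPoolExecutor

COLUMNS = ("vin", "time", "pressure", "voltage")

# Hypothetical column batches held by two ts-store nodes.
STORE_BATCHES = [
    {"vin": ["0", "0", "7"], "time": [100, 200, 300], "pressure": [56, 48, 60], "voltage": [12, 9, 12]},
    {"vin": ["0", "0"], "time": [150, 250], "pressure": [40, 52], "voltage": [8, 11]},
]


def filter_batch(batch, vin, t_min, pressure_gt, voltage_gt):
    """Column-at-a-time filter: one boolean mask per predicate, then combine the masks."""
    vin_mask = [v == vin for v in batch["vin"]]
    time_mask = [t > t_min for t in batch["time"]]
    pressure_mask = [p > pressure_gt for p in batch["pressure"]]
    voltage_mask = [v > voltage_gt for v in batch["voltage"]]
    keep = [a and b and (c or d)
            for a, b, c, d in zip(vin_mask, time_mask, pressure_mask, voltage_mask)]
    # Project the selected rows of every requested column.
    return {col: [x for x, k in zip(batch[col], keep) if k] for col in COLUMNS}


def scatter_gather(batches, **predicate):
    """Run the filter on every store concurrently, then concatenate at the coordinator."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda b: filter_batch(b, **predicate), batches))
    merged = {col: [] for col in COLUMNS}
    for part in partials:
        for col, values in part.items():
            merged[col].extend(values)
    return merged


result = scatter_gather(STORE_BATCHES, vin="0", t_min=50, pressure_gt=50, voltage_gt=10)
print(result)
```

Evaluating each predicate over a whole column keeps the inner loops simple and uniform, which is what lets a compiled engine use SIMD and caches well.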
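The "Data Compression" slide lists Delta-of-Delta, ZigZag, and RLE among openGemini's algorithms. The Python sketch below chains these three on regularly sampled timestamps to show why they compress so well. This particular combination and all function names are illustrative assumptions. The slide says openGemini selects algorithms by data type and distribution, and that selection logic is not reproduced here.

```python
def delta_of_delta_encode(timestamps):
    """Return (first timestamp, first delta, delta-of-deltas). Needs >= 2 timestamps."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def decode(first, first_delta, dods):
    """Invert delta-of-delta encoding."""
    out = [first, first + first_delta]
    delta = first_delta
    for d in dods:
        delta += d
        out.append(out[-1] + delta)
    return out


def zigzag(n):
    """Map signed ints to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z):
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def run_length_encode(values):
    """Collapse repeated values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, c) for v, c in runs]


def run_length_decode(runs):
    return [v for v, c in runs for _ in range(c)]


# Timestamps sampled every 1 s (1000 ms), with one sample arriving 3 ms late.
ts = [1680195172000, 1680195173000, 1680195174000, 1680195175000, 1680195176003, 1680195177000]
first, first_delta, dods = delta_of_delta_encode(ts)
encoded = run_length_encode([zigzag(d) for d in dods])
restored = decode(first, first_delta, [unzigzag(z) for z in run_length_decode(encoded)])
print(dods, encoded, restored == ts)
```

With a fixed collection interval (30 s, 1 s, 100 ms in the IoV standards), delta-of-deltas are mostly zero. ZigZag keeps the small jitters as small unsigned integers, and RLE collapses the runs of zeros.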
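In the "Metric Store Evaluation" slide, the percentage column equals InfluxDB's latency divided by openGemini's latency. For example, 22.93 ms / 5.61 ms ≈ 409%. The short Python check below recomputes a few rows and the write-side ratios from the slide's own numbers. `speedup_percent` is a hypothetical helper.

```python
# Figures copied from the Metric Store Evaluation slide (latencies in ms).
QUERY_LATENCY_MS = {
    # query: (openGemini, InfluxDB)
    "single-groupby-1-1-12": (5.61, 22.93),
    "single-groupby-1-1-1": (2.02, 4.20),
    "cpu-max-all-8": (9.34, 88.19),
    "double-groupby-1": (21558.69, 243356.06),
}


def speedup_percent(opengemini_ms, influxdb_ms):
    """InfluxDB latency divided by openGemini latency, as a rounded percentage."""
    return round(influxdb_ms / opengemini_ms * 100)


for name, (og, influx) in QUERY_LATENCY_MS.items():
    print(f"{name}: {speedup_percent(og, influx)}%")

write_ratio = 469_881 / 73_529  # rows/s, openGemini vs InfluxDB
raw_to_disk = 842 / 11          # GB of raw text vs GB on disk for openGemini
print(f"write throughput: {write_ratio:.2f}x, raw-to-disk ratio: {raw_to_disk:.1f}x")
```

By the same arithmetic, openGemini's write throughput in that test is about 6.4x InfluxDB's. The 842 GB of raw text occupies about 1/76 of its size on disk.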