Merlin NVTabular: Best Practices for GPU-Accelerated Feature Engineering in Recommender Systems.pdf

Uploaded by: li   ID: 29546   2021-02-07   PDF   29 pages   1.17 MB

This report belongs to the collection: 2020 GTC China Online Conference speaker presentation slides

NVIDIA Merlin NVTabular: Best Practices for GPU-Accelerated Feature Engineering in Recommender Systems
黃孟迪, Deep Learning Engineer, NVIDIA

RELATED SESSIONS IN GTC CHINA
Learning More About NVIDIA Merlin
- CNS20590: Merlin: A GPU-Accelerated Recommender System Framework. 王澤衰, APAC AI Developer Technology Manager, NVIDIA
- CNS20516: Merlin HugeCTR: A Deep Dive into Performance Optimization. Minseok Lee, GPU Computing Expert, NVIDIA
- CNS20624: Merlin NVTabular: Best Practices for GPU-Accelerated Feature Engineering in Recommender Systems. 黃孟迪, Deep Learning Engineer, NVIDIA
- CNS20813: Applying GPU-Accelerated Data Processing in Recommender Systems. 魏英燦, GPU Computing Expert, NVIDIA
- CNS20377: Integrating HugeCTR Embedding into TensorFlow. 董建兵, GPU Computing Expert, NVIDIA
- CNS20626: Accelerating CTR Inference with a GPU Embedding Cache. 郁凡, GPU Computing Expert, NVIDIA

Agenda
- Merlin Overview
- NVTabular - Merlin ETL
- Tutorials - Best Practices For RecSys Feature Engineering
- Goal 1: Improving Model Accuracy
- Goal 2: Quick experimentation with GPU Acceleration
- Goal 3: Scale to Production Systems With NVTabular

NVIDIA Merlin Overview

Industrial Recommendation Challenges
- Feature Exploration: Multiple iterations can consume a lot of time to find the most accurate feature set.
- Dataloading: Tabular data scales poorly using the common deep learning method of item by item.
- Training Embedding Tables: Large embedding tables require significant memory, and lookups can have extraneous operations.
- Deployment: High throughput to rank more items is difficult while maintaining low latency.
- High Accuracy: Longer iteration cycles reduce the ability to reach higher accuracies as quickly.

Merlin Framework Benefits
Components: NVTabular, HugeCTR, Triton
- Feature Engineering: Fast iteration time. Prepare massive datasets in minutes, allowing for more exploration and better models.
- Dataloading: Accelerate tabular data loading into training frameworks. Asynchronous batch dataloading means the GPU is always utilized.
- Scaling Training: Optimal lookups implementation. Easy-to-use data and model parallel training allow you to scale to TB-sized embeddings.
- Deployment: High-throughput, low-latency production deployment. Inference-time data transforms and multi-model support provide maximum throughput within latency constraints.
- High Accuracy: Reach higher accuracy faster. Shorten exploration and training cycles to reach higher accuracies sooner.

NVIDIA Merlin ETL
NVTabular - Fast Feature Transforms & Dataloading of Tabular Data on GPU

Day In The Life Of A Data Scientist
The average data scientist spends 75% of their time in ETL.
[Figure: a GPU-powered workflow (accelerated compute) compared with a CPU-powered workflow (regular compute) across dataset collection, data prep, analysis, training, and inference. Recoverable labels: configure data prep workflow, start data prep workflow, dataset downloads overnight, find unexpected null values stored as string, forgot to add a feature, restart data prep workflow, restart data prep workflow again, train model, validate, test model, repeat, go home on time, stay late.]

NVTabular: Recommender System ETL on GPU
Merlin ETL: NVTabular (extract, transform, load)
What it is: A feature engineering and preprocessing library designed to quickly and easily manipulate terabytes of tabular data.
What it is capable of:
- Scale: No limit on dataset size (not bound by GPU or CPU memory).
- Speed: GPU acceleration, 10x speedup compared to CPU; eliminates the input bottleneck.
- Usability: Higher-level abstraction, oriented toward recommender systems; fewer API calls are required to accomplish the same processing pipeline.
- Interoperability with PyTorch, TensorFlow, and HugeCTR.

RecSys Pipeline Example using NVTabular
[Figure: pipeline diagram; the layout is not recoverable from the extraction. Recoverable labels follow.]
- Offline stages: building of the dataset, loading of the dataset, training.
- Offline components: Data Lake, NVTabular Feature Engineering, Features datasets, NVTabular PreProc, ML-ready dataset, framework-specific Dataloader (PT/TF/HugeCTR), Training data, Model Training. Size annotations on these stages range from "up to 100 TBs" to "up to 1 EB"; which size belongs to which stage is not recoverable.
- Artifacts passed to inference: PreProc Config, Model Weights.
- Candidate Generation Server: find high-recall candidates for further ranking.
- Online Inference Server (TRITON): inference-time transformation of the data (up to 1 billion inferences/second; low-latency budget), with NVTabular online feature engineering and preprocessing, and a TRT model.
- Web Services (social media, ads, bookings, fraud, ...): receives requests, prepares data, returns recs. Data passed as tensors, DataFrames, dmatrix, etc.

Latest Merlin Release - NVTabular 0.3
Core Features
- Multi-GPU support using Dask cuDF
- Dataloaders for PyTorch, TensorFlow Keras, HugeCTR
- A100 support: RAPIDS 0.15, CUDA 11
- Data input: S3, GCS, HDFS
- Formats: ORC
- Multi-node support
- Multi-hot support
- 10 new operators from the RecSys 2020 win: TargetEncoding, DifferenceLag, ColumnSimilarity, Dropna, Filter, FillMedian, HashBucket, JoinGroupby, JoinExternal, LambdaOp

NVTabular Data Loaders
NVTabular v0.3+: read in file in block chunks; shuffle blocks in GPU memory; Parquet Iterable Batch Dataloader (GPU memory); repeat until finished with the dataset.
The slide describes a TensorFlow dataloader and a PyTorch dataloader side by side.
Preliminary results:
- Up to 4x end-to-end improvement compared to the native PyTorch data loader.
- Improvement compared to the native TF data loader (the figure is lost in the extraction).
Workflow / Features:
- Read large chunks of data into a dedicated segment of GPU memory (GPU memory buffer).
- Shuffle in the GPU memory buffer (per buffer size).
- Move batch-sized tensors into the framework.
Benefits:
- Removes the dataloading bottleneck.
- No item-level reads of files/memory; eliminates the less efficient item/file-level reads.
- Allows for higher sizes; enables larger-sized batches and files.

Example NVTabular API Workflow
100x fewer lines of code required.
Steps annotated on the slide:
- Specify which variables are Categorical and which are Continuous.
- Define the location of the training and validation set.
- Encode Categoricals using the defined thresholds.
- Log transform the Continuous variables, zero filling any nulls.
- Apply the operations, creating a new shuffled training dataset and a validation dataset.
Code shown on the slide (only fragments survive the extraction; "…" marks lost text):
    import glob
    … label_name …
    # initialize Workflow
    … nvt.Workflow(cat_… label_…
    # create datasets from input files
    … = glob.glob(./dataset/valid/*.p…
    train_dataset = nvt.dataset(train_files, …frac=0.1 …
    valid_dataset = nvt.dataset(valid_files, …
    # add feature …
    proc.add_cont_preprocess(nvt.ops.Normalize())
    … threshold=15)
    … out_files=len(train_files)
    proc.apply(train_dataset, shuffle=True, output_path=…
    proc.apply(valid_dataset, shuffle=False, …
The Pandas/NumPy example provided by DLRM is 1200 lines of code; NVTabular's high-level API is 10-20 lines of code.

Feature engineering operator support
- Categorify: The Categorify operation can be added to the workflow to transform categorical features into unique integer values.
- Clip: This operation clips continuous values so that they are within a min/max bound.
- FillMissing: This operation replaces missing values with a constant pre-defined value.
- LogOp: This operator calculates the log of continuous columns.
- Moments: The Moments operation calculates some of the statistics of features, including mean, variance, standard deviation, and count.
- MinMax: The MinMax operation calculates min and max statistics of features.
- Normalize: This operation can be added to the workflow to standardize the features.
- NormalizeMinMax: This operation can be added to the workflow to standardize the features.
- TargetEncoding: Target encoding is a common feature-engineering technique for categorical columns in tabular datasets.
- Median: This operation calculates the median of features.
- ColumnSimilarity: Calculates the similarity between two columns using tf-idf, cosine, or inner product as the distance metric.
- Dropna: This operation detects missing values and filters out rows with null values.
- Filter: Filters rows from the dataset. This works by taking a callable that takes a dataframe and returns a dataframe with unwanted rows filtered out.
- FillMedian: This operation replaces missing values with the median value for the column.
- HashBucket: This op maps categorical columns to a contiguous integer range by first hashing the column and then taking the result modulo the number of buckets, as indicated by num_buckets.
- JoinGroupby: This operator groups the data by the given categorical feature(s) and calculates the desired statistics of the requested continuous features.
- JoinExternal: Join each dataset partition to an external table.
- LambdaOp: LambdaOp allows you to apply row-level functions to an NVTabular workflow.
Source: https://nvidia.github.io/NVTabular/main/api/ops/index.html

Case Study: Criteo 1TB Ads Dataset
80x speedup over CPU for ETL and 114x over TensorFlow on a 40-core CPU node for training.
- NumPy CPU ETL + PyTorch CPU Training: 7.5 days total = 5.5 days ETL + 2 days training
- Spark CPU ETL + PyTorch GPU Training: 4 hrs total = 3 hrs ETL + 1 hr training
- Merlin: NVTabular + HugeCTR: 5.2 mins total = 1.9 mins ETL + 3.3 mins training
[The footnote with the benchmark hardware configuration and the link to the benchmark script on GitHub is not recoverable from the extraction.]

… to NVTabular
NVTabular is better than Spark and Pandas for tabular recommender ETL.
NVTabular is focused on tabular deep learning recommenders:
- Native tabular data format support: CSV, parquet, orc, avro
- Easy to implement the most common workflows
- No limit on dataset size (not bound by GPU or CPU memory)
- Optimized TF, PyT, and HugeCTR dataloaders
- Integrated and extensible with RAPIDS Dask cuDF
- Integrated with TensorFlow Serving and Triton for production inference
- Examples published for common datasets and models
- Building an easy path to production deployment for data transforms
- Consistency between data during training and inference

NVIDIA Tutorials - Best Practices For RecSys Feature Engineering

Background: cuDF
[Figure: software stack. Recoverable layers: Dask cuDF; Python (cuDF, Pandas); Cython; cuDF C++; CUDA libraries (Thrust, Cub, Jitify); CUDA.]
Source: RAPIDS.AI presentation

Scaling beyond system memory / GPU memory
pandas
  Description: Library for data manipulation and analysis on CPU and system memory.
  Limitation: Limited by the system memory.
RAPIDS
  Description: Library for data manipulation and analysis on GPU and GPU memory.
  Limitation: Limited by the GPU memory.
Dask (row label inferred from the description; the slide shows only a logo)
  Description: Wrapper for pandas / cudf with lazy execution to optimize and scale beyond system memory and GPU memory.
  Limitation: Partitions data in chunks; no limitation.
NVTabular
  Description: Wrapper around dask_cudf to provide best practices in feature engineering and simplify the API (from 100-1000 lines to 5-25 lines).
  Limitation: Partitions data in chunks; no limitation.

What is Dask?
- Dask is a task-based library for parallel scheduling and execution.
- Dask decomposes large DataFrames/Series (pandas / cuDF) into a collection of DataFrames/Series.
- Dask schedules and executes the optimized task graph on one or more processes/threads.
Example: A Dask DataFrame is a collection of DataFrames. Each element can be, for example, a pandas/cuDF DataFrame.
[Figure: a Dask DataFrame split into monthly pandas DataFrames: January 2016, February 2016, March 2016, April 2016, May 2016.]

Experimentation pipeline for RecSys (tabular data)
[Figure: Input -> Preprocessing -> Feature Engineering -> Model -> Output, repeated over iterations, with the focus on Feature Engineering.]
Most RecSys & tabular data competitions are won by feature engineering instead of model architectures.
This tutorial summarizes general techniques, based on our experiences, that are applicable to many datasets.

"Automatic" feature extraction in other domains
Deep Learning = The Entire Machine Is Trainable (Y. LeCun)
- Traditional Pattern Recognition: Fixed/Handcrafted Feature Extractor -> Trainable Classifier
- Mainstream Modern Pattern Recognition (unsupervised mid-level features): Feature Extractor -> Mid-Level Features -> Trainable Classifier
- Deep Learning (representations are hierarchical and trained): Low-Level Features -> Mid-Level Features -> High-Level Features -> Trainable Classifier
Deep learning performs well in automatic feature extraction in other domains such as images, text, and audio.
Although there are attempts at tabular deep learning, adding feature engineering supports models.

Performance improvement of 5.9% - 13.4%
XGBoost
- Without feature engineering (raw features): 0.61 AUC
- With feature engineering: 0.646 AUC (+5.9%)
Deep Learning
- Without feature engineering (raw features): 0.56 AUC
- With feature engineering: 0.635 AUC (+13.4%)
Note: Current preliminary results.

Dataset of the Tutorial
Dataset: eCommerce behavior data from multi category store
Source: REES46 Marketing Platform
URL: https:/… (truncated in the extraction)
Positive target: Purchase
Negative target: AddToCart (removing AddToCarts of purchased items from the same session)
Dataset split:
- Training: Oct-2019 - Feb-2020 (11.4 Mio samples)
- Validation: March-2020 (2.4 Mio samples)
- Test: April-2020 (2.7 Mio samples)
Baseline: 37% of events are purchases
Features: UserId, SessionId, ItemId, Price, Timestamp, Category, Brand

Overview Feature Types
Bold techniques in focus (the bold marking is not preserved in the extraction).
Categorical
  Examples: User ID / Item ID, Brand, Main Category
  Feature engineering: Target Encoding, Count Encoding, Categorify + Combining Categories
Unstructured list
  Examples: Keywords, Subcategories, Colors
  Feature engineering: Target Encoding, Count Encoding, Categorify
Numeric
  Examples: Price, Delivery time, Avg. reviews
  Feature engineering: Binning, Normalization, Gauss Rank
Timestamp
  Examples: Timestamp
  Feature engineering: Extract month, weekday, weekend, hour
Timeseries
  Examples: Events in order
  Feature engineering: # of events in past X, Time since last event
Image
  Examples: Product image
  Feature engineering: Extract latent representation with deep learning
Text
  Examples: Description
  Feature engineering: Extract latent representation with deep learning
Social graph
  Examples: Follower/Following graph
  Feature engineering: Link analysis
Geo location
  Examples: Addresses
  Feature engineering: Distances to point of interest
The tutorials will be available here: https://…/rapidsai/deeplearning/tree/main/RecSys2020Tutorial (host lost in the extraction)

HANDS-ON LABs

ADDITIONAL RESOURCES
Overview
- NVIDIA GTC Fall 2020 Keynote Part 6: NVIDIA Merlin for Recommendation Systems
- Product Page: https:/… (truncated in the extraction)
Blogs
- Accelerating ETL for Recommender Systems on NVIDIA GPUs with NVTabular
- Announcing the NVIDIA NVTabular Open Beta with Multi-GPU Support and New Data Loaders
GTC Sessions
- NVTabular: GPU Accelerated ETL for Recommender Systems
GitHub
- NVTabular: https:/… (truncated in the extraction)

Thank you!
Please take this survey to give feedback and receive more information: https://forms.gle/SvpoHhSdT5bwHwkc7
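
Target Encoding appears both in the operator list and in the feature-type overview, but the slides do not show how it is computed. The following is a minimal pure-Python sketch of one common variant, a smoothed per-category mean of the target. The function name `target_encode`, the `smoothing` parameter, and the toy data are illustrative assumptions; this is not NVTabular's implementation or API.

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=20.0):
    """Smoothed target mean per category (hypothetical helper, not NVTabular)."""
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    global_mean = sum(targets) / len(targets)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    # Shrink each category mean toward the global mean; categories with
    # few rows are pulled more strongly toward it.
    table = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return [table[cat] for cat in categories], table, global_mean


# Toy data (assumed): brand per event and whether the event was a purchase.
brands = ["a", "a", "a", "b", "c", "c"]
purchased = [1, 0, 1, 1, 0, 0]
encoded, table, prior = target_encode(brands, purchased, smoothing=2.0)
print(table)
```

In practice the statistics are usually computed out-of-fold on the training split only, so that a row's own label does not leak into its encoded feature; the sketch omits that step for brevity.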

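
The operator list describes HashBucket as hashing a column and reducing it modulo num_buckets, and Categorify as mapping categories to unique integers, with the workflow slide mentioning a frequency threshold. A rough standard-library sketch of both ideas follows. The helper names, the choice of MD5 as the hash, and the convention that infrequent values share id 0 are assumptions for illustration, not the behaviour of the NVTabular operators.

```python
import hashlib
from collections import Counter


def hash_bucket(value, num_buckets):
    """Map a value to an integer in [0, num_buckets) using a stable hash."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def categorify(values, freq_threshold=1):
    """Assign contiguous integer ids to categories.

    Values seen fewer than freq_threshold times share id 0
    (an assumed convention for this sketch).
    """
    counts = Counter(values)
    frequent = sorted(v for v, c in counts.items() if c >= freq_threshold)
    mapping = {v: i + 1 for i, v in enumerate(frequent)}
    return [mapping.get(v, 0) for v in values], mapping


# Toy column (assumed data).
brands = ["nike", "nike", "adidas", "puma", "nike", "adidas"]
ids, mapping = categorify(brands, freq_threshold=2)
buckets = [hash_bucket(b, num_buckets=4) for b in brands]
print(ids, mapping, buckets)
```

A cryptographic digest is used here only because Python's built-in `hash` for strings is randomized per process, which would make bucket assignments differ between training and inference runs.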
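This presentation introduces NVIDIA Merlin, a GPU-accelerated framework for recommender systems, and NVTabular, its feature engineering and preprocessing library. It stresses the importance of feature engineering in recommender systems and notes that most recommender-system competitions are won through feature engineering rather than model architecture. A case study compares several pipelines and shows the speed advantage of using NVTabular and HugeCTR: on the Criteo 1TB ads dataset, ETL plus training takes 5.2 minutes in total, versus 7.5 days for NumPy ETL and PyTorch training on CPU. The presentation also covers Dask, a task-based library for parallel scheduling and execution that NVTabular builds on through Dask cuDF, and feature engineering techniques such as target encoding, count encoding, and Categorify. It closes with additional resources, including the NVIDIA GTC keynote, blogs, and the NVTabular GitHub repository, and invites readers to take a feedback survey.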
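
The end-to-end totals quoted for the Criteo case study can be turned into ratios with a few lines of arithmetic. The sketch below uses only the three totals stated in the deck (7.5 days, 4 hours, 5.2 minutes); the dictionary keys are descriptive labels chosen here, not identifiers from any library.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR

# End-to-end totals (ETL + training) from the case study, in minutes.
totals_min = {
    "NumPy CPU ETL + PyTorch CPU training": 7.5 * MINUTES_PER_DAY,
    "Spark CPU ETL + PyTorch GPU training": 4 * MINUTES_PER_HOUR,
    "Merlin: NVTabular + HugeCTR": 5.2,
}

baseline = totals_min["Merlin: NVTabular + HugeCTR"]
# How many times longer each pipeline takes than the Merlin pipeline.
speedups = {name: minutes / baseline for name, minutes in totals_min.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x the Merlin end-to-end time")
```

On these totals the CPU-only pipeline takes roughly 2,077 times as long as the Merlin pipeline and the Spark plus GPU-training pipeline roughly 46 times as long. These ratios are derived only from the three totals; they are not the 80x and 114x figures on the slide, which are stated for ETL and training separately.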
