eBay Payment Risk Intelligent Data Labeling
Improving Label Data Efficiency, Accelerating AI Productionization
魏瑤 (Wendy Wei), Senior Technical Expert, eBay Payment Risk Control

Agenda
1. Data Labeling Importance and Challenges
2. Intelligent Data Labeling System
3. LLM-Based Smart Labeling Core Technologies
4. Case Study: Detect Off-eBay Transaction Model Labeling
5. Reflections

Data Labeling Importance and Challenges

Data Labeling Solution and Services Market
The market is forecast to grow from $12.7B in 2024 to $92.4B in 2034, a CAGR of 22%. China market value (2034): US$10.1 billion. Source: FACT.MR

Labels in AI/ML
Ground truth for model training; model accuracy and performance; feature understanding and engineering; model evaluation and optimization.

eBay Payment Risk ML Labels
System labels and manual labels.

Challenges in Data Labeling
Data volume and efficiency: ML models require massive volumes of labeled data for training, while labeling is time-consuming and resource-intensive.
Data quality and consistency: annotation accuracy must stay high and remain consistent across different annotators and datasets.
Data diversity and scalability: building a diverse, representative dataset that covers all possible scenarios and variations is complex and hard to scale.

LLM Impacts on Data Labeling
Rapid prototyping, increased efficiency, cost reduction, and consistency in annotation.

Intelligent Data Labeling System

Key Features
User interface and job workflow management; LLM automated labeling; system integration and scalability; label data quality assessment; label data management; role-based access control (RBAC) management.
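The slides list label data quality assessment and annotation consistency as goals but do not say how agreement between annotators is measured. One common choice is Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance. The sketch below assumes that metric; the function name and the "phone" / "email" / "none" labels are hypothetical illustrations, not part of the described system.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement, using each annotator's own label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators on the same six messages.
annotator_a = ["phone", "none", "email", "none", "phone", "none"]
annotator_b = ["phone", "none", "none", "none", "phone", "email"]
print(round(cohen_kappa(annotator_a, annotator_b), 3))
```

Here the annotators agree on 4 of 6 items, but because "none" dominates both label distributions, kappa comes out at 5/11 (about 0.455), noticeably lower than the raw 67% agreement.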
Key Features: User Interface and Job Workflow Management
The workflow diagram shows three roles (Creator, Resource Manager, Annotator) and the elements Manage Project, Launch Labeling Job, Approve/Assign Jobs, Labeling Data, LLM Recommendation, and Label Store.
Copyright 1995-2024 eBay Inc. All Rights Reserved
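The workflow roles above, combined with the RBAC key feature, suggest how access control might gate each action. The slides do not spell out which role may perform which action, so the mapping below is a hypothetical reading of the diagram (Creator manages projects and launches jobs, Resource Manager approves and assigns, Annotator labels). The `Action` enum, `ROLE_PERMISSIONS` table, and `is_allowed` helper are illustrative names, not the system's API.

```python
from enum import Enum, auto


class Action(Enum):
    MANAGE_PROJECT = auto()
    LAUNCH_LABELING_JOB = auto()
    APPROVE_ASSIGN_JOB = auto()
    LABEL_DATA = auto()


# Hypothetical role-to-permission mapping inferred from the workflow diagram;
# the real system's roles and permissions may differ.
ROLE_PERMISSIONS: dict[str, frozenset[Action]] = {
    "creator": frozenset({Action.MANAGE_PROJECT, Action.LAUNCH_LABELING_JOB}),
    "resource_manager": frozenset({Action.APPROVE_ASSIGN_JOB}),
    "annotator": frozenset({Action.LABEL_DATA}),
}


def is_allowed(roles: set[str], action: Action) -> bool:
    """Return True if any of the user's roles grants the action; unknown roles grant nothing."""
    return any(action in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)


print(is_allowed({"annotator"}, Action.APPROVE_ASSIGN_JOB))
```

Keeping permissions in a single table makes it straightforward to audit who can launch or approve jobs, and a user holding several roles simply receives the union of their permissions.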
Key Features: LLM Automated Labeling

Key Features: System Integration and Scalability
Integration points: Label Data Hub, Model Training, Model Simulation, Model Auto Refit, ML & Decision Platform, Decision Simulation.

Architecture
Components: eBay Cloud Platform, Document DB, Hadoop, User Interface, RBAC, Workflow Manager, Label Data Management, LLM Labeling, eBay AI Platform.

LLM-Based Smart Labeling Core Technologies

Reinforcement Learning with Human Feedback (RLHF)
Initial golden data set → pre-trained model → LLM-labeled data → human scoring → ranked results → reward model training → RL fine-tuning → fine-tuned model.

Prompt Engineering
Few-shot prompt: well-crafted samples improve accuracy and performance.
Prompt chaining: for complex labeling tasks that need multiple steps. The input passes through successive stages (Steps 1.1 to 1.N, Steps 2.1 to 2.M, ..., Steps X.1 to X.Y), each stage producing an intermediate output (Output 1, Output 2, ..., Output X); a final step then produces the final output.

Fine-Tune a Domain-Specific LLM
Why do we need a domain fine-tuned LLM? Model performance, data security, and cost.
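The few-shot prompting and prompt chaining described under Prompt Engineering can be sketched as plain string construction plus a loop. This is a simplified sketch under stated assumptions: `Model` stands for any callable that maps a prompt to a completion (a hypothetical wrapper around an LLM endpoint, no specific API implied), the "Message:/Label:" template is invented for illustration, and the diagram's multiple sub-steps per stage are collapsed into one sequential step per stage.

```python
from typing import Callable, Sequence

# Hypothetical: any function mapping a prompt string to a completion string.
Model = Callable[[str], str]


def few_shot_prompt(instruction: str, examples: Sequence[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: instruction, worked (text, label) examples, then the new input."""
    parts = [instruction.strip(), ""]
    for text, label in examples:
        parts += [f"Message: {text}", f"Label: {label}", ""]
    parts += [f"Message: {query}", "Label:"]
    return "\n".join(parts)


def run_chain(model: Model, steps: Sequence[Callable[[str], str]], user_input: str) -> str:
    """Prompt chaining: each step builds a prompt from the previous output and calls the model."""
    output = user_input
    for make_prompt in steps:
        output = model(make_prompt(output))
    return output


# Usage with a stand-in model that just wraps its prompt in angle brackets.
echo_model: Model = lambda prompt: f"<{prompt}>"
print(run_chain(echo_model, [lambda x: "extract:" + x, lambda x: "classify:" + x], "hi"))
```

Passing three (text, label) pairs to `few_shot_prompt` yields a 3-shot prompt of the kind used later in the case study, although the slides do not show the actual template.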
How did we fine-tune the LLM? With a PEFT (Parameter-Efficient Fine-Tuning) method, LoRA.

Final LLM Pipeline
Unlabeled data is labeled by the domain fine-tuned LLM. High-confidence results become labeled data directly; low-confidence results go to a human annotator for manual labeling. The labeled data then feeds back into model iteration.

Case Study: Detect Off-eBay Transaction Model Labeling

Detect Off-eBay Transactions via Messages
Background: buyers and sellers use messages to share contact information so they can conduct sales outside eBay. Off-eBay sales pose significant risks, including potential fraud, weakened user protection, damage to brand reputation, and lost platform revenue.

Existing online BERT model: the BERT model runs online inference to allow or block messages. Performance: P99 inference time 200 ms, precision 95.5%, recall 84.3%. The model is refit from manual labels of phone numbers and email addresses in text and images.

Golden labeled data set: 30K samples. A 3-shot prompt is employed in a multi-task way, detecting phone numbers and email addresses at the same time. (Prompt example for demo usage only.)

Do we need a fine-tuned model? The open-source LLaMA-13B without fine-tuning achieved only 14.2% precision and 38.2% recall. A 3-shot prompt improved performance, but it still fell well short of the fine-tuned model.

Do we need a larger LLM? Comparing LLMs of various scales, the fine-tuned Pythia-12b model showed high precision and an F1 score comparable to that of the Mixtral-8x7B model, which led us to select Pythia-12b as our auto-labeling model. According to public information, Pythia-12b was trained on a very clean, well-curated dataset (the Pile).

Should we train a multi-task or a single-task model? Since one message may contain both an email address and a phone number, training one multi-task model that returns both results at once is more efficient than training two separate models.

Results: unlabeled data volume has been reduced by 90%. Labeling time has been reduced from several weeks to 1-2 days. The BERT model achieved 97.7% precision and 95.5% recall.

5 Reflections
Applying RLHF and fine-tuning to intelligent data labeling has significantly improved efficiency and reduced reliance on human experts. Continuous improvements in model robustness and the integration of more sophisticated feedback mechanisms are promising directions for intelligent labeling.

Intelligent Data Labeling in the Financial Services Industry
Fraud detection (credit card fraud, insurance claims); customer service; personalized financial product recommendation; compliance monitoring (anti-money laundering).

Data labeling is the cornerstone of AI. Intelligent data labeling is the accelerator of AI.
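The case study reports precision and recall for the online BERT model before and after the LLM-labeling work, and the model comparison relies on F1. A small calculation condenses each precision/recall pair into a single F1 score. This assumes both pairs were measured on comparable evaluation data, which the slides do not state; `f1_score` here is a hand-written helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall reported for the existing online BERT model and in the case-study results.
before = f1_score(0.955, 0.843)
after = f1_score(0.977, 0.955)
print(f"F1 before: {before:.3f}, after: {after:.3f}")  # F1 before: 0.896, after: 0.966
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two metrics, so most of the gain here comes from recall rising from 84.3% to 95.5%.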