The Future of RPG: an AI Code Assistant!
Steve Will, Distinguished Engineer, IBM i CTO and Chief Architect
Julio Sanchez Diaz, AI Engineer, IBM Client Engineering, Latin America
Yvonne (Wepner) Enselman, CTAL-TA, CTFL, OneMain Financial
October 2024
Copyright IBM Corporation 2024

IBM i Strategy
Power Solutions
- Enable clients to exploit latest Power technology
- Enable transformation of customer solutions with new value: Mobile, Internet of Things, Cognitive, Machine Learning and AI
- Enable solutions to modernize around services, hybrid cloud and DevOps
Open Platform for Choice
- Grow IBM i solutions options including Modern RPG & Open source technologies
- Flexible options: On-prem and/or Hybrid Cloud, inside or outside Data Center
- Entice new talent with popular open languages and tools
The Integrated Promise of IBM i
- Deliver a simple, high value platform for business applications
- Provide exceptional security and resiliency for critical business data
- Leverage IBM systems, storage and software technologies
https:/

AI options
- AI that works with their solutions & data
- AI that's easy to deploy & manage
- AI that helps developers create solutions.

What about watsonx Code Assistant for Z?

IBM i Clients: Modernization, Skills and RPG
Which development languages do you use today
for new development on IBM i?
PDF https:/
What are your top 5 concerns as you plan your IT environment?
PDF https:/
What is your primary version of RPG for new development?

Terminology: Generative AI

RPG Code Assistant: Base Features
An IBM i code assistant tool should help programmers work with existing RPG:
- Explain existing code
- Generate modern free-format ILE RPG based on a description
- Write test programs for RPG
- and, if possible, transform older RPG into modern, ILE-based free-format

Terminology: Large Language Model (simpler)
https:/
https:/
https:/

IBM Granite Differentiators
- Tokenized to 1+ Trillion Tokens for Training
- 7 Terabytes of English raw data
- 6 Petabytes of multi-lingual raw data
- 3.5 Terabytes of deduplicated data
- 2.7 Terabytes of usable data
- 2.4 Terabytes of usable data
- 2.69 Terabytes of usable data
- Smaller, more efficient models without sacrificing accuracy
- Trained on highly governed, cleansed and de-duplicated data
- Provide full auditable data lineage to our clients
- Legal Indemnification for 3rd Party Copyright Claims
IBM Blue Pile
IBM Foundation Models

How we got here: the StarCoder2 Model for RPG
Our first effort began as a soft-research project:
Explore challenges
- Data collection and processing: Availability, Quantity and Quality, "Data Augmentation"
- Hardware requirements
- Model sizes
- Specialized models vs. Generalists
Supervised fine-tuning
Is it possible to teach a model a new programming language that it hasn't seen before through SFT?

How we got
here: RPG Model Evaluation
Every ML model needs an evaluation metric.
RPGEval
- Evaluation framework to test functional correctness.
- Implemented with RPGUnit
- Implements pass@k metric, k=1
  30 problems
  7 unit test cases per problem (avg.)
- Other text similarity metrics were studied, such as BLEU, ROUGE, TER, CHRF and TF-IDF.

Results?
StarCoderRPGLE
- With a single NVIDIA L40S, 48GB VRAM
- 15B parameters
- LoRA fine-tuning methodology
- With around 2.2k training samples
- Improvements over the base model
  Generation: 2.57x
  Explanation: 1.68x
  Translation: 4.72x
RPG domain adaptation: Average performance across all metrics

StarCoder2RPGLE
New RPG models: Average performance across all metrics
- Foundation models released: StarCoder 2, Llama 3
- Now with RPG knowledge!
- Fine-tuned StarCoder 2 with the same methodology
- Power of fine-tuning

Next steps
What's next?
State-of-the-art RPG model using Granite, with the help of the RPG community
Try it out! Model available on HuggingFace

The Prompt (the request) drives the output

How do you train a model to do that?
It's not magic! But it feels like it!
First: Pre-training
- Books, manuals, etc.; taking advantage of LLM understanding human language
Fine-Tuning, with pairs
- Block of Code; Explanation of Block of Code
  Helps the model look at code and then generate an English description. Also helps the model learn how to take English (together with other things like SQL) and create code.
- Block of Code; Block of Code which Tests the first block
  Trains the model to understand how to test various kinds of code.
- Block of old code; Block of new code which does the same thing
  Trains the model to transform old to new.
Training based on unstructured (non-paired) input
- Large amounts of non-paired code

Tasks and Flow: LLM Training for Use Case (e.g. Explain)
1 Gather Pre-training Materials
2 Pre-train the model
1 Create "Ground Truth" and validate with SMEs
2 Define Scoring
1 Define Batch of Input to LLM (Use Case 1: RPG to be Explained)
3 Evaluate Model: Run Input through model and score
1 Create Fine-Tuning Input: for Explain, (Code, Explanation) pairs
4 Fine-Tune: Teach the model more than it currently knows
5 Synthetic Input Generation: for Explain, (RPG, Explanation) pairs
5 Use contributions for more fine-tuning
6 Model Passes Evaluation: On all fine-tuning AND on "Ground Truth"
4 Go on to another Use Case. It's likely the model will be affected by future Use Case training: re-evaluate & re-tune

IBM i Approach: Involve the Community
IBM i has launched a project to train a "ready for prime time" large language model
- Using the RPG code IBM has developed
- Getting advice and code from experts & Champions
- Soliciting code for training from the entire IBM i community
- Creating an offering which will help clients of all sizes
The project for a community-trained Large Language Model for RPG is underway!

Find out more! Get involved!
This takes you to a short survey that asks how you would like to be involved. Feel free to forward it to others.
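
As a rough illustration of the pass@k metric named on the RPGEval slide, the sketch below scores a benchmark from per-problem test outcomes. It assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c is the number that pass every unit test. The slides do not say which estimator RPGEval uses, and the function names and sample counts here are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (assumed estimator, not confirmed by the slides).

    n: samples generated for the problem
    c: samples that passed all of the problem's unit tests
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any k draws include a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical run: 30 problems, one sample each, 12 of them passing.
    results = [(1, 1)] * 12 + [(1, 0)] * 18
    print(f"pass@1 = {benchmark_pass_at_k(results, k=1):.2f}")
```

With k=1 and a single sample per problem, the score reduces to the fraction of problems whose generated program passes all of its unit tests, which is why a small benchmark of 30 problems is enough to compare a base model against a fine-tuned one.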

Simple
Clear
Language based
- Source Code
- *Gherkin
- *UML
Designed with adherence to QA standards via ISTQB protocols
Rebecca Whittemore
Scott Klement

Nature of the testing at play
Component Testing: Also known as unit testing. Focuses on testing components in isolation. Often requires specific support such as test harnesses or unit test frameworks. Component testing is normally performed by developers in their development environment.
- Test Object
- Test Objectives
- Test Basis
- Defects and Failures
- Approach and Responsibilities
Results captured in the job log

Potential Training Materials for Repository
- RPGLE Source Code
- UML Code for: Entity Relationships, Sequence
- Acceptance Criteria (Gherkin)

Tentative Timeline for "RPG Code Assistant"
Summer 2024: Gather training material
- IBM manuals, redbooks, presentations, test cases
- RPG Champion material, such as books
- Community-provided code
- Learn enough about model training to select the right development path
Fall 2024: Initial training
- Includes baseline measurements of selected LLM
- "Pre-train" with large unstructured (un-paired) material
- Enter a "measure, fine-tune" cycle
- Talking to advisors in the community ("Sponsor Users"): Technical & Offering
Early 2025: "Alpha" the model with Technical Contributors/Champions
Later 2025 is the goal: "Beta" the model (and maybe the offering); then GA when the offering is ready
Hope to make the trained LLM available outside the offering

There is so much potential after the first functions!
Future, more advanced versions of the LLM could be trained to address
- Converting S/36 RPG, RPG II, etc. into Free-Format ILE RPG
- Suggesting sections of code where open source languages would help
- Transforming old data definitions and access into modern Db2 & SQL
- Helping when other languages, such as COBOL, are the starting point
- and more!

What have we decided? What's left to decide?
Decided
- Model: We're starting with Granite 20b
- Collection of training material: on-going
- License requirements: based on open license, but specific to use for training
- Public or Private Contributions: both are allowed
Not Yet Decided
- Exact Timing
- Delivery/Offering: to be determined with advice from the community

Do you want to provide material to help us train the model?
If you want to submit code, let us know you want to help
- e-mail AIforIBM
Agree to the license
- We'll send it to you; it says we can use your submitted material to train the model
Submit code
- Information on how is found at: https://ibm.github.io/rpg-genai-data/#/
- You decide if others (outside IBM) can see & use your code or not
At some point in the future, we will likely ask for volunteers to actually use the submissions in training & evaluating

RPG Code Assist Project: Details on Contributing using GitHub
https://ibm.github.io/rpg-genai-data/#/

IBM i & AI: Three Clear Use Cases
Db2 Data Analytics
- Trend analysis
- Anomaly detection
Operations
- Active monitoring/alerting
- Self-healing
Developer Experience
- Help developer write code
- Understand code

How will you use IBM i in 2025? Help the ecosystem around IBM i
Take the survey https:/
2025 IBM i Marketplace Survey is Now Open! https:/