Learning to Prune and Low-Rank Adaptation for Compact Language Model Deployment

Authors: Asmer Hamid Ali (aali115@asu.edu), Fan Zhang, Li Yang, Deliang Fan
Efficient, Secure and Intelligent Computing (ESIC) Laboratory (https://faculty.engineering.asu.edu/dfan/)
Arizona State University
Copyright 2025 Arizona Board of Regents

Outline
1. Motivation and Problem Statement: Challenges in deploying large pre-trained models; limitations of existing methods.
2. Key Contributions: Overview of the proposed approach and its significance.
3. Parameter-Efficient Fine-Tuning and Model Pruning: Background on PEFT techniques; importance of structured pruning for efficiency.
4. Methodology Overview: Trainable pruning masks; integration with low-rank adaptation.
5. Efficient Pruning and Low-Rank Adaptation: Detailed explanation with equations and benefits.
6. Experimental Setup: Models, datasets, and evaluation metrics.
7. Results: Performance analysis and comparison with baselines.
8. Conclusion: Summary of contributions and future directions.

Motivation and Problem Statement
- Growing computational demands of large pre-trained models (LPMs).
- PEFT techniques address training overhead but fail to optimize inference efficiency.
- Need for a compact, efficient, deployment-ready solution.

Figure 1: Chart showing the growth of model sizes over time, with annotations on memory usage and hardware limits (Source: LLM: The Rise of Data).
Figure 2: Table comparing LLaMA-7B models under various PEFT methods, showing parameter reductions and accuracy trade-offs (Source: Charith Chandra Sai Balne et al., Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications, arXiv:2404.13506, 2024).
Key Contributions
1. Trainable Pruning Methodology: Optimizes the structure of LPMs during fine-tuning; includes learnable binary masks for channel-wise pruning.
2. Low-Rank Adaptation Integration: Incorporates low-rank adaptation to reduce computational overhead while maintaining accuracy.
3. Efficiency Gains: Demonstrates up to 18% inference speed-up on real-world hardware.

Figure 3: Proposed approach.

Parameter-Efficient Fine-Tuning and Model Pruning
Figure 4: Structure of LoRA (Source: E. J. Hu et al., LoRA: Low-Rank Adaptation of Large Language Models, 2021).
Figure 5: Structure of DoRA (Source: S.-Y. Liu et al., DoRA: Weight-Decomposed Low-Rank Adaptation, 2024).
Figure 6: Pruning techniques.
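The deck presents LoRA and DoRA only through the figures above. As a concrete reference point, the sketch below shows minimal PyTorch versions of both adapter styles; it is not the authors' implementation, and the ranks, scaling, and initialization choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """LoRA: keep W0 frozen and learn a low-rank update, W' = W0 + B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                          # W0 (and bias) stay frozen
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)   # trainable down-projection
        self.B = nn.Parameter(torch.zeros(out_f, r))          # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * F.linear(F.linear(x, self.A), self.B)

class DoRALinear(nn.Module):
    """DoRA: decompose the merged weight into a magnitude vector m and a
    normalized direction, W' = m * (W0 + B A) / ||W0 + B A||."""
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.W0 = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, r))
        # one magnitude entry per output channel, initialized from W0's per-channel norms
        self.m = nn.Parameter(self.W0.norm(dim=1, keepdim=True))

    def forward(self, x):
        merged = self.W0 + self.B @ self.A                      # W0 + BA
        direction = merged / merged.norm(dim=1, keepdim=True)   # unit norm per channel
        return F.linear(x, self.m * direction)
```

The difference that the proposed method builds on is DoRA's explicit magnitude vector: it provides a natural per-channel handle for a pruning mask to act on.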
Methodology Overview
1. Trainable Pruning Masks: Introduce binary masks to prune unimportant weights in both frozen and trainable components (see the sketch after this list).
2. Integration with Low-Rank Adaptation: Decompose weights into magnitude and direction using low-rank adaptation (based on DoRA); optimize the pruning process by focusing only on the magnitude vectors, minimizing training overhead.
3. Hardware-Compatible Compact Model: The final pruned model retains a compact structure and achieves significant inference speed-up on commercial GPUs and CPUs.

Figure 7: Overview of the proposed approach.
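The masks above must stay differentiable during fine-tuning while being strictly binary at deployment time. The sketch below is one minimal way to realize such a channel-wise mask over a DoRA-style magnitude vector, using a straight-through estimator; the relaxation, threshold, and naming are assumptions (the references also list Gumbel-Softmax, which could serve the same purpose).

```python
import torch
import torch.nn as nn

class MaskedMagnitude(nn.Module):
    """Learnable channel-wise binary mask m_b applied to a magnitude vector m.

    A real-valued score per channel is trained jointly with fine-tuning; the
    forward pass uses a hard 0/1 mask, while gradients flow through a sigmoid
    surrogate (straight-through estimator).
    """
    def __init__(self, magnitude: torch.Tensor):
        super().__init__()
        self.m = nn.Parameter(magnitude.clone())              # magnitude, one entry per channel
        self.score = nn.Parameter(torch.zeros_like(self.m))   # pruning logits: > 0 keeps a channel

    def forward(self) -> torch.Tensor:
        hard = (self.score > 0).float()                       # binary mask m_b used at inference
        soft = torch.sigmoid(self.score)                      # differentiable surrogate
        m_b = hard + soft - soft.detach()                     # straight-through trick
        return m_b * self.m                                   # pruned magnitude: m_b ⊙ m

    def sparsity(self) -> float:
        """Fraction of channels currently pruned (useful for tracking a sparsity target)."""
        return float((self.score <= 0).float().mean())
```

Because the mask is channel-wise (structured), channels whose mask is zero can be physically removed after training, which is what makes the compact model run faster on standard GPU and CPU kernels.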
Efficient Pruning and Low-Rank Adaptation
1. Integration of Pruning with Low-Rank Adaptation: Use a trainable binary mask (m_b) to optimize the magnitude vector in the DoRA framework. Low-rank adaptation ensures computational efficiency while maintaining accuracy.
2. Low-Rank Adaptation: W' = W_0 + ΔW = W_0 + B A, where W_0 is frozen and B A is the trainable low-rank update; DoRA further decomposes the merged weight into magnitude and direction, W' = m · (W_0 + B A) / ||W_0 + B A||_c.
   Pruned Weight Update: with the binary mask applied to the magnitude vector, W' = (m_b ⊙ m) · (W_0 + B A) / ||W_0 + B A||_c, so channels whose mask is zero drop out of the deployed weight entirely. A combined sketch of this pruned forward pass follows below.
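Putting the previous two sketches together, the pruned weight update above can be expressed as a single layer. Everything beyond the equation itself (shapes, initialization, the straight-through relaxation) remains an illustrative assumption rather than the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrunedDoRALinear(nn.Module):
    """Pruned weight update: W' = (m_b ⊙ m) · (W0 + B A) / ||W0 + B A||_c."""
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        out_f, in_f = base.weight.shape
        self.W0 = nn.Parameter(base.weight.detach().clone(), requires_grad=False)  # frozen
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)        # low-rank update (trainable)
        self.B = nn.Parameter(torch.zeros(out_f, r))
        self.m = nn.Parameter(self.W0.norm(dim=1, keepdim=True))  # magnitude per output channel
        self.score = nn.Parameter(torch.zeros(out_f, 1))          # logits for the binary mask m_b

    def forward(self, x):
        merged = self.W0 + self.B @ self.A                         # W0 + B A
        direction = merged / merged.norm(dim=1, keepdim=True)      # (W0 + B A) / ||W0 + B A||_c
        soft = torch.sigmoid(self.score)
        m_b = (self.score > 0).float() + soft - soft.detach()      # straight-through binary mask
        return F.linear(x, (m_b * self.m) * direction)             # apply W' to the input
```

Only the low-rank factors, the magnitude vector, and the mask scores receive gradients, which keeps the set of trainable parameters small, in line with the deck's emphasis on minimizing training overhead.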
Experimental Setup
1. Models: DistilBERT, RoBERTa-base (RoBbase), and LLaMA-7B.
2. Datasets:
   a. GLUE benchmark for DistilBERT and RoBERTa.
   b. Commonsense reasoning datasets (e.g., BoolQ, PIQA, ARC) for LLaMA-7B.
3. Hardware and Training Details:
   a. GPU: NVIDIA A5000.
   b. Batch size: 8, with mixed precision for efficiency.
   c. Optimizer: Adam, with the learning rate adjusted across stages (e.g., 5e-5 → 1e-5); a configuration sketch follows below.
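These training details map onto a standard PyTorch mixed-precision loop. The snippet below is a hedged sketch of that configuration; the model/dataloader interface, the epoch at which the learning rate drops, and the loss access pattern are assumptions, not details from the deck.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

def finetune(model, train_loader, epochs=3, lr_drop_epoch=2, device="cuda"):
    """Illustrative setup: batch size 8 (set in the DataLoader), Adam optimizer,
    mixed precision, staged learning rate 5e-5 -> 1e-5."""
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=5e-5)
    scaler = GradScaler()
    model.to(device).train()
    for epoch in range(epochs):
        if epoch == lr_drop_epoch:                    # later fine-tuning stage uses a lower LR
            for group in optimizer.param_groups:
                group["lr"] = 1e-5
        for input_ids, labels in train_loader:        # assumes (inputs, labels) batches
            input_ids, labels = input_ids.to(device), labels.to(device)
            optimizer.zero_grad(set_to_none=True)
            with autocast():                          # mixed-precision forward/backward
                loss = model(input_ids, labels=labels).loss   # HuggingFace-style output assumed
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
```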
Results for LLaMA-7B
1. Accuracy Gains: Pruned-LLaMA-7B (ours) achieves a competitive average accuracy (62.77%), outperforming most baselines such as LLM-Pruner and LoRAPrune.
2. Model Size Efficiency: Reduces the model to 5.09B parameters, compared with the 6.74B-parameter baseline, while maintaining comparable performance.
3. Task-Specific Highlights:
   a. Excels on PIQA (79.54%), outperforming all baselines.
   b. Strong performance on BoolQ (72.12%) and WinoGrande (67.93%) compared with LLM-Pruner and LoRAPrune.

Results for RoBbase
1. Accuracy Gains: Pruned-RoBbase (ours) achieves the highest average accuracy (87.46%), outperforming the baseline and LoRA methods.
2. Model Size Efficiency: Pruned-RoBbase reduces the model size to 429.3 MB, significantly smaller than the baseline RoBbase (476.84 MB).
3. Task-Specific Highlights: Achieves the best accuracy on SST-2 (95.3%) and RTE (83.8%) while remaining competitive on the other tasks.

Results for DistilBERT
1. Accuracy Gains: Pruned-DistilBERT (ours) achieves the highest average accuracy (82%), outperforming the other methods, and performs consistently better on tasks such as RTE (65.2%), SST-2 (91.4%), and MNLI (83.4%).
2. Model Size Efficiency: Pruned-DistilBERT reaches a model size of 222.85 MB, smaller than most other methods, including LoRA (258.32 MB) and DoRA (268.5 MB).
3. Improved Performance vs. Baseline: Outperforms the DistilBERT baseline in average accuracy (82% vs. 79.6%).

Analysis of Sparsity and Inference Gains
- Layer-wise sparsity of up to 24.5% translates into up to 18% inference speed-up on real-world hardware.
Conclusion
Summary of Contributions:
- Introduced a novel trainable pruning methodology for structured pruning.
- Integrated pruning with low-rank adaptation to reduce computational costs while maintaining accuracy.
Key Results:
- Up to 24.5% sparsity across layers.
- Up to 18% inference speed-up on real-world hardware.
Broader Impact:
- Enables practical deployment of large pre-trained models in resource-constrained environments.
- Balances performance, efficiency, and deployment feasibility.
References
- E. J. Hu et al. LoRA: Low-Rank Adaptation of Large Language Models, 2021.
- S.-Y. Liu et al. DoRA: Weight-Decomposed Low-Rank Adaptation, 2024.
- V. Sanh et al. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter, 2020.
- Y. Liu et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019.
- H. Touvron et al. LLaMA: Open and Efficient Foundation Language Models, 2023.
- X. Ma et al. LLM-Pruner: On the Structural Pruning of Large Language Models, 2023.
- M. Zhang et al. LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Finetuning, 2023.
- M. Xia et al. Sheared LLaMA: Accelerating Language Model Pre-Training via Structured Pruning, 2024.
- E. Jang et al. Categorical Reparameterization with Gumbel-Softmax, 2017.
- T. Chen et al. LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery, 2023.
- H. Zhou et al. LoRA-Drop: Efficient LoRA Parameter Pruning Based on Output Evaluation, 2024.
- M. Valipour et al. DyLoRA: Parameter-Efficient Tuning of Pre-Trained Models Using Dynamic Search-Free Low-Rank Adaptation, 2023.

Thank You! Any Questions?

Acknowledgement: This work is supported in part by the National Science Foundation under Grant No. 2314591, No. 2505326, No. 2452573, No. 2452657, No. 2503906, and No. 2505209.

Contact information:
Asmer Hamid Ali: aali115@asu.edu
Deliang Fan: dfan@asu.edu