DataFunSummit #2023
mPLUG: Multimodal Dialogue Large Models, Technology and Applications
Xu Haiyang, Algorithm Expert, Alibaba DAMO Academy

Contents
01 Development history of multimodal large model technology
02 The multimodal dialogue model mPLUG: technology and applications
03 ModelScope hands-on
04 mPLUG project homepage

01 Development History of Multimodal Large Model Technology

Multimodal pre-training background: downstream tasks

Multimodal pre-training: development history
2018-2019: two-stage methods built on detection features. 2020-2021: end-to-end methods. 2022-2023: unified ("all-in-one") models plus scaling up. In the last few months: multimodal dialogue large models.
VQA Leaderboard: the most important multimodal benchmark; the top score has now reached 86.06. mPLUG ranked first in the CVPR 2021 VQA Challenge, and its score of 81.26 was the first to surpass the human result.
Reference: Vision-Language Pre-training: Basics, Recent Advances, and Future Trends.

Multimodal dialogue large models: GPT-4
Fine-grained understanding and reasoning over visual content; understanding and reasoning over text-rich images and tables.

Multimodal dialogue large models: system collaboration
Visual ChatGPT, MM-REACT, HuggingGPT: various vision models first convert the visual information into text, and ChatGPT then organizes that information and produces the reply, as in the sketch below.
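The following is a minimal sketch of this system-collaboration pattern, not the actual Visual ChatGPT, MM-REACT, or HuggingGPT code: the captioning model, the chat model name, the `answer_about_image` helper, and the prompt wording are all illustrative assumptions.

```python
# Sketch of the "system collaboration" pattern: a vision model turns the image
# into text, and a chat LLM then organizes that text into the final reply.
# Model ids and the prompt format are assumptions for illustration only.
from transformers import pipeline
from openai import OpenAI

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_about_image(image_path: str, question: str) -> str:
    # Step 1: a vision model converts the visual content into text.
    caption = captioner(image_path)[0]["generated_text"]
    # Step 2: the chat LLM reasons over the textual description and replies.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You answer questions about an image using only its textual description."},
            {"role": "user",
             "content": f"Image description: {caption}\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```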
Multimodal dialogue large models: end-to-end
MiniGPT-4, LLaVA, Kosmos-1: GPT-4-style systems in which a single model has both multimodal and text-only capabilities.

02 The Multimodal Dialogue Model mPLUG: Technology and Applications

Modularized multimodal model mPLUG and the unified model mPLUG-2
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. EMNLP 2022.
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video. ICML 2023.
The mPLUG series of multimodal pre-training work borrows the modularization idea from the human brain: separate modules are designed for different input modalities, output modalities, modality-specific properties, and functions (understanding, generation), and are pre-trained in a hierarchical, modular fashion. This keeps the models lightweight, and modules can be plugged in or removed so that the same components serve zero-/few-shot use, continued pre-training, downstream fine-tuning, multimodal representation, and other layered application scenarios, as illustrated by the sketch below.
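As an illustration of this plug-and-play design, the sketch below composes a vision encoder, a small visual abstractor, and a language model as independent modules. The class names, dimensions, and wiring are hypothetical; this is not the mPLUG source code.

```python
# Illustrative sketch of a modular vision-language model: each capability lives
# in its own module, so parts can be reused, swapped, or dropped per task.
import torch
import torch.nn as nn

class VisualAbstractor(nn.Module):
    """Compresses a long sequence of visual features into a few query tokens."""
    def __init__(self, dim: int = 768, num_queries: int = 32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        q = self.queries.unsqueeze(0).expand(visual_feats.size(0), -1, -1)
        out, _ = self.attn(q, visual_feats, visual_feats)
        return out  # (batch, num_queries, dim)

class ModularVLM(nn.Module):
    """Plug-and-play composition: vision encoder + abstractor + language model."""
    def __init__(self, vision_encoder: nn.Module, abstractor: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.abstractor = abstractor
        self.llm = llm

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor):
        visual_feats = self.vision_encoder(pixel_values)   # (B, N, D)
        visual_tokens = self.abstractor(visual_feats)      # (B, Q, D)
        # Visual tokens are prepended to the text embeddings; a text-only
        # forward pass simply skips the two vision modules.
        inputs = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm(inputs)
```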
Modularized multimodal dialogue model mPLUG-Owl: Arena
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Arena: a human-annotated multimodal-LLM leaderboard organized by OpenGVLab at the Shanghai AI Laboratory; mPLUG-Owl ranked first.

mPLUG-Owl: application scenarios
mPLUG-Owl: training method
Stage 1: align vision and language by training the visual foundation module and the visual abstractor module.
Stage 2: let unimodal and multimodal data reinforce each other, and instruction-tune the model with LoRA: the visual foundation module, the visual abstractor, and the original LLM parameters are kept frozen, and only a small number of adapter parameters are introduced inside the LLM for instruction tuning. A sketch of this stage follows.
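Below is a minimal sketch of what the second stage looks like in code, assuming a Hugging Face causal LLM and the peft library. The model id, the target modules, and the LoRA hyper-parameters are illustrative assumptions, not mPLUG-Owl's exact configuration.

```python
# Sketch of stage 2: freeze the original LLM weights (the visual foundation
# module and visual abstractor from stage 1 would be frozen the same way) and
# train only a small set of LoRA adapter parameters inside the LLM.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

llm = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative choice

# Freeze every original parameter of the LLM.
for p in llm.parameters():
    p.requires_grad = False

lora_cfg = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections inside the LLM
    task_type="CAUSAL_LM",
)
llm = get_peft_model(llm, lora_cfg)   # only the LoRA adapter weights are trainable
llm.print_trainable_parameters()      # prints the tiny trainable fraction
```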
mPLUG-Owl: performance evaluation
mPLUG-Owl builds a multimodal instruction evaluation set, OwlEval, and reports each model's share of ratings on it (A is best, D is worst) under an A/B/C/D scale:
A: understands the instruction, and the answer is satisfactory
B: understands the instruction, but the answer contains some errors
C: understands the instruction, but the answer is wrong or leaves the user unsatisfied
D: does not understand the instruction, or the answer is invalid
A small sketch of how these shares are computed follows.
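The sketch below shows how raw A/B/C/D annotations could be turned into the per-model rating shares reported on OwlEval; the ratings listed are invented placeholders, not actual OwlEval results.

```python
# Sketch: compute each model's share of A/B/C/D ratings from raw annotations.
from collections import Counter

ratings = {
    "model_x": ["A", "A", "B", "C", "A", "D"],  # placeholder annotations
    "model_y": ["B", "C", "C", "D", "B", "A"],
}

for model, rs in ratings.items():
    counts = Counter(rs)
    shares = {grade: counts.get(grade, 0) / len(rs) for grade in "ABCD"}
    print(model, {g: f"{s:.0%}" for g, s in shares.items()})
```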
mPLUG-Owl: knowledge question answering

mPLUG-Owl: multi-turn dialogue

mPLUG-Owl: joke understanding

mPLUG-Owl: emergent abilities
mPLUG-Owl never saw multi-image or OCR data during training, yet it shows multi-image understanding ability and a simple document-AI (OCR) ability.

mPLUG-Owl: video understanding
mPLUG-Owl: multilingual versions

Chinese video dialogue model mPLUG-Video

mPLUG-Owl: ablation studies
Effectiveness of the training strategy and of the multimodal instruction-tuning data.

Chinese video pre-training dataset Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.
03 ModelScope Hands-on

ModelScope: mPLUG Chinese and English models
https:/
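For the hands-on part, the sketch below shows a minimal way to call an mPLUG model through ModelScope's pipeline API. The task name, model id, and input keys follow ModelScope conventions but are assumptions here; the model card on modelscope.cn has the exact identifiers and input format.

```python
# Minimal sketch of running an mPLUG model via the ModelScope pipeline API.
# The model id and input keys are assumptions; check the model card for details.
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

vqa = pipeline(
    Tasks.visual_question_answering,
    model="damo/mplug_visual-question-answering_coco_large_en",  # assumed model id
)
result = vqa({"image": "demo.jpg", "question": "What is the woman doing?"})
print(result)
```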