OPPO Research Institute: 2023 report on deploying multimodal pre-trained models in OPPO device-cloud scenarios (44 pages)
DataFunCon # 2023
Deploying multimodal pre-trained models in OPPO device-cloud scenarios
Chen Chen (陳宸), Senior Algorithm Engineer, OPPO Research Institute

Contents
On-device image-text retrieval research
Application optimization of image-text generation & understanding models
On-device lightweighting of text-to-image generation models

80387241 2023-11-29

On-device image-text retrieval research

query1: going to Disneyland with my girlfriend
query2: wedding photos on a mountaintop

Why one-sentence search matters:
User experience: it genuinely solves the pain point of letting users search for whatever they have in mind ("smart image search, find what you are thinking of"). It relies on large-model pre-training and no longer depends on iterating and extending a label set. https:/ Image-text understanding capability of CLIP (OpenAI).
Search speed: instead of scrolling through an album for ten minutes to half an hour, users can now find the picture they want with a single sentence, whether from the pull-down smart search on the home screen, from the album app, or through the voice assistant. This improves information-finding efficiency at the system level.
Achieving lightweight on-device deployment of large models is therefore highly significant.

Difficulties in bringing lightweight large models to the device:
1. Compressing a multimodal large model while preserving accuracy. This cannot be solved by simply applying pruning or quantization to shrink the model a few times over. With limited on-device compute, the model that can be deployed is often only one part in several tens of the size of the large model.
2. In step with the model upgrade, a robust and performant vector retrieval engine must be implemented on the device to guarantee engineering performance once the large model is moved on-device.
Algorithm optimization

Figure labels: CLIP dual-tower model; ALBEF single-stream model; single-stream and dual-stream multi-teacher distillation architecture; loss function.

The retrieval engine's computation is split into two parts:
1. Offline part: scan every picture in the album and convert each one to a vector with the image encoder; after fp16 quantization, store the result as an N x 200 floating-point matrix.
2. Online part: for each input query, convert the query to a vector with the text encoder; apply fp16 quantization to reduce compute memory; finally compute the similarity score between the query vector and every image by matrix multiplication, and sort to output an ordered list.

Lei, Youbo, et al. MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval. arXiv preprint arXiv:2310.19654 (2023).
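The offline and online parts described above can be sketched as follows. This is a minimal illustration, not the production engine: `fake_encoder` is a hypothetical stand-in for the image and text encoders (it returns a seeded random unit vector), the index is a plain dictionary rather than a packed matrix, and fp16 storage is simulated by rounding each value through the half-precision format of Python's `struct` module.

```python
import random
import struct

DIM = 200  # embedding width taken from the slide (N x 200 matrix)

def to_fp16(x):
    """Round a float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def normalize(vec):
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec]

def fake_encoder(seed):
    """Hypothetical placeholder for an image/text encoder: a seeded unit vector."""
    rng = random.Random(seed)
    return normalize([rng.gauss(0, 1) for _ in range(DIM)])

def build_index(image_ids):
    """Offline part: encode every image once and keep fp16-rounded vectors."""
    return {img: [to_fp16(v) for v in fake_encoder(img)] for img in image_ids}

def search(index, query_vec, top_k=3):
    """Online part: fp16 query, dot product against every image, then sort."""
    q = [to_fp16(v) for v in query_vec]
    scores = [(sum(a * b for a, b in zip(q, vec)), img) for img, vec in index.items()]
    scores.sort(reverse=True)
    return [img for _, img in scores[:top_k]]

index = build_index(["img_%d" % i for i in range(50)])
# Stand-in query: reuse the vector of img_7, so img_7 should rank first.
results = search(index, fake_encoder("img_7"))
```

Because the image vectors are computed once offline, the per-query cost is one text encoding plus N dot products of length 200, which is what makes a simple exhaustive scan plausible at album scale.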
Results on academic benchmarks

Comparison of the effect of various distillation methods (figure).

Performance comparison of large and small models:
Model name | image model | text model | fusion model | image encoding time | retrieval time | parameter number | testset | platform
CLIP | ViT-L/14 | 12-layer transformer | dot product | 11.0 ms | 32.5 ms | 427.62M | flickr5K | V100 GPU
ALBEF | ViT-B/16 | 6-layer transformer | 6-layer transformer | 7.6 ms | 265 ms (k=16), 1945 ms (k=128), 3865 ms (k=256) | 419.12M | flickr5K | V100 GPU
In-house small model | mobileVitV2-1.5 | 4-layer TinyBert | dot product | 3.8 ms | 14.1 ms | 25.9M | flickr5K | V100 GPU
In-house small model | mobileVitV2-1.5 | 4-layer TinyBert | dot product | 17.3 ms | 14.6 ms | 25.9M | flickr5K | MTK DX3
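A quick piece of arithmetic on the retrieval-time column helps read the table. The sketch below only reuses numbers from the table; the interpretation that k is the number of candidates scored by ALBEF's fusion model is an assumption, since the slide lists k without defining it.

```python
# Retrieval times in milliseconds, copied from the comparison table.
albef_ms = {16: 265.0, 128: 1945.0, 256: 3865.0}
small_model_ms = 14.1  # in-house small model on V100

def per_candidate_ms(times):
    """Retrieval time divided by k, for each k in the table."""
    return {k: t / k for k, t in times.items()}

def slowdown_vs(baseline_ms, times):
    """How many times slower each setting is than the baseline."""
    return {k: t / baseline_ms for k, t in times.items()}

per_k = per_candidate_ms(albef_ms)
ratios = slowdown_vs(small_model_ms, albef_ms)

for k in sorted(albef_ms):
    print("k=%d: %.1f ms per candidate, %.0fx the small model" % (k, per_k[k], ratios[k]))
```

The per-candidate cost stays close to constant across the three settings, so the single-stream model's retrieval time grows roughly linearly with k, while the dot-product models pay a fixed cost per query.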
Results in real scenarios

Data volume: real albums from 11 users, 20,000+ images in total, with 5,400+ hand-written queries.
Data distribution: (figure)

Test set | R1 | R5 | R10 | MR | mAP
01 | 0.4728 | 0.671 | 0.7495 | 0.6311 | 0.6080
02 | 0.4956 | 0.758 | 0.8251 | 0.6929 | 0.5306
03 | 0.4019 | 0.5665 | 0.6108 | 0.5264 | 0.4889
04 | 0.4532 | 0.6847 | 0.7389 | 0.6256 | 0.6048
05 | 0.5843 | 0.753 | 0.7952 | 0.7108 | 0.6428
06 | 0.5323 | 0.6855 | 0.75 | 0.6559 | 0.5890
07 | 0.35 | 0.5294 | 0.6088 | 0.4961 | 0.4771
08 | 0.6417 | 0.8083 | 0.8417 | 0.7639 | 0.5943
09 | 0.5965 | 0.6842 | 0.7193 | 0.6667 | 0.5622
10 | 0.5121 | 0.7059 | 0.7647 | 0.6609 | 0.5441
11 | 0.5654 | 0.7418 | 0.781 | 0.6961 | 0.6336
Average | 0.4848 | 0.6768 | 0.7360 | 0.6325 | 0.5840
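The slide does not define the metric columns. The sketch below assumes the usual retrieval definitions: R1, R5 and R10 are recall at 1, 5 and 10, and MR is the mean of those three recalls. The MR assumption can be checked against the table, where it reproduces the MR column to four decimals for the rows used here. The function names and the toy ranking are illustrative.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant image appears in the top-k results, else 0.0."""
    return 1.0 if any(i in relevant_ids for i in ranked_ids[:k]) else 0.0

def mean_recall(r1, r5, r10):
    """Mean of the three recall values (assumed meaning of the MR column)."""
    return (r1 + r5 + r10) / 3

# (R1, R5, R10, MR) copied from the table for three albums and the average row.
rows = {
    "01": (0.4728, 0.671, 0.7495, 0.6311),
    "07": (0.35, 0.5294, 0.6088, 0.4961),
    "09": (0.5965, 0.6842, 0.7193, 0.6667),
    "Average": (0.4848, 0.6768, 0.7360, 0.6325),
}
mr_gap = {name: abs(mean_recall(r1, r5, r10) - mr) for name, (r1, r5, r10, mr) in rows.items()}

# Toy query: the relevant picture is ranked third, so it counts for R5 but not R1.
ranking = ["img_4", "img_9", "img_2", "img_7"]
hit_at_1 = recall_at_k(ranking, {"img_2"}, 1)
hit_at_5 = recall_at_k(ranking, {"img_2"}, 5)
```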
Fine-grained optimization

Fine-grained attribute word replacement + hard negative sampling + LwF (an anti-forgetting algorithm).

Doveh, Sivan, et al. Teaching structured vision & language concepts to vision & language models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
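To make the attribute word replacement step concrete, here is a minimal sketch of building hard negative captions by swapping one attribute word for another of the same kind. The attribute vocabulary, the function name and the English example caption are all hypothetical; the slide does not say how the replacement lists were built.

```python
import random

# Hypothetical attribute vocabulary. Words are grouped so that a word is only
# swapped with another word of the same kind (color for color, size for size).
ATTRIBUTE_GROUPS = [
    ["red", "green", "blue", "white", "black"],
    ["big", "small"],
    ["left", "right"],
]

def hard_negatives(caption, rng, max_negatives=3):
    """Return captions that differ from the original by one attribute word."""
    words = caption.split()
    negatives = []
    for idx, word in enumerate(words):
        for group in ATTRIBUTE_GROUPS:
            if word in group:
                alternatives = [w for w in group if w != word]
                swapped = words.copy()
                swapped[idx] = rng.choice(alternatives)
                negatives.append(" ".join(swapped))
    return negatives[:max_negatives]

rng = random.Random(0)
caption = "a girl in a red dress holding a small umbrella"
negs = hard_negatives(caption, rng)
```

Each negative is almost identical to the true caption, so a contrastive loss that has to rank the true caption above them pushes the model to attend to the attribute word itself.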
Application optimization of image-text generation & understanding models
Continued pre-training of a Chinese text-to-image large model

How to do high-quality, low-cost continued pre-training?
How to align with Chinese language and culture?
How to improve the detail quality of generated images?

Parameter efficient adapter
Orthogonal Finetuning

Qiu, Zeju, et al. Controlling text-to-image diffusion by orthogonal finetuning. Thirty-seventh Conference on Neural Information Processing Systems. 2023.
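The idea behind orthogonal finetuning can be illustrated on a toy 2 x 2 weight: the pretrained weight W stays frozen and only an orthogonal matrix R is learned, giving the adapted weight R W. One common way to keep R orthogonal is the Cayley parameterization R = (I + S)(I - S)^-1 with S skew-symmetric, which is what the sketch uses in closed form. The sizes, names and values are illustrative assumptions, not the setup used in the talk.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def cayley_rotation_2x2(theta):
    """R = (I + S)(I - S)^-1 for S = [[0, theta], [-theta, 0]], in closed form."""
    scale = 1.0 / (1.0 + theta * theta)
    diag = (1.0 - theta * theta) * scale
    off = 2.0 * theta * scale
    return [[diag, off], [-off, diag]]

# Frozen pretrained weight; each column is treated as one neuron.
W = [[1.0, 0.5],
     [0.0, 2.0]]

R = cayley_rotation_2x2(0.3)   # theta is the only trainable parameter here
W_adapted = matmul(R, W)       # adapted weight

# Inner products between neurons are unchanged by an orthogonal R.
gram_before = matmul(transpose(W), W)
gram_after = matmul(transpose(W_adapted), W_adapted)
```

With theta = 0 the matrix R is the identity, so finetuning starts exactly from the pretrained model, and for any theta the pairwise angles between neurons are preserved.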
Continued pre-training of a Chinese text-to-image large model

Chinese-context transfer results (figure)
Convergence speed (figure)
Continued pre-training of a Chinese text-to-image large model

Model labels and example prompts shown on the slide, in source order:
Finetune model
Mandarin ducks nest in pairs and butterflies fly in pairs; a garden full of spring scenery is intoxicating
LoRA
ControlNet
SSD 1.3B small model
A super cute rabbit wearing monk's clothing, portrait photo, Pixar animation
SDXL inpainting
A blue-and-white porcelain dinosaur on a bench
West Lake, pagoda and waterfall, sunrise
Jiangnan, a village by a lake in summer
A beautiful Asian girl, cinematic lighting
3D movie, 4k, highly detailed, a man sitting on a toilet reading a newspaper
A cat wearing sunglasses holding a sword, in a demon castle, Chinese Paladin (仙劍奇俠) style
LatentCM
General optimization, application: wallpaper generation

Text-to-image model + super-resolution to generate 2K HD wallpapers.
Examples: Spring Festival season popularity top 1; Spring Festival season popularity top 3.
General optimization, application: lock-screen magazine generation

Text-to-image model + fine-tuned LLaVA + LLM to generate magazines with both images and text.

Liu, Haotian, et al. Visual instruction tuning. arXiv preprint arXiv:2304.08485 (2023).
General optimization, application

InternLM-XComposer training framework (figure).

Zhang, Pan, et al. InternLM-XComposer: A vision-language large model for advanced text-image comprehension and composition. arXiv preprint arXiv:2309.15112 (2023).
Vertical-domain optimization: portraits

Problems when AI models draw people:
1. Body parts such as faces and hands come out broken.
2. Results are overly polished and standardized, with over-smoothed rendering that loses realistic texture.
3. Fine-grained attributes do not align with the text description.
Vertical-domain optimization: portraits

Building fine-grained portrait attribute data.
Vertical-domain optimization: portraits

The correspondence between U-Net modules and features in the image can be used to guide the choice of parameters for LoRA fine-tuning.
Example: thick lips / thin lips.
Vertical-domain optimization: portraits

Example: small nose / big nose.
Vertical-domain optimization: portraits

Example: thin eyebrows / thick eyebrows.
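Vertical-domain optimization: portraits

Lessons from vertical-domain fine-tuning:
1. Coarse tuning on a large amount of data increases the model's ability to generalize to new concepts.
2. Fine tuning on a small amount of high-quality data improves the quality of the generated images.

Face restoration logic: (figure)

Prompt: A fantasy warrior in ornate armor fights a fierce battle with a giant dragon, thunder and flames intertwined. (6 random samples, no cherry-picking)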
Vertical-domain optimization: results for ancient-style portraits

Prompt: Beside an ancient road, a lone traveler sits tall on horseback, draped in a white cloak, treading over silent fallen leaves. (6 random samples, no cherry-picking)
Prompt: In a grove, a graceful young girl in a red top and green skirt, carrying a floral umbrella, picks her way along a muddy path, as if stepping into a painting. (6 random samples, no cherry-picking)
Vertical-domain optimization, application

Advertising and marketing tool (internal beta).
Text rendering: problem definition

How can a text-to-image model render the correct text?

Ma, Jian, et al. GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently. arXiv preprint arXiv:2303.17870 (2023).
Text rendering: algorithm

GlyphDraw training framework (figure)
GlyphDraw inference framework (figure)
GlyphDraw dataset construction (figure)

Dataset | Image-text pairs | Amount of text
Chinese dataset | 792k | 3.3M characters
English dataset | 1.9M | 2.3M words

Ma, Jian, et al. GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently. arXiv preprint arXiv:2303.17870 (2023).
Text rendering: objective results (figure)

Ma, Jian, et al. GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently. arXiv preprint arXiv:2303.17870 (2023).
Text rendering: subjective results (figure)

Ma, Jian, et al. GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently. arXiv preprint arXiv:2303.17870 (2023).
Personalized generation: problem definition

How can a single reference image be used to generate new pictures quickly while balancing fidelity and generalization?

Ma, Jian, et al. Subject-Diffusion: Open domain personalized text-to-image generation without test-time fine-tuning. arXiv preprint arXiv:2307.11410 (2023).
Personalized generation: dataset

SDD dataset statistics (figure)
SDD dataset word cloud (figure)

Ma, Jian, et al. Subject-Diffusion: Open domain personalized text-to-image generation without test-time fine-tuning. arXiv preprint arXiv:2307.11410 (2023).
Personalized generation: algorithm (figure)

Ma, Jian, et al. Subject-Diffusion: Open domain personalized text-to-image generation without test-time fine-tuning. arXiv preprint arXiv:2307.11410 (2023).
Personalized generation: results

Comparison with other methods on single-subject generation (figure)
Comparison with other methods on two-subject generation (figure)

Ma, Jian, et al. Subject-Diffusion: Open domain personalized text-to-image generation without test-time fine-tuning. arXiv preprint arXiv:2307.11410 (2023).
Personalized generation: results (figure)

Ma, Jian, et al. Subject-Diffusion: Open domain personalized text-to-image generation without test-time fine-tuning. arXiv preprint arXiv:2307.11410 (2023).
Personalized generation: results (figure)

Ma, Jian, et al. Subject-Diffusion: Open domain personalized text-to-image generation without test-time fine-tuning. arXiv preprint arXiv:2307.11410 (2023).
Personalized generation: results (figure)

Ma, Jian, et al. Subject-Diffusion: Open domain personalized text-to-image generation without test-time fine-tuning. arXiv preprint arXiv:2307.11410 (2023).
Personalized generation, application: advertising and marketing tool

Diagram labels, in source order:
Multi-angle appearance generation
Brand tone intervention
Generation from a product appearance description
Product appearance design (designing from 0 to 1)
Generation from a reference image
Generation from a rendering description
Brand tone/style intervention
Product rendering generation (studio-shot renderings)
Product marketing material generation (poster/banner)
Marketing copy & image generation
Material layout generation
Generation from a layout description
Reference-object intervention
Design sketch to image
A yellow hat
A girl wearing the hat and facing forest
Select what to generate: [Poster]
Generate from reference material
Generate from the brand VI and the tone of past products
Product design
Personalized generation: applications

Product design
Personalized image generation
Poster design
Personalized generation: applications

Personalized generation with Subject-Diffusion (figure).
Personalized generation: applications

Outpainting with Stable Diffusion (figure).
On-device lightweighting of text-to-image generation models
Technical route: model structure optimization

U-Net structure diagram (figure)
Analysis of how quality and parameter count change after removing a given module (figure)

Results of different pruned models tested on V100:
Model | Sampling time (DPMsolver+, 25 steps) | Runtime memory | UNet parameters
SD 1.5 | 1.34s | 4105M | 859.52M
SD base-2m | 0.9s | 3458M | 579.38M
SD small-2m | 0.83s | 3287M | 482.35M
SD tiny-2m | 0.76s | 2979M | 323.38M
SD small | 0.88s | 3477M | 579.38M
SD tiny | 0.75s | 3043M | 323.38M
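The trade-off in the pruning table is easier to read as ratios against SD 1.5. The sketch below only reuses the table's numbers; the function and variable names are illustrative.

```python
# (sampling time in s, runtime memory in M, UNet parameters in M), from the table.
rows = {
    "SD 1.5": (1.34, 4105, 859.52),
    "SD base-2m": (0.9, 3458, 579.38),
    "SD small-2m": (0.83, 3287, 482.35),
    "SD tiny-2m": (0.76, 2979, 323.38),
    "SD small": (0.88, 3477, 579.38),
    "SD tiny": (0.75, 3043, 323.38),
}

def relative_to_baseline(rows, baseline="SD 1.5"):
    """Speedup and fractional memory/parameter reduction versus the baseline."""
    base_time, base_mem, base_params = rows[baseline]
    summary = {}
    for name, (time_s, mem, params) in rows.items():
        summary[name] = {
            "speedup": base_time / time_s,
            "memory_reduction": 1.0 - mem / base_mem,
            "param_reduction": 1.0 - params / base_params,
        }
    return summary

summary = relative_to_baseline(rows)
for name, s in summary.items():
    print("%-12s %.2fx faster, %.0f%% less memory, %.0f%% fewer UNet parameters"
          % (name, s["speedup"], 100 * s["memory_reduction"], 100 * s["param_reduction"]))
```

For SD tiny, removing about 62% of the UNet parameters yields a speedup of about 1.8x and a memory reduction of about 26%, so on these numbers sampling time and memory shrink much more slowly than the parameter count.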
Technical route: model structure optimization

The SD small model is distilled using SDXL.
Technical route: sampling acceleration

Progressive distillation
Salimans, Tim, and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512 (2022).

Classifier-free guidance distillation
The two forward passes of CFG are distilled into a single forward pass: the guidance scale is encoded with a Fourier embedding and fed into the U-Net in the same way as the timestep.
Meng, Chenlin, et al. On distillation of guided diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
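The two ingredients of classifier-free guidance distillation can be sketched in a few lines. `cfg_combine` is the commonly used guidance formula that needs two teacher forward passes, and `fourier_embedding` is a sinusoidal embedding of the scalar guidance scale of the same form typically used for timesteps. The toy teacher, the embedding width and all names are assumptions for illustration; a real student would be a U-Net trained to match the target.

```python
import math

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: combine two noise predictions into one."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def fourier_embedding(value, dim=8, max_period=10000.0):
    """Sinusoidal embedding of a scalar, as commonly done for timesteps."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(value * f) for f in freqs] + [math.sin(value * f) for f in freqs]

def toy_teacher(x, conditioned):
    """Hypothetical stand-in for one teacher forward pass."""
    return [0.5 * v + (1.0 if conditioned else 0.0) for v in x]

def distillation_target(x, scale):
    """What the single-pass student should output for input x and this scale."""
    return cfg_combine(toy_teacher(x, False), toy_teacher(x, True), scale)

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

x = [0.2, -0.4]
scale = 7.5
target = distillation_target(x, scale)   # costs two teacher passes
scale_emb = fourier_embedding(scale)     # extra student input next to the timestep
loss_if_student_ignores_scale = mse(toy_teacher(x, True), target)
```

After distillation the student receives the scale embedding as an input and produces the guided prediction directly, which halves the number of U-Net evaluations per sampling step.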
Technical route: comparison of results

Organization | Hardware | Image generation time | Runtime memory | Data source
Apple | iPad Pro (M2) | 7.0s | | https:/
 | 14 Pro Max | 7.9s | | https:/
 | 8 Gen2 | 15s | | https:/
 | S23 Ultra | 12s | | https://arxiv.org/abs/2304.11267
Snap Inc. | iPhone 14 Pro | 2s | | https://arxiv.org/abs/2306.00980
Qualcomm | Snapdragon 8 Gen3 | 0.6s | | https:/
Technical route: comparison of results

SD, fp32, DPMsolver+, 25 steps
SD_small, W8A16, DPMsolver+, 4 steps
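W8A16 conventionally means 8-bit weights with 16-bit activations. The slide does not describe the quantization scheme used, so the sketch below assumes the simplest variant, symmetric per-tensor int8 weight quantization, and simulates fp16 activations by rounding through the half-precision format of Python's `struct` module. All names and values are illustrative.

```python
import struct

def to_fp16(x):
    """Round a float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def quantize_weights_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def w8a16_dot(q_weights, scale, activations):
    """Dot product with int8 weights and fp16-rounded activations."""
    acts = [to_fp16(a) for a in activations]
    acc = sum(q * a for q, a in zip(q_weights, acts))
    return acc * scale  # apply the weight scale once, after accumulation

weights = [0.5, -1.0, 0.25]
activations = [1.0, 2.0, 4.0]

q, scale = quantize_weights_int8(weights)
approx = w8a16_dot(q, scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
```

Storing each weight in one byte instead of four cuts weight memory to a quarter of fp32, at the cost of a rounding error of at most half a quantization step per weight.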