AliceMind Pure-Text and Multimodal Generative Pre-training: Techniques and Applications
Li Chenliang (李晨亮), Senior Algorithm Engineer

CONTENTS
01 Pure-text generative pre-training: PALM 2.0
02 Unified multimodal generative pre-training model
03 Business applications of generative pre-training
04 Summary

01 | Pure-Text Generative Pre-training: PALM 2.0

Example: Generative QA
A sample well-formed answer, with words drawn from the vocabulary, the paragraph, and the question:
Contextual passage: Bake sirloin steaks in the oven at 425 degrees Fahrenheit for 30 minutes until they are cooked to your desired taste. Baking sirloin steaks decreases the moisture available in the steaks. The oven tends to dry the meat out if you do not take the time to marinate appropriately.
Question: How long to cook sirloin steak?
Generated answer: It takes 30 minutes to cook sirloin steak in the oven at 425 degrees Fahrenheit.
Unlike extractive QA, the generated answer does not have to be a sub-span of the paragraph; it should be phrased in natural language and make sense without the context of either the question or the paragraph.

Text Generation Tasks
Many text generation tasks require strong understanding: the model must fully comprehend the given input before generating.
- Abstractive summarization
- Machine translation
- Generative QA
- Question generation

Existing Pre-trained Generation Models
- GPT, GPT-2, GPT-3: a single unidirectional decoder, with no encoder for better bidirectional understanding of the input text.
- MASS, BART: corrupt the input text (shuffling, masking, etc.) and train the decoder to recover the original input.
- ProphetNet: a new self-supervised objective that predicts future n-grams to improve the global coherence of generation.

Motivation for PALM 1.0
- The pre-trained generation models above pay little attention to the encoder's understanding ability, and their pre-training tasks are only loosely related to downstream comprehension-based generation tasks.
- PALM is designed, in both model structure and pre-training tasks, specifically for comprehension-based generation.
PALM 1.0 Model: Joint Autoencoding and Autoregressive Pre-training
Autoencoding pre-training
- Auto-encoding pre-training reconstructs masked tokens of the original input from the surrounding context, as in BERT.
- It yields a strong understanding model, but is not well suited to generation settings where the full context cannot be obtained.
Autoregressive pre-training
- Auto-regressive pre-training generates text left to right with unidirectional encoding, as in GPT.
- It is less suitable for downstream tasks that require bidirectional understanding of the text.
PALM
- PALM combines autoencoding and autoregressive pre-training: autoregressive text generation is built on top of the bidirectional understanding obtained from autoencoding pre-training.
- Encoder: autoencoding pre-training strengthens the encoder's contextual understanding of the input text.
- Decoder: an autoregressive task generates text conditioned on the encoder's understanding of the input context.
Reference: Bi B, Li C, Wu C, et al. PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation.
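To make the combined objective concrete, here is a minimal training-step sketch, assuming a generic Transformer encoder-decoder with an MLM head on the encoder. The module and batch field names are illustrative assumptions, not PALM's actual implementation, and the two losses are summed here purely for brevity; PALM itself organizes them as an autoencoding stage followed by an autoregressive generation stage.

```python
import torch
import torch.nn.functional as F

def palm_style_step(model, batch):
    """Illustrative pre-training step combining an autoencoding (MLM) loss on the
    encoder with an autoregressive generation loss on the decoder.
    `model` and the batch field names are assumptions for this sketch."""
    # Autoencoding signal: predict the masked tokens of the corrupted context (BERT-style MLM).
    enc_out = model.encoder(batch["masked_context_ids"])            # (B, Lc, H)
    mlm_logits = model.mlm_head(enc_out)                            # (B, Lc, V)
    mlm_loss = F.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),
        batch["mlm_labels"].view(-1),                               # -100 marks unmasked positions
        ignore_index=-100,
    )

    # Autoregressive signal: generate the continuation conditioned on the encoder output.
    dec_logits = model.decoder(batch["target_input_ids"], enc_out)  # teacher forcing, (B, Lt, V)
    gen_loss = F.cross_entropy(
        dec_logits.view(-1, dec_logits.size(-1)),
        batch["target_labels"].view(-1),                            # targets shifted by one position
        ignore_index=-100,
    )

    # The decoder learns on top of the encoder's bidirectional understanding of the context.
    return mlm_loss + gen_loss
```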
PALM 1.0 Experiments: Generative QA (MARCO Q&A + NLG leaderboard)
Qualitative samples produced during pre-training indicate that PALM learns to infer and reason from the input text; although the generated content is not always factually accurate in the absence of rich context, this inference capability can be transferred downstream by fine-tuning on specific generation tasks.
The MARCO benchmark (Nguyen et al., 2016) released by Microsoft is a good fit for evaluating generative QA models: the questions are user queries issued to the Bing search engine, and the contextual passages come from real web documents. The data is split into a training set (153,725 QA pairs), a dev set (12,467 QA pairs), and a test set (101,092 questions with unpublished answers). The evaluation focuses on the Q&A + Natural Language Generation task, whose goal is to provide the best available answer in natural language, as could be used by a smart device or digital assistant. The answers are human-generated and not necessarily sub-spans of the contextual passages, so ROUGE-L (Lin, 2004) is used to measure the quality of generated answers against the ground truth.
The pre-trained PALM is fine-tuned on the MARCO training set for 10 epochs with batch size 64, learning rate 1e-5, and maximum input length 512; the other hyperparameters are kept the same as in pre-training. The encoder takes a contextual passage concatenated with the question at the end, and the decoder takes the answer. Decoding uses beam search with a beam size of 5.

Table 2: Test results of answer generation on the official MARCO leaderboard as of December 9, 2019.
Method                           | Rouge-L
ConZNet (Indurthi et al., 2018)  | 0.421
Reader-Writer                    | 0.439
KIGN-QA                          | 0.441
SNET + CES2S                     | 0.450
Communicating BERT               | 0.483
VNET (Wang et al., 2018)         | 0.484
Selector NLGEN                   | 0.487
BERT + Multi-Pointer             | 0.495
Masque (Nishida et al., 2019)    | 0.496
PALM                             | 0.498

PALM achieves 1st place on the leaderboard, outperforming all competing methods in generation quality. Notably, PALM pre-trains a single model, while some of the top-performing entries on the leaderboard, such as Masque, are ensembles. The superiority of single-model PALM over ensemble Masque with pre-trained ELMo (Peters et al., 2018) and over BERT-based methods demonstrates the effectiveness and generalizability of PALM compared with other pre-training approaches to language modeling.
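A minimal decoding sketch matching the MARCO fine-tuning setup above (passage concatenated with the question, input truncated to 512 tokens, beam size 5), assuming a Hugging Face Transformers-style seq2seq checkpoint. The checkpoint name and the exact input formatting are assumptions, not the released PALM configuration.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical checkpoint name; any encoder-decoder generation model is loaded the same way.
tokenizer = AutoTokenizer.from_pretrained("your-org/palm-style-seq2seq")
model = AutoModelForSeq2SeqLM.from_pretrained("your-org/palm-style-seq2seq")

passage = "Bake sirloin steaks in the oven at 425 degrees Fahrenheit for 30 minutes ..."
question = "How long to cook sirloin steak?"

# Encoder input: contextual passage with the question concatenated at the end, truncated to 512 tokens.
inputs = tokenizer(passage + " " + question, truncation=True, max_length=512, return_tensors="pt")

# Beam search with a beam of size 5, as in the fine-tuning setup described above.
output_ids = model.generate(**inputs, num_beams=5, max_length=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```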
PALM 1.0 Experiments: Abstractive Summarization (CNN/DailyMail, Gigaword)
Text summarization produces a concise and fluent summary conveying the key information of the input (e.g., a news article). The focus here is abstractive summarization, a generation task where the summary is not constrained to reusing phrases or sentences from the input text. Experiments are conducted on the CNN/DailyMail dataset (Hermann et al., 2015), which contains 93K news articles from CNN and 220K articles from Daily Mail, and on the Gigaword dataset (Graff and Cieri, 2003), which consists of 3.8M article-title pairs. The articles are fed to the encoder and the summaries to the decoder, with the same optimization hyperparameters as in generative QA fine-tuning. F1 scores of Rouge-1, Rouge-2, and Rouge-L are reported on the test set of both datasets.

Table 3: Results of abstractive summarization on the CNN/DailyMail test set and the Gigaword test set. RG is short for ROUGE.
                                      | CNN/DailyMail           | Gigaword
Model                                 | RG-1  | RG-2  | RG-L    | RG-1  | RG-2  | RG-L
BERTSUMABS (Liu and Lapata, 2019)     | 41.72 | 19.39 | 38.76   | -     | -     | -
MASS (Song et al., 2019)              | 42.12 | 19.50 | 39.01   | 38.13 | 19.81 | 35.62
UniLM-LARGE (Dong et al., 2019)       | 43.33 | 20.21 | 40.51   | 38.45 | 19.45 | 35.75
T5-LARGE (Raffel et al., 2019)        | 42.50 | 20.68 | 39.75   | -     | -     | -
BART-LARGE (Lewis et al., 2019)       | 44.16 | 21.28 | 40.90   | -     | -     | -
PEGASUS (Zhang et al., 2019)          | 44.17 | 21.47 | 41.11   | 39.12 | 19.86 | 36.24
ERNIE-GEN-LARGE (Xiao et al., 2020)   | 44.02 | 21.17 | 41.26   | 39.25 | 20.25 | 36.53
PALM                                  | 42.71 | 19.97 | 39.71   | 38.75 | 19.79 | 35.98
PALM-LARGE                            | 44.30 | 21.12 | 41.41   | 39.45 | 20.37 | 36.75

PALM achieves better performance than the strong recently proposed pre-trained summarization models, including UniLM (Dong et al., 2019), T5 (Raffel et al., 2019), BART (Lewis et al., 2019), PEGASUS (Zhang et al., 2019), and ERNIE-GEN (Xiao et al., 2020). By consistently outperforming these pre-training methods, PALM confirms its effectiveness in leveraging unsupervised signals for language generation.
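The ROUGE F1 numbers reported above can be computed with an off-the-shelf scorer. A minimal sketch using the rouge-score package, which is an assumption of this example rather than necessarily the evaluation toolkit used in the paper:

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "It takes 30 minutes to cook sirloin steak in the oven at 425 degrees Fahrenheit."
prediction = "Sirloin steak takes 30 minutes to cook in a 425 degree oven."

scores = scorer.score(reference, prediction)
for name, score in scores.items():
    # Each entry carries precision, recall, and the F-measure reported in the tables above.
    print(f"{name}: F1 = {score.fmeasure:.4f}")
```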
PALM 1.0 Experiments: Question Generation (SQuAD 1.1)
In answer-aware question generation, given an input passage and an answer span, the model generates a question that leads to the answer. Following the practice of Zhao et al. (2018) and Dong et al. (2019), the SQuAD 1.1 dataset (Rajpurkar et al., 2016) is used, with BLEU-4, METEOR, and ROUGE-L as evaluation metrics.

Table 4: Question generation results on the SQuAD dataset. MTR is short for METEOR and RG for ROUGE.
Method               | BLEU-4 | MTR   | RG-L
CorefNQG (a)         | 15.16  | 19.12 | -
MP-GSN (b)           | 16.38  | 20.25 | 44.48
UniLM (c)            | 22.88  | 24.94 | 51.80
ERNIE (d)            | 22.28  | 25.13 | 50.58
ERNIE-GEN-LARGE (d)  | 24.03  | 26.31 | 52.36
PALM                 | 22.78  | 25.02 | 50.96
PALM-LARGE           | 24.11  | 25.85 | 52.38
(a) Du and Cardie, 2018; (b) Zhao et al., 2018; (c) Dong et al., 2019; (d) Xiao et al., 2020.

PALM outperforms the previous question generation systems and achieves new state-of-the-art results on BLEU-4 and ROUGE-L for question generation on SQuAD 1.1.
PALM 1.0 Experiments: Conversational Response Generation (Cornell Movie Dialog)
Conversational response generation aims to produce a flexible response to a conversation (Vinyals and Le, 2015). Following MASS, experiments are conducted on the Cornell Movie Dialog corpus (Danescu-Niculescu-Mizil and Lee, 2011), which contains 140K conversation pairs, using the training/test splits provided by the dataset and the same training hyperparameters as in generative QA fine-tuning. Results are reported in perplexity, following Vinyals and Le (2015); lower is better. PALM is compared with a baseline trained only on the available data pairs and with the pre-trained BERT+LM and MASS. Following MASS, every model is trained both on 10K randomly sampled pairs and on all 110K training pairs.

Table 5: Results of conversational response generation in terms of perplexity on the Cornell Movie Dialog corpus (lower is better).
Model    | 10K data | 110K data
Baseline | 82.39    | 26.38
BERT+LM  | 80.11    | 24.84
MASS     | 74.32    | 23.52
PALM     | 45.43    | 21.98

PALM performs better than all competitors by a large margin on both the 10K and the 110K data, demonstrating its capability in generating responses to context thanks to its new pre-training objectives.

Ablation Study
Ablation studies assess the individual contribution of every component of PALM on the CNN/DailyMail summarization dataset.

Table 6: Ablation tests of PALM on the CNN/DailyMail summarization dataset.
Ablation            | RG-1  | RG-2  | RG-L
PALM                | 42.71 | 19.97 | 39.71
- pointer-generator | 42.54 | 19.86 | 39.49
- autoencoding      | 41.78 | 19.32 | 38.81
- autoregression    | 41.89 | 19.48 | 38.92
- pre-training      | 40.32 | 17.78 | 37.12

Removing the pointer-generator network from PALM pre-training drops Rouge-L from 39.71 to 39.49, showing its role in generative modeling; given the slight drop, it could be excluded for training efficiency, but in these experiments it is used in every generation task for optimal generation quality. Ablating autoencoding and autoregression, by randomly initializing the weights of the encoder and the decoder respectively, causes significant drops on all three Rouge metrics, so both components prove to be critical. Finally, ablating pre-training altogether degrades performance by over 6.5%, which demonstrates the power of PALM in leveraging an unlabeled corpus for downstream generation.

Relation to prior pre-training methods: ELMo (Peters et al., 2018) concatenates left-only and right-only LSTM representations but does not pre-train interactions between the two directions; GPT, GPT-2, and GPT-3 pre-train only a Transformer decoder. BERT (Devlin et al., 2018) learns interactions between left and right context through masked language modeling, but it does not predict autoregressively and is therefore not effective for generation tasks; UniLM fine-tunes BERT with an ensemble of masks but is not fully autoregressive in pre-training. MASS and BART are the methods most similar to PALM: MASS maps an input with a masked span to the sequence of missing tokens, while BART reconstructs the original text from corrupted input. PALM instead forces the decoder to predict the continuation of the text input on an unlabeled corpus, reducing the mismatch between pre-training and context-conditioned generation; it advances the state of the art on generative QA (rank 1 on the MARCO leaderboard), abstractive summarization, question generation, and conversational response generation.

PALM 2.0 Model: Motivation
- The generative pre-training task of PALM 1.0 is harder than that of other generative pre-training methods; its accuracy is only about 0.45 after convergence.
- Multi-stage, multi-task progressive pre-training further improves the model's generation capability and narrows the gap between pre-training and fine-tuning.
Schematic comparison of PALM with GPT, MASS, and BART (Figure 1 of the PALM paper):
(a) GPT: tokens are predicted autoregressively, so GPT can be used for generation; however, it lacks an encoder to condition generation on context.
(b) MASS: based on the encoder-decoder architecture, but the decoder predicts only the tokens that are masked out in the text input to the encoder.
(c) BART: rather than masked tokens, the decoder reconstructs the original full sentence from the corrupted input to the encoder; this mismatches most downstream generation, which requires more than reconstructing the original input.
(d) PALM: the encoder predicts masked tokens by encoding context bidirectionally, and the decoder predicts the text segment subsequent to the context; this forces the model to learn to comprehend the context in order to generate relevant text.

PALM pre-training proceeds in two stages:
1. The encoder is trained as an autoencoder to reconstruct the original text from corrupted context in which random tokens are sampled and replaced with [MASK] symbols, following BERT's practice (Devlin et al., 2018). Training optimizes the cross-entropy reconstruction loss between the encoder's output and the original context, as in Masked Language Modeling (MLM) in BERT. By predicting the actual context tokens that are masked out, PALM forces the encoder to comprehend the meaning of the unmasked tokens and the full context.
2. The encoder and decoder are then jointly trained to autoregressively generate the text output from the context representations produced by the encoder. Training maximizes the log-likelihood of the ground-truth text at the decoder's output:

L(θ) = Σ_{(x, y) ∈ (X, Y)} log ∏_{t=1}^{n} P(y_t | y_{<t}, x; θ)    (1)

where X denotes the set of contexts and Y the set of texts to be generated. By conditioning generation on context representations, PALM forces the decoder to rely deeply on the context instead of only the preceding generated tokens when predicting the next token, which facilitates context-sensitive generation.

Input and output representations. During pre-training, input and output representations are tailored to minimize the discrepancy between self-supervised pre-training and supervised fine-tuning: in a typical downstream generation task (e.g., abstractive summarization or generative QA), the context is a rather long passage and the model is asked to generate a shorter piece of text based on its comprehension of that context. Given a contiguous text fragment of length L (composed of a few sentences) from an unlabeled corpus, PALM uses the consecutive span covering the first 80% of the fragment as context input to the encoder, and the remaining 20% as the text output to be generated by the decoder. This design mimics the input and output of downstream tasks, under the hypothesis that human-written text is coherent, so the subsequent 20% span reflects a comprehension of the preceding context span; in this way PALM learns to infer the subsequent content from the preceding content. The collection of text fragments is constructed from a corpus following BERT's practice, with the maximum fragment length set to 500 (L <= 500), so the context input consists of at most 400 tokens and the text output of at most 100 tokens.
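A minimal sketch of how such pre-training pairs could be constructed under the 80%/20% split described above; the whitespace "tokenizer" in the usage example is a placeholder, not PALM's actual data pipeline.

```python
def make_palm_example(fragment_tokens, max_len=500, context_ratio=0.8):
    """Split a contiguous token fragment into (encoder context, decoder target).

    Follows the representation described above: at most 500 tokens per fragment,
    the first 80% (<= 400 tokens) as context, and the remaining 20% (<= 100 tokens)
    as the text the decoder must generate.
    """
    fragment_tokens = fragment_tokens[:max_len]
    split = int(len(fragment_tokens) * context_ratio)
    context = fragment_tokens[:split]   # fed to the encoder (masked for the MLM stage)
    target = fragment_tokens[split:]    # generated autoregressively by the decoder
    return context, target


# Usage with a toy whitespace "tokenizer":
fragment = ("Human-written text is coherent , so the continuation reflects "
            "an understanding of what precedes it .").split()
context, target = make_palm_example(fragment)
print(len(context), len(target))
```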
PALM 2.0 Model: Multi-stage Progressive Pre-training
- Multi-stage, multi-task progressive pre-training: from easy to hard, and from task-agnostic to task-related.

Effect of progressively adding pre-training objectives:
Pre-training objectives             | DuReaderQG-robust (BLEU-4) | CSL (Rouge-L) | ADGEN (BLEU-4) | LCSTS (Rouge-L)
+ Word-level fill-mask              | 37.10                      | -             | -              | 40.98
+ Text infilling & sentence shuffle | 42.5                       | 63.2          | 11.5           | 42.1
+ Auto-regressive generation        | 43.0                       | 64.4          | 11.3           | 42.6

Effect of task-specific pre-training:
Model                                             | LCSTS (8k train, Rouge-L) | LCSTS (8k train, Rouge-L)
PALM-Base                                         | 30.30                     | 23.11
+ Task-specific pre-training (similar to PEGASUS) | 32.05                     | 27.24
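A rough sketch of the from-easy-to-hard staging; the stage names mirror the table above, while the loss callables, schedule, and training loop are illustrative assumptions rather than the actual PALM 2.0 recipe.

```python
# Progressive pre-training: train with easier, task-agnostic objectives first,
# then move to harder, more task-related ones.
STAGES = [
    "word_level_fill_mask",              # stage 1: easiest, token-level masking
    "text_infilling_sentence_shuffle",   # stage 2: span infilling + sentence shuffling
    "autoregressive_generation",         # stage 3: hardest, closest to downstream generation
]

def progressive_pretrain(model, optimizer, corpus, objectives, epochs_per_stage=1):
    """Illustrative multi-stage loop; `objectives` maps a stage name to a
    callable(model, batch) that returns a scalar loss for that stage."""
    for stage_name in STAGES:                 # from easy to hard, task-agnostic to task-related
        loss_fn = objectives[stage_name]
        for _ in range(epochs_per_stage):
            for batch in corpus:
                loss = loss_fn(model, batch)  # each stage has its own self-supervised loss
                optimizer.zero_grad()
                loss.backward()               # assumes torch-style tensors and optimizer
                optimizer.step()
```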
PALM 2.0 Experiments
Model           | DuReaderQG-robust (BLEU-4) | CSL (Rouge-L) | ADGEN (BLEU-4) | LCSTS
mT5 (S)         | -                          | 56.7          | 10.2           | 33.5
BART (B)        | -                          | 62.1          | 9.9            | 37.8
CPT (B)         | -                          | 63.0          | 9.8            | 38.2
PALM2.0-Base    | 42.1                       | 63.4          | 10.9           | 39.7
CPM-2           | -                          | -             | 10.6           | 35.9
mT5 (B)         | -                          | 61.8          | -              | 36.5
ERNIE-2.0 Large | 39.3                       | -             | -              | 41.4
RoBERTa Large   | 37.1                       | -             | -              | 41.0
BART Large      | -                          | 64.2          | 10.0           | 40.6
CPT Large       | -                          | 63.7          | 10.7           | 42.0
PALM2.0-Large   | 43.0                       | 64.4          | 11.3           | 42.6

Compared on multiple Chinese generation datasets, both PALM 2.0 base and large outperform other SOTA models of comparable size, such as mT5, BART, and CPT.

02 | Unified Multimodal Generative Pre-training Model

Image-Text Multimodal Understanding and Generation Tasks
- With the rise of 5G, rich multimodal content data has grown explosively, and the demand for multimodal information processing is increasingly common.
- Benchmarks: VQA 2.0, MS COCO Caption.

Progress in Multimodal Pre-training
- Multimodal NLU pre-training: LXMERT, UNITER
- End-to-end unified understanding and generation: Pixel-BERT, VLMo, SimVLM, CoCa

Unified Multimodal Generative Pre-training Model
- An efficient unified multimodal framework: to address the low training efficiency of directly concatenating image and text features, it introduces cross-layer image-text connections, mitigating both the drowning-out of text information by overly long image feature sequences and the slow training speed.
Training tasks
- MLM: masked text prediction that exploits image information
- ITA: image-text contrastive learning
- ITM: image-text matching
- PrefixLM: the encoder takes the first half of a sentence and the decoder predicts the second half
Training data: 14M image-text pairs.
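To illustrate how the four objectives might be combined, here is a minimal multi-task loss sketch; the task names match the list above, while the sub-loss callables, batch layout, and equal default weighting are assumptions for illustration, not the model's published configuration.

```python
def multimodal_pretrain_loss(losses, model, batch, weights=None):
    """Illustrative combination of the four pre-training objectives listed above.

    `losses` is a dict mapping task name -> callable(model, batch) -> scalar loss:
      - "mlm": masked text prediction using image information
      - "ita": image-text contrastive alignment
      - "itm": image-text matched/unmatched classification
      - "prefix_lm": encoder reads the sentence prefix, decoder generates the remainder
    """
    weights = weights or {"mlm": 1.0, "ita": 1.0, "itm": 1.0, "prefix_lm": 1.0}
    total = 0.0
    for task, weight in weights.items():
        total = total + weight * losses[task](model, batch)
    return total
```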
Experimental Results
- Ours-base, Ours-large

VQA Example Showcase

03 | Business Applications of AliceMind Generative Pre-training

Downstream application: intelligent FAQ mining
- A question-generation model based on the pre-trained PALM model drives a complete automatic knowledge-extraction pipeline from documents to QA pairs.
- The system has been deployed in an intelligent enterprise knowledge base for the electric-power industry and in FAQ mining for Yunxiaomi (Alibaba's intelligent customer-service bot).

Downstream application: query rewriting for AliExpress (AE) e-commerce search
- To cover the mid- and long-tail queries left unaddressed by existing rewriting, a category-controllable, positive-feedback-based query rewriting model is proposed.
- Standalone online results: uv_l2p +0.6%, uv_value +1.4%.
- Stacked with rewriting and vector-recall strategies: uv_l2p +1.6%, uv_value +2.1%.

Downstream application: in-car boot greeting generation
- Evaluated with human blind-selection tests and automatic quantitative metrics.

04 | Summary

AliceMind pure-text and multimodal generative pre-training: multi-stage pre-training + efficient training + understanding & generation
- Generative pre-training that combines understanding and generation in multiple tasks narrows the gap between pre-training and fine-tuning.
- An efficient pre-training framework: multimodal generation improves pre-training efficiency through skip (cross-layer) connections.
- Both the pure-text and the multimodal generative pre-trained models have understanding as well as generation capabilities.

Generative pre-training in industry: multi-scenario generation applications
- Intelligent FAQ mining: generate questions to obtain FAQ pairs from documents, supporting intelligent document structuring.
- E-commerce search: rewrite search queries via query generation to improve recall for long-tail queries.
- In-car greeting generation: generate weather-related boot greetings from structured weather input, supporting smart vehicles.

Thank you for watching.
77、預訓練模型PALM的問題生成模型,構建文檔到QA對生成的一整套知識自動抽取系統目前該系統已應用于電力智能企業知識庫和云小蜜FAQ挖掘|下游應用:AE電商搜索query改寫針對改寫中遺留的中長尾Query無法覆蓋的問題,提出基于類目可控和正反饋的Query改寫模型單獨上線效果:uv_l2p:+0.6%,uv_value:+1.4%改寫和向量召回等策略疊加實驗:uv_l2p+1.6%,uv_value+2.1%|下游應用:車載開機歡迎語生成人工盲選測試自動量化評測總結04|總結AliceMind純文本和多模態生成預訓練:多階段預訓練+高效訓練+理解&生成多任務結合的生成預訓練,縮小預訓練和微調之間的gap高效預訓練框架,多模態生成通過跳躍式連接提升預訓練效率純文本和多模態生成預訓練模型,都兼具理解和生成能力生成預訓練在工業界的實踐:多場景智能生成應用智能FAQ挖掘:通過生成問題,從文檔獲取FAQ對,助力文檔智能結構化電商搜索:通過query生成改寫搜索query,提高長尾搜索query的召回車載歡迎語生成:通過對天氣結構化信息輸入,生成和天氣相關的車載開機歡迎語,助力智能汽車非常感謝您的觀看|