AliceMind Pure-Text and Multimodal Generative Pre-training: Techniques and Applications
Li Chenliang (李晨亮), Senior Algorithm Engineer

CONTENTS
01 Pure-text generative pre-training: PALM 2.0
02 Unified multimodal generative pre-training model
03 Business applications of generative pre-training
04 Summary

01 | Pure-Text Generative Pre-training: PALM 2.0

Example: Generative QA
A sample well-formed answer, with words drawn from the vocabulary, the paragraph, and the question:
Contextual passage: Bake sirloin steaks in the oven at 425 degrees Fahrenheit for 30 minutes until they are cooked to your desired taste. Baking sirloin steaks decreases the moisture available in the steaks. The oven tends to dry the meat out if you do not take the time to marinate appropriately.
Question: How long to cook sirloin steak?
Generated answer: It takes 30 minutes to cook sirloin steak in the oven at 425 degrees Fahrenheit.
Unlike extractive QA, the generated answer does not have to be a sub-span of the paragraph; it should be phrased in natural language and make sense without the context of either the question or the paragraph.

Text Generation Tasks
Many text generation tasks require strong understanding: the model must fully comprehend the given input before generating.
- Abstractive summarization
- Machine translation
- Generative QA
- Question generation

Existing Pre-trained Generation Models
- GPT, GPT-2, GPT-3: a single unidirectional decoder, with no encoder for better bidirectional understanding of the input text.
- MASS, BART: corrupt the input text (shuffling, masking, etc.) and train the decoder to recover the original input.
- ProphetNet: a new self-supervised objective that predicts future n-grams to improve the global coherence of generation.

Motivation for PALM 1.0
- The pre-trained generation models above pay little attention to the encoder's understanding ability, and their pre-training tasks are only loosely related to downstream comprehension-based generation tasks.
- PALM is designed, in both model structure and pre-training tasks, specifically for comprehension-based generation.
PALM 1.0 Model: Joint Autoencoding and Autoregressive Pre-training
Autoencoding pre-training
- Auto-encoding pre-training reconstructs masked tokens of the original input from the surrounding context, as in BERT.
- It yields a strong understanding model, but is not well suited to generation settings where the full context cannot be obtained.
Autoregressive pre-training
- Auto-regressive pre-training generates text left to right with unidirectional encoding, as in GPT.
- It is less suitable for downstream tasks that require bidirectional understanding of the text.
PALM
- PALM combines autoencoding and autoregressive pre-training: autoregressive text generation is built on top of the bidirectional understanding obtained from autoencoding pre-training.
- Encoder: autoencoding pre-training strengthens the encoder's contextual understanding of the input text.
- Decoder: an autoregressive task generates text conditioned on the encoder's understanding of the input context.
Reference: Bi B, Li C, Wu C, et al. PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation.
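To make the combined objective concrete, here is a minimal training-step sketch, assuming a generic Transformer encoder-decoder with an MLM head on the encoder. The module and batch field names are illustrative assumptions, not PALM's actual implementation, and the two losses are summed here purely for brevity; PALM itself organizes them as an autoencoding stage followed by an autoregressive generation stage.

```python
import torch
import torch.nn.functional as F

def palm_style_step(model, batch):
    """Illustrative pre-training step combining an autoencoding (MLM) loss on the
    encoder with an autoregressive generation loss on the decoder.
    `model` and the batch field names are assumptions for this sketch."""
    # Autoencoding signal: predict the masked tokens of the corrupted context (BERT-style MLM).
    enc_out = model.encoder(batch["masked_context_ids"])            # (B, Lc, H)
    mlm_logits = model.mlm_head(enc_out)                            # (B, Lc, V)
    mlm_loss = F.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),
        batch["mlm_labels"].view(-1),                               # -100 marks unmasked positions
        ignore_index=-100,
    )

    # Autoregressive signal: generate the continuation conditioned on the encoder output.
    dec_logits = model.decoder(batch["target_input_ids"], enc_out)  # teacher forcing, (B, Lt, V)
    gen_loss = F.cross_entropy(
        dec_logits.view(-1, dec_logits.size(-1)),
        batch["target_labels"].view(-1),                            # targets shifted by one position
        ignore_index=-100,
    )

    # The decoder learns on top of the encoder's bidirectional understanding of the context.
    return mlm_loss + gen_loss
```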
PALM 1.0 Experiments: Generative QA (MARCO Q&A + NLG leaderboard)
Qualitative samples produced during pre-training indicate that PALM learns to infer and reason from the input text; although the generated content is not always factually accurate in the absence of rich context, this inference capability can be transferred downstream by fine-tuning on specific generation tasks.
The MARCO benchmark (Nguyen et al., 2016) released by Microsoft is a good fit for evaluating generative QA models: the questions are user queries issued to the Bing search engine, and the contextual passages come from real web documents. The data is split into a training set (153,725 QA pairs), a dev set (12,467 QA pairs), and a test set (101,092 questions with unpublished answers). The evaluation focuses on the Q&A + Natural Language Generation task, whose goal is to provide the best available answer in natural language, as could be used by a smart device or digital assistant. The answers are human-generated and not necessarily sub-spans of the contextual passages, so ROUGE-L (Lin, 2004) is used to measure the quality of generated answers against the ground truth.
The pre-trained PALM is fine-tuned on the MARCO training set for 10 epochs with batch size 64, learning rate 1e-5, and maximum input length 512; the other hyperparameters are kept the same as in pre-training. The encoder takes a contextual passage concatenated with the question at the end, and the decoder takes the answer. Decoding uses beam search with a beam size of 5.

Table 2: Test results of answer generation on the official MARCO leaderboard as of December 9, 2019.
Method                           | Rouge-L
ConZNet (Indurthi et al., 2018)  | 0.421
Reader-Writer                    | 0.439
KIGN-QA                          | 0.441
SNET + CES2S                     | 0.450
Communicating BERT               | 0.483
VNET (Wang et al., 2018)         | 0.484
Selector NLGEN                   | 0.487
BERT + Multi-Pointer             | 0.495
Masque (Nishida et al., 2019)    | 0.496
PALM                             | 0.498

PALM achieves 1st place on the leaderboard, outperforming all competing methods in generation quality. Notably, PALM pre-trains a single model, while some of the top-performing entries on the leaderboard, such as Masque, are ensembles. The superiority of single-model PALM over ensemble Masque with pre-trained ELMo (Peters et al., 2018) and over BERT-based methods demonstrates the effectiveness and generalizability of PALM compared with other pre-training approaches to language modeling.
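A minimal decoding sketch matching the MARCO fine-tuning setup above (passage concatenated with the question, input truncated to 512 tokens, beam size 5), assuming a Hugging Face Transformers-style seq2seq checkpoint. The checkpoint name and the exact input formatting are assumptions, not the released PALM configuration.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical checkpoint name; any encoder-decoder generation model is loaded the same way.
tokenizer = AutoTokenizer.from_pretrained("your-org/palm-style-seq2seq")
model = AutoModelForSeq2SeqLM.from_pretrained("your-org/palm-style-seq2seq")

passage = "Bake sirloin steaks in the oven at 425 degrees Fahrenheit for 30 minutes ..."
question = "How long to cook sirloin steak?"

# Encoder input: contextual passage with the question concatenated at the end, truncated to 512 tokens.
inputs = tokenizer(passage + " " + question, truncation=True, max_length=512, return_tensors="pt")

# Beam search with a beam of size 5, as in the fine-tuning setup described above.
output_ids = model.generate(**inputs, num_beams=5, max_length=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```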
PALM 1.0 Experiments: Abstractive Summarization (CNN/DailyMail, Gigaword)
Text summarization produces a concise and fluent summary conveying the key information of the input (e.g., a news article). The focus here is abstractive summarization, a generation task where the summary is not constrained to reusing phrases or sentences from the input text. Experiments are conducted on the CNN/DailyMail dataset (Hermann et al., 2015), which contains 93K news articles from CNN and 220K articles from Daily Mail, and on the Gigaword dataset (Graff and Cieri, 2003), which consists of 3.8M article-title pairs. The articles are fed to the encoder and the summaries to the decoder, with the same optimization hyperparameters as in generative QA fine-tuning. F1 scores of Rouge-1, Rouge-2, and Rouge-L are reported on the test set of both datasets.

Table 3: Results of abstractive summarization on the CNN/DailyMail test set and the Gigaword test set. RG is short for ROUGE.
                                      | CNN/DailyMail           | Gigaword
Model                                 | RG-1  | RG-2  | RG-L    | RG-1  | RG-2  | RG-L
BERTSUMABS (Liu and Lapata, 2019)     | 41.72 | 19.39 | 38.76   | -     | -     | -
MASS (Song et al., 2019)              | 42.12 | 19.50 | 39.01   | 38.13 | 19.81 | 35.62
UniLM-LARGE (Dong et al., 2019)       | 43.33 | 20.21 | 40.51   | 38.45 | 19.45 | 35.75
T5-LARGE (Raffel et al., 2019)        | 42.50 | 20.68 | 39.75   | -     | -     | -
BART-LARGE (Lewis et al., 2019)       | 44.16 | 21.28 | 40.90   | -     | -     | -
PEGASUS (Zhang et al., 2019)          | 44.17 | 21.47 | 41.11   | 39.12 | 19.86 | 36.24
ERNIE-GEN-LARGE (Xiao et al., 2020)   | 44.02 | 21.17 | 41.26   | 39.25 | 20.25 | 36.53
PALM                                  | 42.71 | 19.97 | 39.71   | 38.75 | 19.79 | 35.98
PALM-LARGE                            | 44.30 | 21.12 | 41.41   | 39.45 | 20.37 | 36.75

PALM achieves better performance than the strong recently proposed pre-trained summarization models, including UniLM (Dong et al., 2019), T5 (Raffel et al., 2019), BART (Lewis et al., 2019), PEGASUS (Zhang et al., 2019), and ERNIE-GEN (Xiao et al., 2020). By consistently outperforming these pre-training methods, PALM confirms its effectiveness in leveraging unsupervised signals for language generation.
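The ROUGE F1 numbers reported above can be computed with an off-the-shelf scorer. A minimal sketch using the rouge-score package, which is an assumption of this example rather than necessarily the evaluation toolkit used in the paper:

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "It takes 30 minutes to cook sirloin steak in the oven at 425 degrees Fahrenheit."
prediction = "Sirloin steak takes 30 minutes to cook in a 425 degree oven."

scores = scorer.score(reference, prediction)
for name, score in scores.items():
    # Each entry carries precision, recall, and the F-measure reported in the tables above.
    print(f"{name}: F1 = {score.fmeasure:.4f}")
```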
PALM 1.0 Experiments: Question Generation (SQuAD 1.1)
In answer-aware question generation, given an input passage and an answer span, the model generates a question that leads to the answer. Following the practice of Zhao et al. (2018) and Dong et al. (2019), the SQuAD 1.1 dataset (Rajpurkar et al., 2016) is used, with BLEU-4, METEOR, and ROUGE-L as evaluation metrics.

Table 4: Question generation results on the SQuAD dataset. MTR is short for METEOR and RG for ROUGE.
Method               | BLEU-4 | MTR   | RG-L
CorefNQG (a)         | 15.16  | 19.12 | -
MP-GSN (b)           | 16.38  | 20.25 | 44.48
UniLM (c)            | 22.88  | 24.94 | 51.80
ERNIE (d)            | 22.28  | 25.13 | 50.58
ERNIE-GEN-LARGE (d)  | 24.03  | 26.31 | 52.36
PALM                 | 22.78  | 25.02 | 50.96
PALM-LARGE           | 24.11  | 25.85 | 52.38
(a) Du and Cardie, 2018; (b) Zhao et al., 2018; (c) Dong et al., 2019; (d) Xiao et al., 2020.

PALM outperforms the previous question generation systems and achieves new state-of-the-art results on BLEU-4 and ROUGE-L for question generation on SQuAD 1.1.
PALM 1.0 Experiments: Conversational Response Generation (Cornell Movie Dialog)
Conversational response generation aims to produce a flexible response to a conversation (Vinyals and Le, 2015). Following MASS, experiments are conducted on the Cornell Movie Dialog corpus (Danescu-Niculescu-Mizil and Lee, 2011), which contains 140K conversation pairs, using the training/test splits provided by the dataset and the same training hyperparameters as in generative QA fine-tuning. Results are reported in perplexity, following Vinyals and Le (2015); lower is better. PALM is compared with a baseline trained only on the available data pairs and with the pre-trained BERT+LM and MASS. Following MASS, every model is trained both on 10K randomly sampled pairs and on all 110K training pairs.

Table 5: Results of conversational response generation in terms of perplexity on the Cornell Movie Dialog corpus (lower is better).
Model    | 10K data | 110K data
Baseline | 82.39    | 26.38
BERT+LM  | 80.11    | 24.84
MASS     | 74.32    | 23.52
PALM     | 45.43    | 21.98

PALM performs better than all competitors by a large margin on both the 10K and the 110K data, demonstrating its capability in generating responses to context thanks to its new pre-training objectives.

Ablation Study
Ablation studies assess the individual contribution of every component of PALM on the CNN/DailyMail summarization dataset.

Table 6: Ablation tests of PALM on the CNN/DailyMail summarization dataset.
Ablation            | RG-1  | RG-2  | RG-L
PALM                | 42.71 | 19.97 | 39.71
- pointer-generator | 42.54 | 19.86 | 39.49
- autoencoding      | 41.78 | 19.32 | 38.81
- autoregression    | 41.89 | 19.48 | 38.92
- pre-training      | 40.32 | 17.78 | 37.12

Removing the pointer-generator network from PALM pre-training drops Rouge-L from 39.71 to 39.49, showing its role in generative modeling; given the slight drop, it could be excluded for training efficiency, but in these experiments it is used in every generation task for optimal generation quality. Ablating autoencoding and autoregression, by randomly initializing the weights of the encoder and the decoder respectively, causes significant drops on all three Rouge metrics, so both components prove to be critical. Finally, ablating pre-training altogether degrades performance by over 6.5%, which demonstrates the power of PALM in leveraging an unlabeled corpus for downstream generation.

Relation to prior pre-training methods: ELMo (Peters et al., 2018) concatenates left-only and right-only LSTM representations but does not pre-train interactions between the two directions; GPT, GPT-2, and GPT-3 pre-train only a Transformer decoder. BERT (Devlin et al., 2018) learns interactions between left and right context through masked language modeling, but it does not predict autoregressively and is therefore not effective for generation tasks; UniLM fine-tunes BERT with an ensemble of masks but is not fully autoregressive in pre-training. MASS and BART are the methods most similar to PALM: MASS maps an input with a masked span to the sequence of missing tokens, while BART reconstructs the original text from corrupted input. PALM instead forces the decoder to predict the continuation of the text input on an unlabeled corpus, reducing the mismatch between pre-training and context-conditioned generation; it advances the state of the art on generative QA (rank 1 on the MARCO leaderboard), abstractive summarization, question generation, and conversational response generation.

PALM 2.0 Model: Motivation
- The generative pre-training task of PALM 1.0 is harder than that of other generative pre-training methods; its accuracy is only about 0.45 after convergence.
- Multi-stage, multi-task progressive pre-training further improves the model's generation capability and narrows the gap between pre-training and fine-tuning.
Schematic comparison of PALM with GPT, MASS, and BART (Figure 1 of the PALM paper):
(a) GPT: tokens are predicted autoregressively, so GPT can be used for generation; however, it lacks an encoder to condition generation on context.
(b) MASS: based on the encoder-decoder architecture, but the decoder predicts only the tokens that are masked out in the text input to the encoder.
(c) BART: rather than masked tokens, the decoder reconstructs the original full sentence from the corrupted input to the encoder; this mismatches most downstream generation, which requires more than reconstructing the original input.
(d) PALM: the encoder predicts masked tokens by encoding context bidirectionally, and the decoder predicts the text segment subsequent to the context; this forces the model to learn to comprehend the context in order to generate relevant text.

PALM pre-training proceeds in two stages:
1. The encoder is trained as an autoencoder to reconstruct the original text from corrupted context in which random tokens are sampled and replaced with [MASK] symbols, following BERT's practice (Devlin et al., 2018). Training optimizes the cross-entropy reconstruction loss between the encoder's output and the original context, as in Masked Language Modeling (MLM) in BERT. By predicting the actual context tokens that are masked out, PALM forces the encoder to comprehend the meaning of the unmasked tokens and the full context.
2. The encoder and decoder are then jointly trained to autoregressively generate the text output from the context representations produced by the encoder. Training maximizes the log-likelihood of the ground-truth text at the decoder's output:

L(θ) = Σ_{(x, y) ∈ (X, Y)} log ∏_{t=1}^{n} P(y_t | y_{<t}, x; θ)    (1)

where X denotes the set of contexts and Y the set of texts to be generated. By conditioning generation on context representations, PALM forces the decoder to rely deeply on the context instead of only the preceding generated tokens when predicting the next token, which facilitates context-sensitive generation.

Input and output representations. During pre-training, input and output representations are tailored to minimize the discrepancy between self-supervised pre-training and supervised fine-tuning: in a typical downstream generation task (e.g., abstractive summarization or generative QA), the context is a rather long passage and the model is asked to generate a shorter piece of text based on its comprehension of that context. Given a contiguous text fragment of length L (composed of a few sentences) from an unlabeled corpus, PALM uses the consecutive span covering the first 80% of the fragment as context input to the encoder, and the remaining 20% as the text output to be generated by the decoder. This design mimics the input and output of downstream tasks, under the hypothesis that human-written text is coherent, so the subsequent 20% span reflects a comprehension of the preceding context span; in this way PALM learns to infer the subsequent content from the preceding content. The collection of text fragments is constructed from a corpus following BERT's practice, with the maximum fragment length set to 500 (L <= 500), so the context input consists of at most 400 tokens and the text output of at most 100 tokens.
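A minimal sketch of how such pre-training pairs could be constructed under the 80%/20% split described above; the whitespace "tokenizer" in the usage example is a placeholder, not PALM's actual data pipeline.

```python
def make_palm_example(fragment_tokens, max_len=500, context_ratio=0.8):
    """Split a contiguous token fragment into (encoder context, decoder target).

    Follows the representation described above: at most 500 tokens per fragment,
    the first 80% (<= 400 tokens) as context, and the remaining 20% (<= 100 tokens)
    as the text the decoder must generate.
    """
    fragment_tokens = fragment_tokens[:max_len]
    split = int(len(fragment_tokens) * context_ratio)
    context = fragment_tokens[:split]   # fed to the encoder (masked for the MLM stage)
    target = fragment_tokens[split:]    # generated autoregressively by the decoder
    return context, target


# Usage with a toy whitespace "tokenizer":
fragment = ("Human-written text is coherent , so the continuation reflects "
            "an understanding of what precedes it .").split()
context, target = make_palm_example(fragment)
print(len(context), len(target))
```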
PALM 2.0 Model: Multi-stage Progressive Pre-training
- Multi-stage, multi-task progressive pre-training: from easy to hard, and from task-agnostic to task-related.

Effect of progressively adding pre-training objectives:
Pre-training objectives             | DuReaderQG-robust (BLEU-4) | CSL (Rouge-L) | ADGEN (BLEU-4) | LCSTS (Rouge-L)
+ Word-level fill-mask              | 37.10                      | -             | -              | 40.98
+ Text infilling & sentence shuffle | 42.5                       | 63.2          | 11.5           | 42.1
+ Auto-regressive generation        | 43.0                       | 64.4          | 11.3           | 42.6

Effect of task-specific pre-training:
Model                                             | LCSTS (8k train, Rouge-L) | LCSTS (8k train, Rouge-L)
PALM-Base                                         | 30.30                     | 23.11
+ Task-specific pre-training (similar to PEGASUS) | 32.05                     | 27.24
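A rough sketch of the from-easy-to-hard staging; the stage names mirror the table above, while the loss callables, schedule, and training loop are illustrative assumptions rather than the actual PALM 2.0 recipe.

```python
# Progressive pre-training: train with easier, task-agnostic objectives first,
# then move to harder, more task-related ones.
STAGES = [
    "word_level_fill_mask",              # stage 1: easiest, token-level masking
    "text_infilling_sentence_shuffle",   # stage 2: span infilling + sentence shuffling
    "autoregressive_generation",         # stage 3: hardest, closest to downstream generation
]

def progressive_pretrain(model, optimizer, corpus, objectives, epochs_per_stage=1):
    """Illustrative multi-stage loop; `objectives` maps a stage name to a
    callable(model, batch) that returns a scalar loss for that stage."""
    for stage_name in STAGES:                 # from easy to hard, task-agnostic to task-related
        loss_fn = objectives[stage_name]
        for _ in range(epochs_per_stage):
            for batch in corpus:
                loss = loss_fn(model, batch)  # each stage has its own self-supervised loss
                optimizer.zero_grad()
                loss.backward()               # assumes torch-style tensors and optimizer
                optimizer.step()
```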
PALM 2.0 Experiments
Model           | DuReaderQG-robust (BLEU-4) | CSL (Rouge-L) | ADGEN (BLEU-4) | LCSTS
mT5 (S)         | -                          | 56.7          | 10.2           | 33.5
BART (B)        | -                          | 62.1          | 9.9            | 37.8
CPT (B)         | -                          | 63.0          | 9.8            | 38.2
PALM2.0-Base    | 42.1                       | 63.4          | 10.9           | 39.7
CPM-2           | -                          | -             | 10.6           | 35.9
mT5 (B)         | -                          | 61.8          | -              | 36.5
ERNIE-2.0 Large | 39.3                       | -             | -              | 41.4
RoBERTa Large   | 37.1                       | -             | -              | 41.0
BART Large      | -                          | 64.2          | 10.0           | 40.6
CPT Large       | -                          | 63.7          | 10.7           | 42.0
PALM2.0-Large   | 43.0                       | 64.4          | 11.3           | 42.6

Compared on multiple Chinese generation datasets, both PALM 2.0 base and large outperform other SOTA models of comparable size, such as mT5, BART, and CPT.

02 | Unified Multimodal Generative Pre-training Model

Image-Text Multimodal Understanding and Generation Tasks
- With the rise of 5G, rich multimodal content data has grown explosively, and the demand for multimodal information processing is increasingly common.
- Benchmarks: VQA 2.0, MS COCO Caption.

Progress in Multimodal Pre-training
- Multimodal NLU pre-training: LXMERT, UNITER
- End-to-end unified understanding and generation: Pixel-BERT, VLMo, SimVLM, CoCa

Unified Multimodal Generative Pre-training Model
- An efficient unified multimodal framework: to address the low training efficiency of directly concatenating image and text features, it introduces cross-layer image-text connections, mitigating both the drowning-out of text information by overly long image feature sequences and the slow training speed.
Training tasks
- MLM: masked text prediction that exploits image information
- ITA: image-text contrastive learning
- ITM: image-text matching
- PrefixLM: the encoder takes the first half of a sentence and the decoder predicts the second half
Training data: 14M image-text pairs.
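To illustrate how the four objectives might be combined, here is a minimal multi-task loss sketch; the task names match the list above, while the sub-loss callables, batch layout, and equal default weighting are assumptions for illustration, not the model's published configuration.

```python
def multimodal_pretrain_loss(losses, model, batch, weights=None):
    """Illustrative combination of the four pre-training objectives listed above.

    `losses` is a dict mapping task name -> callable(model, batch) -> scalar loss:
      - "mlm": masked text prediction using image information
      - "ita": image-text contrastive alignment
      - "itm": image-text matched/unmatched classification
      - "prefix_lm": encoder reads the sentence prefix, decoder generates the remainder
    """
    weights = weights or {"mlm": 1.0, "ita": 1.0, "itm": 1.0, "prefix_lm": 1.0}
    total = 0.0
    for task, weight in weights.items():
        total = total + weight * losses[task](model, batch)
    return total
```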
Experimental Results
- Ours-base, Ours-large

VQA Example Showcase

03 | Business Applications of AliceMind Generative Pre-training

Downstream application: intelligent FAQ mining
- A question-generation model based on the pre-trained PALM model drives a complete automatic knowledge-extraction pipeline from documents to QA pairs.
- The system has been deployed in an intelligent enterprise knowledge base for the electric-power industry and in FAQ mining for Yunxiaomi (Alibaba's intelligent customer-service bot).

Downstream application: query rewriting for AliExpress (AE) e-commerce search
- To cover the mid- and long-tail queries left unaddressed by existing rewriting, a category-controllable, positive-feedback-based query rewriting model is proposed.
- Standalone online results: uv_l2p +0.6%, uv_value +1.4%.
- Stacked with rewriting and vector-recall strategies: uv_l2p +1.6%, uv_value +2.1%.

Downstream application: in-car boot greeting generation
- Evaluated with human blind-selection tests and automatic quantitative metrics.

04 | Summary

AliceMind pure-text and multimodal generative pre-training: multi-stage pre-training + efficient training + understanding & generation
- Generative pre-training that combines understanding and generation in multiple tasks narrows the gap between pre-training and fine-tuning.
- An efficient pre-training framework: multimodal generation improves pre-training efficiency through skip (cross-layer) connections.
- Both the pure-text and the multimodal generative pre-trained models have understanding as well as generation capabilities.

Generative pre-training in industry: multi-scenario generation applications
- Intelligent FAQ mining: generate questions to obtain FAQ pairs from documents, supporting intelligent document structuring.
- E-commerce search: rewrite search queries via query generation to improve recall for long-tail queries.
- In-car greeting generation: generate weather-related boot greetings from structured weather input, supporting smart vehicles.

Thank you for watching.
77、預訓練模型PALM的問題生成模型,構建文檔到QA對生成的一整套知識自動抽取系統目前該系統已應用于電力智能企業知識庫和云小蜜FAQ挖掘|下游應用:AE電商搜索query改寫針對改寫中遺留的中長尾Query無法覆蓋的問題,提出基于類目可控和正反饋的Query改寫模型單獨上線效果:uv_l2p:+0.6%,uv_value:+1.4%改寫和向量召回等策略疊加實驗:uv_l2p+1.6%,uv_value+2.1%|下游應用:車載開機歡迎語生成人工盲選測試自動量化評測總結04|總結AliceMind純文本和多模態生成預訓練:多階段預訓練+高效訓練+理解&生成多任務結合的生成預訓練,縮小預訓練和微調之間的gap高效預訓練框架,多模態生成通過跳躍式連接提升預訓練效率純文本和多模態生成預訓練模型,都兼具理解和生成能力生成預訓練在工業界的實踐:多場景智能生成應用智能FAQ挖掘:通過生成問題,從文檔獲取FAQ對,助力文檔智能結構化電商搜索:通過query生成改寫搜索query,提高長尾搜索query的召回車載歡迎語生成:通過對天氣結構化信息輸入,生成和天氣相關的車載開機歡迎語,助力智能汽車非常感謝您的觀看|