Multimodal Aspect-Based Sentiment Analysis for Social Media Posts
Jianfei Yu, Nanjing University of Science and Technology

Contents
01 Background
02 Multimodal ABSA
03 Our Recent Works
04 Conclusion

01 Background

From text-driven to multimodal-driven social media analysis.
[Slide figures omitted. Image credit: https:/…; statistics of Twitter in 2015 (TNW, https:/…): posts that contain more than one image.]

Text Social Media Analytics -> Multimodal Social Media Analytics

Sentiment Analysis
- Textual input without the image -> Sentiment: Neutral
- Textual input with the image -> Sentiment: Positive
You, Q., Luo, J., Jin, H., and Yang, J. Joint Visual-Textual Sentiment Analysis with Deep Neural Networks. In Proceedings of ACM MM, 2015.

Fake News Detection
- Text: "Photo: Lenticular clouds over Mount Fuji, Japan. #amazing #earth #clouds #mountains"
- Textual input without the image -> Fake News vs. Real News?
- Textual input with the image -> Fake News
Wang, Y., Ma, F., Jin, Z., Yuan, Y., Xun, G., Jha, K., and Gao, J. EANN: Event Adversarial Neural Networks for Multi-modal Fake News Detection. In Proceedings of KDD, 2018.

Sarcasm Detection
- Text: "What a wonderful weather!"
- Textual input without the image -> Sarcasm?
- Textual input with the image (raining) -> Sarcasm
02 Multimodal ABSA (MABSA)
Subtask 1: Multimodal Aspect Term Extraction / Named Entity Recognition
Subtask 2: Multimodal Aspect-Based Sentiment Classification
Subtask 3: Joint Multimodal Aspect-Sentiment Analysis

Multimodal Aspect Term Extraction (MATE)
Extract all the aspects or entities in a multimodal review or tweet.
- Multimodal input: an image + "The Yangtze is so amazing!"
- MATE output: "The Yangtze"
Hanqian Wu, Siliang Cheng, Jingjing Wang, Shoushan Li, and Lian Chi. Multimodal Aspect Extraction with Region-Aware Alignment Network. In Proceedings of NLPCC, 2020.

Multimodal Named Entity Recognition (MNER)
Extract all the entities and classify each entity into pre-defined types, e.g., PER, LOC, ORG.
- Multimodal input: an image + "The Yangtze is so amazing!"
- MNER output: "The Yangtze" [LOC]
1. Qi Zhang, Jinlan Fu, Xiaoyu Liu, and Xuanjing Huang. Adaptive Co-attention Network for Named Entity Recognition in Tweets. In Proceedings of AAAI, 2018.
2. Seungwhan Moon, Leonardo Neves, and Vitor Carvalho. Multimodal Named Entity Recognition for Short Social Media Posts. In Proceedings of NAACL, 2018.

Multimodal Aspect-Based Sentiment Classification (MASC)
Identify the sentiment over each given aspect or entity in a multimodal review or tweet.
- Multimodal input: an image + "The Yangtze is so amazing!"
- MASC output: (Yangtze, Negative)
1. N. Xu, W. Mao, and G. Chen. Multi-interactive Memory Network for Aspect Based Multimodal Sentiment Analysis. In Proceedings of AAAI, 2019.
2. J. Yu and J. Jiang. Adapting BERT for Target-Oriented Multimodal Sentiment Classification. In Proceedings of IJCAI, 2019.

Joint Multimodal Aspect-Sentiment Analysis (JMASA)
Jointly extract the aspects or entities and identify their sentiments in a multimodal review or tweet.
- Multimodal input: an image + "The Yangtze is so amazing!"
- JMASA output: (Yangtze, Negative)
X. Ju, D. Zhang, R. Xiao, J. Li, S. Li, M. Zhang, and G. Zhou. Joint Multi-modal Aspect-Sentiment Analysis with Auxiliary Cross-modal Relation Detection. In Proceedings of EMNLP, 2021.

All four subtasks share the same text-image input; the extraction-style subtasks are commonly cast as sequence labeling over the text, as sketched below.
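The slides contain no code, but the tagging-style outputs above are conventionally represented with BIO labels. The following is a minimal sketch of that standard formulation; the tag names and the decode_spans helper are illustrative, not taken from the talk.

```python
# Illustrative only: standard BIO sequence labeling for the MABSA subtasks.
# Tag names below are conventional choices, not the talk's notation.
tokens = ["The", "Yangtze", "is", "so", "amazing", "!"]

# MATE: mark aspect spans (ASP) only.
mate_tags = ["B-ASP", "I-ASP", "O", "O", "O", "O"]

# MNER: attach an entity type (PER/LOC/ORG/MISC) to each span.
mner_tags = ["B-LOC", "I-LOC", "O", "O", "O", "O"]

# JMASA: collapse aspect extraction and sentiment into one tag set,
# e.g., B-NEG for an aspect with negative sentiment.
jmasa_tags = ["O", "B-NEG", "O", "O", "O", "O"]

def decode_spans(tags):
    """Recover (start, end, label) spans from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag.startswith("B"):
            if start is not None:
                spans.append((start, i, tags[start].split("-")[-1]))
            start = i
        elif not tag.startswith("I") and start is not None:
            spans.append((start, i, tags[start].split("-")[-1]))
            start = None
    return spans

print(decode_spans(jmasa_tags))  # [(1, 2, 'NEG')] -> ("Yangtze", Negative)
```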
03 Our Recent Works on MABSA
- Unified Multimodal Transformer for MNER (ACL 2020)
- Coarse-to-Fine Grained Image-Target Matching for MABSC (IJCAI 2022)
- Vision-Language Pre-training for MABSA (ACL 2022)

Unified Multimodal Transformer for MNER (ACL 2020)

Background
Multimodal Named Entity Recognition (MNER): extract all the entities and classify each entity into pre-defined types, e.g., PER, LOC, ORG.
- Multimodal input: an image + "Kevin Durant enters Oracle Arena wearing off White x Jordan"
- Output: "Kevin Durant" [PER], "Oracle Arena" [LOC], "off White x Jordan" [MISC]

Our Proposed Model: Unified Multimodal Transformer (UMT-BERT-CRF)
- Achieves bidirectional text-image interactions with a Multimodal Interaction (MMI) Module (a sketch of the underlying cross-modal attention follows below)
- Adds an auxiliary entity span detection module
Jianfei Yu, Jing Jiang, Li Yang, and Rui Xia. Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer. In Proceedings of ACL, 2020.
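As a rough illustration of what an MMI-style module builds on, here is a minimal cross-modal attention block in one direction (text queries attending to image regions). The class name, dimensions, and wiring are assumptions made for the sketch, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of one cross-modal attention direction; an MMI-style module
# would apply this in both directions (text -> image and image -> text)
# before the CRF tagging layer. Shapes and hyperparameters are illustrative.
class CrossModalAttention(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_hidden, image_regions):
        # text_hidden:   (batch, seq_len, d_model)   e.g., BERT token states
        # image_regions: (batch, n_regions, d_model) e.g., projected CNN features
        attended, _ = self.attn(query=text_hidden, key=image_regions,
                                value=image_regions)
        return self.norm(text_hidden + attended)  # residual + layer norm

txt = torch.randn(2, 16, 768)  # toy batch of token representations
img = torch.randn(2, 49, 768)  # toy 7x7 grid of region features
fused = CrossModalAttention()(txt, img)
print(fused.shape)  # torch.Size([2, 16, 768])
```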
Experiments: Datasets
- Twitter-2015 (Zhang et al., AAAI 2018): diverse topics
- Twitter-2017 (Lu et al., ACL 2018): sports, concerts, and other social events
Table 1: The basic statistics of our two Twitter datasets.

Experiments: Main Results
Table 2: Performance comparison on our two Twitter datasets.

Short Summary
- Proposed a Unified Multimodal Transformer for MNER:
  - Multimodal Interaction (MMI) Module
  - Auxiliary text-based entity span detection module
- Achieves the state of the art on two benchmark Twitter datasets

Follow-up Work from Other Teams
- New MNER models:
  - Multimodal Graph Fusion Network for MNER (AAAI 2021, Dong Zhang et al.)
  - Improving MNER via Text-Image Relation Classification (AAAI 2021, Lin Sun et al.)
- Other modalities for MNER (e.g., speech):
  - Chinese Multimodal NER with Speech Clues (ACL 2021, Dianbo Sui et al.)
Jianfei Yu, Jing Jiang, Li Yang, and Rui Xia. Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer. In Proceedings of ACL, 2020.
Coarse-to-Fine Grained Image-Target Matching for MABSC (IJCAI 2022)

Task Definition
Multimodal Aspect-Based Sentiment Classification (MASC): identify the sentiment over each given aspect or entity (opinion target) in a multimodal review or tweet.
- Multimodal input: an image + "Nancy during the Salalah Tourism Festival; beautiful as always."
- Output: (Nancy, Positive), (Salalah Tourism Festival, Neutral)

Motivation
Coarse-grained image-target matching (image-target relevance):
- "Nancy" is related to the image, while "Salalah Tourism Festival" is unrelated.
- Based on our observation, around 58% of the input targets are NOT presented in the associated images.
Fine-grained image-target matching (object-target alignment):
- Detected objects: 1. pleasant woman, 2. white light, 3. black board, 4. man, 5. head, ...
- The target "Nancy" aligns with object #1 (pleasant woman).
Jianfei Yu, Jieming Wang, Rui Xia, and Junjie Li. Targeted Multimodal Sentiment Classification Based on Coarse-to-Fine Grained Image-Target Matching. In Proceedings of IJCAI-ECAI, 2022.
Our Dataset: Image-Target Matching
Built on a subset of one benchmark dataset for MASC, with two levels of annotation.

Table 1: Annotation examples of two samples ("Nancy during the Salalah Tourism Festival; beautiful as always.")
Target                    | Image-Target Relevance | Object-Target Alignment
Nancy                     | 1                      | #1
Salalah Tourism Festival  | 0                      | None

Statistics and Analysis
Table 2: Statistics of our Image-Target Matching dataset.
Figure 1: The box/image area ratio (left) and the correlation between image-target (I-T) relevance and sentiment (right) in our dataset.
- Most bounding boxes are relatively small.
- For targets unrelated to the images, users tend to express neutral sentiment over them.
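To make the two annotation levels concrete, here is one way a single annotated sample could be laid out. The field names and file name are hypothetical, not the dataset's released format.

```python
# Hypothetical record layout for one annotated sample in the Image-Target
# Matching dataset; field names are illustrative, not the released schema.
sample = {
    "text": "Nancy during the Salalah Tourism Festival; beautiful as always.",
    "image": "example.jpg",            # placeholder file name
    "targets": [
        {"span": "Nancy",
         "relevance": 1,               # 1 = target appears in the image
         "aligned_box": 1},            # index of the matching object box
        {"span": "Salalah Tourism Festival",
         "relevance": 0,               # 0 = target not present in the image
         "aligned_box": None},         # no object-level alignment
    ],
}
```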
Our Proposed Method
Coarse-to-Fine Grained Image-Target Matching Network: it combines the coarse-grained image-target relevance prediction with the fine-grained object-target alignment motivated above. [The architecture figure is walked through step by step on the original slides; a simplified sketch of the idea follows below.]
Jianfei Yu, Jieming Wang, Rui Xia, and Junjie Li. Targeted Multimodal Sentiment Classification Based on Coarse-to-Fine Grained Image-Target Matching. In Proceedings of IJCAI-ECAI, 2022.
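A heavily simplified sketch of the coarse-to-fine idea under stated assumptions: a coarse head scores image-target relevance and gates a fine-grained, object-aligned visual representation before sentiment classification. The module structure, dimensions, and names are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

# Simplified sketch: coarse relevance gating of fine-grained, object-aligned
# visual features. All design choices here are illustrative assumptions.
class CoarseToFineITM(nn.Module):
    def __init__(self, d=768, n_classes=3):
        super().__init__()
        self.relevance_head = nn.Linear(2 * d, 1)      # coarse: related or not
        self.align_attn = nn.MultiheadAttention(d, 8, batch_first=True)
        self.classifier = nn.Linear(2 * d, n_classes)  # Pos / Neu / Neg

    def forward(self, target_vec, image_vec, object_feats):
        # target_vec:   (B, d) pooled target representation
        # image_vec:    (B, d) pooled whole-image representation
        # object_feats: (B, n_objects, d) detected-object features
        rel = torch.sigmoid(self.relevance_head(
            torch.cat([target_vec, image_vec], dim=-1)))          # (B, 1)
        aligned, _ = self.align_attn(target_vec.unsqueeze(1),     # fine-grained
                                     object_feats, object_feats)  # (B, 1, d)
        visual = rel * aligned.squeeze(1)  # suppress visual info if unrelated
        return self.classifier(torch.cat([target_vec, visual], dim=-1))

model = CoarseToFineITM()
logits = model(torch.randn(2, 768), torch.randn(2, 768), torch.randn(2, 5, 768))
print(logits.shape)  # torch.Size([2, 3])
```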
Experimental Results
- Main results
- Results on our Image-Target Matching dataset

Short Summary
- Manually annotated an Image-Target Matching dataset covering both image-target relevance and object-target alignment
- Proposed a new ITM model for multimodal aspect-based sentiment classification: the Coarse-to-Fine Grained Image-Target Matching Network
- Experimental results show that our ITM model consistently outperforms several SOTA textual and multimodal methods
Jianfei Yu, Jieming Wang, Rui Xia, and Junjie Li. Targeted Multimodal Sentiment Classification Based on Coarse-to-Fine Grained Image-Target Matching. In Proceedings of IJCAI-ECAI, 2022.
Vision-Language Pre-training for MABSA (ACL 2022)

Motivation: Limitations of Existing Work for MABSA
1. Unimodal pre-trained models: existing methods rely on separately pre-trained unimodal models (e.g., ResNet for images and BERT for text) and ignore the alignment between the two modalities.
2. No task-specific pre-training tasks: existing vision-language pre-training models only employ general multimodal pre-training tasks (e.g., text-image matching and language modeling). [Sun, Lin, et al. RIVA: A Pre-trained Tweet Multimodal Model Based on Text-Image Relation for Multimodal NER. In Proceedings of COLING, 2020.]
3. Failure to leverage generative models: no unified architecture covers both the pre-training tasks and the downstream tasks, even though BART/T5-based generative models achieve SOTA performance on text-only ABSA.

Our Proposed VL Pre-training Model
A task-specific vision-language pre-training framework for MABSA:
- A unified encoder-decoder framework based on BART (Lewis et al., 2020)
- Three types of task-specific pre-training tasks
Architecture for Downstream Tasks
Joint Multimodal Aspect-Sentiment Analysis (JMASA) is handled by the same encoder-decoder: the decoder generates the aspect-sentiment pairs as a target sequence (see the sketch below).
Yan Ling, Jianfei Yu, and Rui Xia. Vision-Language Pre-training for Multimodal Aspect-Based Sentiment Analysis. In Proceedings of ACL, 2022.
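Generative ABSA models typically linearize the aspect-sentiment pairs into a flat sequence that the decoder emits token by token. The index-plus-sentiment format below follows that general style and is illustrative, not the paper's exact target format.

```python
# Illustrative linearization of JMASA output for a generative encoder-decoder,
# in the general index-pointer style of BART-based ABSA models. The paper's
# actual target format may differ; this is a sketch under that assumption.
tokens = ["The", "Yangtze", "is", "so", "amazing", "!"]

# Gold aspect-sentiment pairs: (start index, end index, sentiment)
pairs = [(1, 1, "NEG")]  # "Yangtze" with negative sentiment

def linearize(pairs):
    """Turn (start, end, sentiment) triples into a flat target sequence."""
    seq = []
    for start, end, senti in pairs:
        seq += [start, end, senti]
    return seq + ["<eos>"]

def delinearize(seq, tokens):
    """Recover (aspect text, sentiment) pairs from a generated sequence."""
    out, i = [], 0
    while i + 2 < len(seq) and seq[i] != "<eos>":
        start, end, senti = seq[i], seq[i + 1], seq[i + 2]
        out.append((" ".join(tokens[start:end + 1]), senti))
        i += 3
    return out

target = linearize(pairs)           # [1, 1, 'NEG', '<eos>']
print(delinearize(target, tokens))  # [('Yangtze', 'NEG')]
```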
Experiments: Datasets
- Pre-training dataset: MVSA-Multi (Niu et al., MMM 2016)
  Table 1: The statistics of the MVSA-Multi dataset.
- MABSA datasets: Twitter-2015 and Twitter-2017 (Yu et al., IJCAI 2019)
  Table 2: The basic statistics of the two Twitter datasets for MABSA.

Experiments: Main Results
- Comparison with existing approaches on JMASA. Table 3: Results of different approaches for JMASA.
- Comparison with existing approaches on MATE and MASC. Table 4: Results of different approaches for MATE. Table 5: Results of different approaches for MASC.

Experiments: In-depth Analysis of Pre-training Tasks
- Impact of each pre-training task. Table 6: The results of pre-training tasks on the two benchmarks.
- Figure 1: The effectiveness of pre-training when using different numbers of training samples for JMASA.
Yan Ling, Jianfei Yu, and Rui Xia. Vision-Language Pre-training for Multimodal Aspect-Based Sentiment Analysis. In Proceedings of ACL, 2022.
Short Summary
- A unified vision-language pre-training framework for MABSA: a BART-based generative multimodal framework
- We introduce three task-specific pre-training tasks to identify fine-grained aspects, opinions, and their cross-modal alignments (a sketch of combining them into one training objective follows below):
  - Textual Aspect-Opinion Extraction
  - Visual Aspect-Opinion Generation
  - Multimodal Sentiment Prediction
- Experiments on two benchmark datasets show that our pre-training approach achieves state-of-the-art performance on three MABSA subtasks
Yan Ling, Jianfei Yu, and Rui Xia. Vision-Language Pre-training for Multimodal Aspect-Based Sentiment Analysis. In Proceedings of ACL, 2022.
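Multi-task pre-training of this kind usually sums the individual objectives into a single loss. The sketch below assumes equal weights and stubs out each objective with a dummy value, so it shows only the combination pattern, not the paper's actual losses.

```python
import torch

# Generic multi-task pre-training loss combination. Function names mirror the
# three task names above; the stubs and equal weights are assumptions, and
# each stub stands in for a real encoder-decoder forward returning a loss.
def textual_aspect_opinion_loss(batch):   # predict textual aspect/opinion spans
    return torch.tensor(0.7)              # dummy value for the sketch

def visual_aspect_opinion_loss(batch):    # generate visual aspect/opinion labels
    return torch.tensor(1.1)              # dummy value for the sketch

def multimodal_sentiment_loss(batch):     # predict coarse multimodal sentiment
    return torch.tensor(0.4)              # dummy value for the sketch

def pretraining_loss(batch, weights=(1.0, 1.0, 1.0)):
    w1, w2, w3 = weights
    return (w1 * textual_aspect_opinion_loss(batch)
            + w2 * visual_aspect_opinion_loss(batch)
            + w3 * multimodal_sentiment_loss(batch))

print(pretraining_loss(batch=None))  # tensor(2.2000)
```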
04 Conclusion

Multimodal approaches for the three MABSA subtasks:
- Unified Multimodal Transformer (ACL 2020). Focus: multimodal interaction and visual bias
- Image-Target Matching (IJCAI 2022). Focus: coarse- and fine-grained image-text matching
- Unified Vision-Language Pre-training framework (ACL 2022). Focus: task-specific VL pre-training

Future Work
- Explainability of multimodal ABSA models:
  - Visualization
  - Adversarial attacks (e.g., randomly replacing images)
- Related multimodal tasks (e.g., multimodal IE):
  - Multimodal entity linking/disambiguation
  - Multimodal relation/event extraction
  - Multimodal knowledge graph construction and completion

Thank you very much for watching!