Contents: 01 Background, 02 Unstructured Feedback, 03 Structured Feedback, 04 Future Work

01 Background

Background: The fashion domain carries huge economic value.

Background: Numerous online clothing data are available on the Internet. Precise image retrieval that meets the user's search intent is a key challenge.

Background: Conventional paradigms for item search take either text or image as the input query to search for items of interest. Text query: "a blue overcoat with a lapel collar and a belt around the waist". Image query + unstructured feedback: "I want the dress to be black and more professional."

Background: Flexible image retrieval allows users to use a reference image plus modification feedback (structured feedback) to search for items.

Background: Application: dialog-based fashion search / conversational fashion search. At the beginning, the recommended fashion product image may not be the desired one. Based on this reference image, the user typically refines the retrieval by providing feedback that describes the relative difference between the currently retrieved reference image and his/her desired one.

02 Structured Feedback

Task: A query image can be described by its associated attributes, x = {a_1, ..., a_{i1}, ..., a_{i2}, ..., a_N}; the target image can be described by y = {a_1, ..., a'_{i1}, ..., a'_{i2}, ..., a_N}, where a_{i1} and a_{i2} are the to-be-manipulated attributes (attribute manipulation).

Related Work:
Feature Fusion-based: Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search, CVPR 2017.
Feature Substitution-based: Efficient Multi-Attribute Similarity Learning Towards Attribute-based Fashion Search, WACV 2018; Automatic Spatially-aware Fashion Concept Discovery, ICCV 2017; Learning Attribute Representations with Localization for Flexible Fashion Search, CVPR 2018.

Fusion-based Method: Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search, CVPR 2017.
Pipeline: Attribute Representation Learning, Image Representation Learning, Representation Fusion, Fashion Search. Fusion-based methods learn the latent representation of the target item by directly fusing the visual features of the query image with the semantic features of the wanted attribute(s). Notation: the original image representation, the prototype attribute representation, a binary indicator vector marking which attribute values to add or remove, a memory matrix of attribute prototypes, and the resulting manipulated representation (attribute manipulation).
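The fusion step can be sketched as follows; the dimensions, the variable names, and the +1/-1 indicator convention are illustrative assumptions, not the exact formulation of the CVPR 2017 memory network.

```python
import numpy as np

# Minimal sketch of fusion-based attribute manipulation (hypothetical shapes).
# The memory matrix M stores one prototype embedding per attribute value;
# an indicator vector t selects which prototypes to remove (-1) or add (+1).

D, V = 128, 10                     # feature dim, number of attribute values
rng = np.random.default_rng(0)
M = rng.normal(size=(V, D))        # memory matrix of prototype attribute representations
x = rng.normal(size=D)             # original image representation

t = np.zeros(V)
t[3] = -1.0                        # remove the unwanted attribute value
t[7] = +1.0                        # add the desired attribute value

x_manip = x + t @ M                # manipulated representation via fusion

print(x_manip.shape)               # (128,)
```

The manipulated representation can then be matched against gallery features for fashion search.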
Substitution-based Method: Learning Attribute Representations with Localization for Flexible Fashion Search, CVPR 2018. (Class activation maps: Learning Deep Features for Discriminative Localization, CVPR 2016: 2921-2929.)
Pipeline: Attribute Localization, Attribute Representation Learning, Optimization. Substitution-based methods characterize the query image with multiple attributes, so attribute manipulation can be conducted by replacing the unwished attribute features with the desired ones. Class Activation Mapping is used for localizing the discriminative image regions.
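The substitution scheme can be sketched as below; the per-attribute block layout and all names are hypothetical, and the prototype is simulated here by averaging random features with the same attribute value.

```python
import numpy as np

# Sketch of substitution-based attribute manipulation (hypothetical layout):
# the image representation is a concatenation of per-attribute feature blocks,
# and manipulation replaces the unwished block with the prototype of the
# desired value.

D = 32                                   # per-attribute feature dim
rng = np.random.default_rng(2)
query_blocks = {"color": rng.normal(size=D),
                "sleeve": rng.normal(size=D)}

# Prototype for the desired value, averaged over (simulated) training features.
train_feats_blue = rng.normal(loc=1.0, size=(100, D))
proto_blue = train_feats_blue.mean(axis=0)

# Substitute the "color" block to manipulate that attribute.
target_blocks = dict(query_blocks)
target_blocks["color"] = proto_blue
target_repr = np.concatenate([target_blocks["color"], target_blocks["sleeve"]])
```

The unchanged blocks keep the query's remaining attributes intact, which is the point of substitution over fusion.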
After the training, the features extracted from the training images with the same attribute value are averaged and used for attribute manipulation.

Motivation: Existing methods ignore the potential of Generative Adversarial Networks (GANs) in enhancing the visual understanding of target items. We aim to boost the performance of content-based fashion search with attribute manipulation by directly generating the target item image. (Illustration: manipulating the color attribute to "blue" with a GAN produces a generated prototype image whose feature-space neighbors are similar items.)

Method (the proposed AMGAN): Prototype Image Generation, then Metric Learning for Fashion Search.

Semantic Discriminative Learning: using the ground-truth value of the attribute and the generated prototype image, make the discriminator learn to accurately classify the attributes, and encourage the generator to synthesize the prototype image with the correct attribute manipulation.
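A minimal sketch of this semantic discriminative objective, with hypothetical logits and labels rather than the exact AMGAN losses: the discriminator is trained with a cross-entropy attribute-classification loss on real images, while the generator is pushed so that the generated prototype is classified as the desired attribute value.

```python
import numpy as np

def cross_entropy(logits, label):
    # Numerically stable log-softmax cross-entropy for a single example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

num_values = 5
real_logits = np.array([0.1, 2.0, 0.3, 0.0, -1.0])  # D's prediction on a real image
fake_logits = np.array([0.5, 0.2, 1.7, 0.1, 0.0])   # D's prediction on the generated prototype

d_loss = cross_entropy(real_logits, label=1)  # classify the ground-truth attribute value
g_loss = cross_entropy(fake_logits, label=2)  # desired value after attribute manipulation
```

Minimizing `g_loss` with respect to the generator encourages prototypes that carry the correct manipulated attribute.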
13、shion Search|The proposed AMGAN.Adversarial Metric Learning(Pair-Based)Maximize the similaritybetween the positive pairMinimize the similaritybetween the negative pairEncourage the generator to produce similar to the positive image to fool the learned metricSimilarity Probability:+:shares the sameat
The proposed AMGAN: Adversarial Metric Learning (triplet-based). Encourage the manipulated query to be more similar to the positive image than to the negative one, via a relative similarity probability.

Dataset: DARN, with 213,636 images, 9 attributes, and 179 possible values.
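The pair-based and triplet-based metric-learning objectives above can be sketched with a sigmoid similarity probability and a softmax relative similarity probability; all tensors are random placeholders and the losses are generic stand-ins, not the exact AMGAN objectives.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(1)
q  = rng.normal(size=64)   # manipulated/composed query representation
xp = rng.normal(size=64)   # positive: shares the desired attribute values
xn = rng.normal(size=64)   # negative image representation

# Pair-based: similarity probability from the inner product;
# maximize it for the positive pair, minimize it for the negative pair.
p_pos = sigmoid(q @ xp)
p_neg = sigmoid(q @ xn)
pair_loss = -np.log(p_pos) - np.log(1.0 - p_neg)

# Triplet-based: relative similarity probability via softmax;
# encourage q to be more similar to xp than to xn.
p_rel = softmax(np.array([q @ xp, q @ xn]))[0]
triplet_loss = -np.log(p_rel)
```

In the adversarial setting, the generator would additionally be trained to push its output toward a high similarity probability under the learned metric.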
Examples include online-offline image pairs and attribute/value examples of DARN. (Junshi Huang, Rogério Schmidt Feris, Qiang Chen, Shuicheng Yan: Cross-Domain Image Retrieval with a Dual Attribute-Aware Ranking Network. ICCV 2015: 1062-1070.)

Dataset: Shopping100k, with 101,021 images, 12 attributes, and 151 possible values; attribute and value examples, and samples from Shopping100k. (Kenan E. Ak, Joo-Hwee Lim, Jo Yew Tham, Ashraf A. Kassim: Efficient Multi-attribute Similarity Learning Towards Attribute-Based Fashion Search. WACV 2018: 1671-1679.)
Model Comparison. Fig. 1: Overall performance comparison on Shopping100k and DARN: (a) Top-K, (b) NDCG@K, (c) MRR@K on Shopping100k; (d) Top-K, (e) NDCG@K, (f) MRR@K on DARN. Symbols denote statistical significance at p < 0.05.

Algorithm: For testing, we rank the gallery images by jointly evaluating their cosine similarities to both the local-wise and the global-wise composed query representations.

Dataset overview. Adapting existing datasets: MIT-States (Phillip Isola et al., CVPR 2015), Birds-to-Words (Maxwell Forbes et al., EMNLP 2019), Shoes (Xiaoxiao Guo et al., NeurIPS 2018), Fashion200k (Xintong Han et al., ICCV 2017). Creating new datasets: CSS (Nam Vo et al., CVPR 2019), CIRR (Zheyuan Liu et al., ICCV 2021), FashionIQ (released at an ICCV 2019 workshop).
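The test-time ranking described under "Algorithm" can be sketched as follows; the feature dimensions, the equal-weight sum of the two similarities, and all names are assumptions rather than the exact CLVC-Net procedure.

```python
import numpy as np

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Sketch of test-time retrieval: gallery images are ranked by jointly
# evaluating cosine similarity to both the local-wise and the global-wise
# composed query representations.

rng = np.random.default_rng(3)
q_local, q_global = rng.normal(size=64), rng.normal(size=64)
gallery = rng.normal(size=(5, 64))            # 5 candidate image features

scores = np.array([cosine(q_local, g) + cosine(q_global, g) for g in gallery])
ranking = np.argsort(-scores)                 # best match first
```

Top-K, NDCG@K, and MRR@K can then be computed directly from `ranking` against the ground-truth target index.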
Dataset: MIT-States. Contains around 60k images; each image comes with an object/noun label and a state/adjective label (such as "red tomato" or "new camera").
Examples of training triplets derived from MIT-States: (a) unripe banana to ripe banana ("replace unripe with ripe"); (b) cluttered bag to empty bag ("replace cluttered with empty"). (Phillip Isola, Joseph J. Lim, Edward H. Adelson: Discovering states and transformations in image collections. CVPR 2015: 1383-1391.)

Dataset: Birds-to-Words. A dataset for relative captioning, consisting of 3,347 image pairs annotated with 16,067 paragraphs describing the differences between pairs of images. (Maxwell Forbes, Christine Kaeser-Chen, Piyush Sharma, Serge J. Belongie: Neural Naturalist: Generating Fine-Grained Image Comparisons. EMNLP/IJCNLP (1) 2019: 708-717.)

Dataset: Fashion200k. More than 200k image-text pairs, crawled from online shopping websites; stop words, symbols, and words that occur fewer than 5 times were removed. Example of a training triplet for CTI-IR: blue one shoulder dress to black one shoulder dress ("replace blue with black").
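The Fashion200k-style text cleaning described above can be sketched as follows; the stop-word list and the toy corpus are hypothetical, and only the rule itself (drop stop words, symbols, and words occurring fewer than 5 times) comes from the slide.

```python
from collections import Counter

# Sketch of caption cleaning: drop stop words, symbols, and rare words.

STOP_WORDS = {"a", "the", "with", "and"}        # assumed stop-word list

corpus = ["blue one shoulder dress"] * 5 + ["the black! one shoulder dress"] * 5

def tokenize(caption):
    return [w.strip("!,.").lower() for w in caption.split()]

counts = Counter(w for c in corpus for w in tokenize(c))

def clean(caption):
    return [w for w in tokenize(caption)
            if w not in STOP_WORDS and w.isalpha() and counts[w] >= 5]

print(clean("the black! one shoulder dress"))   # ['black', 'one', 'shoulder', 'dress']
```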
(Xintong Han, Zuxuan Wu, Phoenix X. Huang, Xiao Zhang, Menglong Zhu, Yuan Li, Yang Zhao, Larry S. Davis: Automatic Spatially-Aware Fashion Concept Discovery. ICCV 2017: 1472-1480.)

Dataset: Shoes. A dataset for relative captioning, collected in the scenario of a shopping chat session between a shopping assistant and a customer, with 10,751 captions and one caption per pair of images (annotated through an AMT interface). (Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, Rogério Schmidt Feris: Dialog-based Interactive Image Retrieval. NeurIPS 2018: 676-686.)
Dataset: FashionIQ (released at an ICCV 2019 workshop). The dataset contains 77,684 diverse fashion images (dresses, shirts, and tops & tees), side information in the form of textual descriptions and product meta-data, attribute labels, and large-scale, high-quality relative captions collected from human annotators. (Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, Rogério Feris: Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback. CVPR 2021: 11307-11317.)

Dataset: CSS. The same scenes are rendered as 2D and 3D images, using the CLEVR toolkit to generate synthesized images of objects with different Color, Shape, and Size (CSS). Three types of modification texts: adding, removing, or changing object attributes; 16K triplets for training and 16K triplets for testing (used as training triplets for CTI-IR). (Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays: Composing Text and Image for Image Retrieval - an Empirical Odyssey. CVPR 2019: 6439-6448.)

Dataset limitations of previous work: non-complex images within narrow domains, which contain many false negatives.
Example of narrow domains and false negatives: for a reference image with the relative caption "are black with a colorful floral print", many potential target images are equally valid matches (false negatives).

Dataset: CIRR. The Compose Image Retrieval on Real-life images (CIRR) dataset uses the popular NLVR dataset for natural language visual reasoning as the source of images, with over 36,000 pairs collected through a multi-stage annotation process. (Zheyuan Liu, Cristian Rodriguez Opazo, Damien Teney, Stephen Gould: Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models. ICCV 2021: 2105-2114.)

Model Comparison: Our method consistently surpasses all the baselines on all three datasets, which reflects the superiority of our CLVC-Net (performance comparison on FashionIQ, Shoes, and Fashion200k).

Case Study: Illustration of CTI-IR results obtained by our CLVC-Net on the three datasets, including failure cases (green boxes mark the target items).

Demo
Conclusion:
We are the first to unify the global-wise and local-wise compositions with mutual enhancement in the context of CTI-IR.
We devise two affine-transformation-based attentive composition modules, towards fine-grained multi-modal composition from both angles.
Extensive experiments conducted on three real-world datasets validate the superiority of our model.
(Haokun Wen, Xuemeng Song, Xin Yang, Yibing Zhan, Liqiang Nie: Comprehensive Linguistic-Visual Composition Network for Image Retrieval. SIGIR 2021: 1369-1378.)

04 Future Work

Future Work: Pre-training techniques, e.g., using CLIP-based features (Baldrati et al., MMAsia '21) or using OSCAR as the composition module (Liu et al., ICCV 2021).

Future Work: Limited annotated samples. Case 1 from FashionIQ: a reference image with the modification text "has small straps, more plain and more revealing" and its target image, alongside many potential target images. Case 2 from Shoes: a reference image with the modification text "are black with a colorful floral print" and its target image, alongside many potential target images.

Thank you very much for watching.