Transfer Models for Deep Learning
Qiang Yang
New Bright Professor of Engineering, Hong Kong University of Science and Technology; Head, Department of Computer Science and Engineering; Director, Big Data Institute

What can AlphaGo not do yet? Learning by analogy: a model trained on a 19x19 board does not carry over to a 21x21 board.

Transfer Learning: the next hot topic.

Three advantages of transfer learning:
1. Small data.
2. Reliability: one model adapted across Domain 1, Domain 2, Domain 3, and Domain 4.
3. Personalization.

The difficulty of transfer learning. Its essence: find the invariant. Example: driving in Mainland China vs. driving in Hong Kong, China. One knowledge, two domains.

Yann LeCun: a "thermodynamics" of machine learning? (From Baidu Baike: thermodynamics studies the thermal properties of matter from the viewpoint of energy conversion; it reveals the macroscopic laws that govern the conversion of energy from one form to another, a theory of heat that summarizes the macroscopic behavior of matter.)

Deep transfer learning: multi-level feature learning. Bengio, Yoshua, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8 (2013): 1798-1828.

Transfer models for deep learning: a quantitative analysis. [Figure: two input-to-output networks coupled by shared weights and a domain distance loss.]

Representative methods:
- TCA (Transfer Component Analysis): Pan, Sinno Jialin, Ivor W. Tsang, James T. Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks 22, no. 2 (2011): 199-210.
- GFK (Geodesic Flow Kernel): Gong, Boqing, Yuan Shi, Fei Sha, and Kristen Grauman. Geodesic flow kernel for unsupervised domain adaptation. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 2066-2073. IEEE, 2012.
- DLID (Deep Learning for domain adaptation by Interpolating between Domains): Chopra, Sumit, Suhrid Balakrishnan, and Raghuraman Gopalan. Dlid: Deep learning for domain adaptation by interpolating between domains. ICML Workshop on Challenges in Representation Learning, Vol. 2, 2013.
- DDC (Deep Domain Confusion): Tzeng, Eric, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474 (2014).
- DAN (Deep Adaptation Networks): Long, Mingsheng, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. In International Conference on Machine Learning, pp. 97-105. 2015.
- BA (Backpropagation Adaptation): Ganin, Yaroslav, and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, pp. 1180-1189. 2015.

Transferring deep models: a quantitative comparison. [Chart: accuracy (roughly 10-70%) of TCA, GFK, CNN, DDC, DLID, DAN, and BA over 2011-2015, grouped into TL without DL, TL with DL, and DL without TL.]

Deep Adaptation Networks (DAN), Long et al. 2015: multi-layer adaptation. ImageNet is not split randomly, but into A = man-made classes and B = natural classes.

A quantitative analysis of deep transfer learning:
Conclusion 1: lower-layer features are more general and transferable; higher-layer features are more specific and less transferable.
Conclusion 2: transferring features plus fine-tuning always improves generalization.
What if we have no labelled data in the target domain to fine-tune on? What happens if the source and target domains are very dissimilar?

Transferability of layer-wise features: [experiments varying four transfer strategies and varying the similarity between domains.]

Conclusions: Fine-tuning with labeled data in the target domain always helps. Features transition from general to specific through the layers of a deep network. Performance drops when the two domains are very dissimilar.
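Conclusion 2 ("transfer features + fine-tune") can be sketched in plain numpy: copy a source-trained lower layer, freeze it, and fine-tune only the head on a small labeled target set. The network, shapes, and data below are hypothetical toys, not the models from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights were learned on the source domain.
W1_source = rng.normal(size=(4, 8))   # lower layer: general features
W2_source = rng.normal(size=(8, 1))   # upper layer: source-specific head

# A small labeled target set (synthetic, for illustration only).
X_tgt = rng.normal(size=(32, 4))
y_tgt = np.tanh(X_tgt @ W1_source) @ rng.normal(size=(8, 1))

W1, W2 = W1_source.copy(), W2_source.copy()   # transfer the features

def forward(X):
    H = np.tanh(X @ W1)          # frozen general features
    return H, H @ W2             # trainable head

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

loss_before = mse(forward(X_tgt)[1], y_tgt)
for _ in range(200):             # fine-tune W2 only; W1 stays frozen
    H, pred = forward(X_tgt)
    grad_W2 = 2 * H.T @ (pred - y_tgt) / len(X_tgt)
    W2 -= 0.1 * grad_W2
loss_after = mse(forward(X_tgt)[1], y_tgt)

assert np.array_equal(W1, W1_source)   # lower layer untouched
assert loss_after < loss_before        # fine-tuning the head helped
```

Only the head's gradient is computed, so the general lower-layer features survive the adaptation, mirroring the layer-wise findings above.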
What if there is no (or only limited) labeled data, and the two domains are dissimilar?

Unsupervised deep transfer learning. Goal: learn a classifier or a regressor for a target domain that is unlabeled and dissimilar to the source domain. General architecture: a Siamese network. [Figure: the source input and the target input pass through tied layers and then adaptation layers; the source branch ends in a source classifier, and a domain-distance-minimization term couples the two branches.]

The objective combines the source classifier's loss with domain distance minimization over the adaptation layers. Three families of domain losses:
- Discrepancy loss: directly minimizes the difference between the two domains. Tzeng et al. 2014; Long et al. 2015; Long et al. 2017.
- Adversarial loss: encourages a common feature space through an adversarial objective with respect to a domain discriminator. Ganin et al. 2015; Tzeng et al. 2015; Liu and Tuzel 2016; Tzeng et al. 2017.
- Reconstruction loss: combines both unsupervised and supervised training. Ghifary et al. 2016; Bousmalis et al. 2016.

Discrepancy-based methods. The source domain's parameters equal the target domain's parameters. Overall objective: source-domain classification loss + domain distance loss.

method             where to adapt     distance between         distance metric
Tzeng et al. 2014  a specific layer   marginal distributions   Maximum Mean Discrepancy (MMD)
Long et al. 2015   multiple layers    marginal distributions   Multi-kernel MMD (MK-MMD)
Long et al. 2017   multiple layers    joint distributions      Joint Distribution Discrepancy (JDD)

The same applies to RNNs in NLP: Lili Mou, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin. How transferable are neural networks in NLP applications? In EMNLP 2016.

Transfer learning in speech recognition: accent transfer. Yue Zhao, Yan M. Xu, Mei J. Sun, Xiao N. Xu, Hui Wang, Guo S. Yang, Qiang Ji. Cross-language transfer speech recognition using deep learning. ICCA 2014.
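The MMD distance used by the discrepancy-based methods above can be illustrated with a toy numpy computation. This is a biased estimator with a single RBF kernel; the synthetic data and bandwidth are illustrative assumptions, not values from the slides.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between two sample sets."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-gamma * d2)

def mmd2(S, T, gamma=1.0):
    """Biased estimate of MMD^2 = E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    return rbf(S, S, gamma).mean() + rbf(T, T, gamma).mean() - 2 * rbf(S, T, gamma).mean()

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))
near   = rng.normal(0.1, 1.0, size=(200, 2))   # nearly the same distribution
far    = rng.normal(3.0, 1.0, size=(200, 2))   # shifted distribution

# MMD grows with domain shift, which is why minimizing it pulls the
# source and target feature distributions together.
assert mmd2(source, far) > mmd2(source, near)
```

In the deep methods above, the same quantity is computed on hidden-layer activations and added to the classification loss, so gradients push the two domains' features toward each other.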
Multimodal learning and transfer learning. [Figure: the source-domain input and the target-domain input are mapped into a common representation with a paired loss; each branch carries its own reconstruction layer.] Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition. Seungwhan Moon, Suyoun Kim, Haohan Wang. arXiv:1412.3121.

Adding regularization: soft constraints between the source-domain network (input to output) and the target-domain network.
1. An explicit distance measure: MMD.
2. Learn to align: fool the domain classifier.
   - Reverse Gradient: reverse the domain classifier's gradient, for both CNN [7] and RNN [8] representation layers.
   - ADDA [9]: alternately optimize the domain-classifier layer and the common features, each while fixing the other.
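The Reverse Gradient idea above fits in a few lines: the layer is the identity on the forward pass and flips (and scales) the gradient on the backward pass, so the shared features are driven to confuse the domain classifier. A minimal sketch with a hand-rolled backward pass, not tied to any autograd framework:

```python
import numpy as np

class GradientReversal:
    """Identity forward; multiplies the incoming gradient by -lambda backward.

    Placed between the feature extractor and the domain classifier, it makes
    the feature extractor ascend the domain classifier's loss (maximize
    domain confusion) while everything else descends as usual.
    """
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                                  # features pass through unchanged

    def backward(self, grad_from_domain_classifier):
        return -self.lam * grad_from_domain_classifier   # flip and scale the sign

grl = GradientReversal(lam=0.5)
features = np.array([1.0, -2.0, 3.0])
grad = np.array([0.2, 0.4, -0.6])

out = grl.forward(features)
back = grl.backward(grad)

assert np.array_equal(out, features)       # forward is the identity
assert np.allclose(back, -0.5 * grad)      # backward reverses (and scales)
```

One layer thus turns a single backward pass into the min-max game: the domain classifier minimizes its error, while the features upstream of the reversal maximize it.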
3. Auxiliary task loss: clustering [10], which adds interpretability and enables zero-shot learning.

Transitive transfer learning. Ben Tan, Yu Zhang, Sinno Jialin Pan, Qiang Yang. Distant Domain Transfer Learning. AAAI 2017. Ben Tan, Yangqiu Song, Erheng Zhong, Qiang Yang. Transitive Transfer Learning. KDD 2015.

Transitive transfer learning uses:
1. A lot of labeled source data.
2. Unlabeled intermediate data.
3. Some labeled (0/1) target data.
Pipeline: reconstruct the input data; logistic regression on the labeled data; sample selection over the intermediate and source data; parameter initialization followed by fine-tuning.

Transfer learning for poverty prediction on satellite images: a VGG-Net is initialized with the parameters learned on the previous domain and then fine-tuned, along the chain ImageNet (massive data) → night lights (less data, closely related) → poverty (scarce data).

Generative Adversarial Networks (GAN). G: the generative model (generator); D: the discriminative model (discriminator). [Figure: Gaussian noise is sampled and fed to the generator G; the discriminator D receives generated samples and true data samples and predicts true/fake.] Goodfellow, Ian, et al. "Generative adversarial nets." NIPS 2014.

Goal: transfer style from source to target with no pair-wise correspondence (CycleGAN, DiscoGAN, and DualGAN): unsupervised cross-domain instance alignment. Alignment model: DiscoGAN (Kim et al., 2017). First, learn the relations between handbags and shoes; then, generate
a shoe while retaining the key attributes of the handbag.

CycleGAN model architecture: G is the mapping from the source to the target; F is the inverse mapping. Total loss = adversarial loss + cycle-consistency loss. Zhu, Jun-Yan, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593 (2017).
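The cycle-consistency term can be made concrete with scalar stand-ins for the two generators (the mappings below are hypothetical, not CycleGAN's actual networks): mapping a sample source → target with G and back with F should reproduce the input, and the L1 deviation is the penalty.

```python
import numpy as np

# Stand-ins for the two generator networks.
G = lambda x: 2.0 * x + 1.0          # hypothetical source-to-target mapping
F_good = lambda y: (y - 1.0) / 2.0   # exact inverse of G
F_bad = lambda y: 0.5 * y            # not an inverse of G

def cycle_loss(x, g, f):
    """L1 cycle-consistency term ||F(G(x)) - x||_1 (averaged)."""
    return float(np.mean(np.abs(f(g(x)) - x)))

x = np.linspace(-1.0, 1.0, 11)
assert cycle_loss(x, G, F_good) < 1e-9   # consistent pair: near-zero penalty
assert cycle_loss(x, G, F_bad) > 0.1     # inconsistent pair: large penalty
```

Because no paired target image exists for a given source image, this round-trip penalty is what keeps G from mapping all inputs to an arbitrary target sample: the adversarial terms make outputs look like the target domain, and the cycle term keeps them faithful to the input.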
CycleGAN can fool human annotators on 25% of trials. Alignment results: more image-translation results produced by CycleGAN (Zhu et al., 2017).

Adversarial domain adaptation: the target domain has no labels; find a common feature space between the source and target by formulating a min-max game under two constraints: the features must be helpful for the source-domain classification task, yet indistinguishable between the source and target domains. Minimize the source label-classification error while maximizing the domain-classification error. Ganin, Yaroslav, et al. Domain-adversarial training of neural networks. Journal of Machine Learning Research 17.59 (2016): 1-35.

Four source-target adaptation pairs: classification accuracies for multiple domain-adaptation pairs.
- Source only: lower-bound performance; no adaptation is performed.
- Target only: upper-bound performance; the classifier is trained with known target-domain labels.
- Subspace Alignment (SA) (Fernando et al., 2013).
- Domain-Adversarial Neural Networks (DANN) (Ganin, Yaroslav, et al., 2016).

Transfer learning application case 1: breaking the deadlock in big-ticket consumer finance (4Paradigm). Among tens of millions of WeChat official-account followers, identify those with a near-term intention to buy a car, and market auto installment loans to them over WeChat; a customer can click the embedded link to submit an application. Difficulty: a new channel with few successfully converted customers (… over 100 million) to help the auto installment-loan model learn. Result: compared with SAS, the marketing response rate improved by 200%+. Dai, Wenyuan et al. 2017.

Cross-domain sentiment analysis. IJCAI 2017: Zheng Li, Yu Zhang, et al. Books (source domain) vs. Restaurant (target domain):
- "Great books. His characters are engaging." / "The food is great, and the drinks are tasty and delicious."
- "It is a very nice and sobering novel." / "The food is very nice and tasty, and we'll go back again."
- "An awful book, and it is a little boring." / "Shame on this place for the rude staff and awful food."

"End-to-End Adversarial Memory Network for Cross-domain Sentiment Classification", IJCAI 2017, Zheng Li, et al. Problem: how can the pivot keywords be found automatically? Capture the evidence of interest (sentences, words) via an attention mechanism: memory networks. Use a Memory Network and a GAN together: MN-sentiment feeds a sentiment classifier and MN-domain feeds a domain classifier. Cross-domain sentiment-analysis results.

Transfer learning application case 2: classifying SAIC Motor's connected cars as shared (public) vs. private use. GPS + time, 1/15 sec, no labels, 7 days, 10,000 cars. Wang, Leye, et al. 2017.

The CoTrans framework.
Stage 1: source-target domain linking. Shared (transferable) features: distance, coverage; a Random Forest (RF).
Stage 2: target-domain co-training on RF + CNN (trajectory images). Trajectory image: the brighter the color, the longer the stay time in that cell.
Stage 2, co-training:
1. In feature space 1, train a new model M1 and find confident samples with M1 (the first M1 comes from the source domain).
2. In feature space 2, take the image features of the samples from step 1 and train model M2; find new samples with M2.

Transfer learning, deep learning, and transfer models for deep learning. Thanks: Zheng Li, Weiyan Wang, Ying Wei, Dr. Ben Tan, Dr. Yu Zhang, and 4Paradigm.

References
[1] Bousmalis, Konstantinos, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. Domain separation networks. In Advances in Neural Information Processing Systems, pp. 343-351. 2016.
[2] Ganin, Yaroslav, and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, pp. 1180-1189. 2015.
[3] Ghifary, Muhammad, W. Bastiaan Kleijn, Mengjie Zhang, David Balduzzi, and Wen Li. Deep reconstruction-classification networks for unsupervised domain adaptation. In European Conference on Computer Vision, pp. 597-613. 2016.
[4] Liu, Ming-Yu, and Oncel Tuzel. Coupled generative adversarial networks. In Advances in Neural Information Processing Systems, pp. 469-477. 2016.
[5] Long, Mingsheng, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. In International Conference on Machine Learning, pp. 97-105. 2015.
[6] Long, Mingsheng, Jianmin Wang, and Michael I. Jordan. Deep transfer learning with joint adaptation networks. In International Conference on Machine Learning, 2017.
[7] Tzeng, Eric, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474 (2014).
[8] Tzeng, Eric, Judy Hoffman, Trevor Darrell, and Kate Saenko. Simultaneous deep transfer across domains and tasks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4068-4076. 2015.
[9] Tzeng, Eric, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. arXiv preprint arXiv:1702.05464 (2017).