Best Practices of TensorRT ONNX Parser
WANG Meng, NVIDIA, 2020/12

OUTLINE
- ONNX Introduction
- TF2ONNX Introduction
- TensorRT ONNX Parser
- Optimization
- Refit
- Summary

ONNX INTRODUCTION
ONNX: Open Neural Network Exchange
- ONNX is an open and interoperable format for ML models, sitting between training frameworks and deployment targets.
- For users: freedom to use the tool(s) of choice, as long as they are compatible with the ONNX format.
- For hardware vendors: focus innovation on NN optimizations for a single interoperable format instead of many.

ONNX Overview
- ONNX is an open specification that consists of the following components:
  - A definition of an extensible computation graph model.
  - Definitions of standard data types.
  - Definitions of built-in operators.
- Operator sets define the available built-in operators and their versions (currently 6-12).
- The newest operator set supports around 160 operators.

Intermediate Representation
- Model is the top-level ONNX construct, represented in protocol buffers as the type onnx.ModelProto. A model consists of a graph and associated metadata.
- Graph defines the computational logic of a model and contains a list of nodes that form a directed acyclic graph based on their inputs and outputs. The nodes in the graph are sorted topologically.
- Edges in the computation graph are established by the outputs of one node being referenced by name in the inputs of a subsequent node.
- Nodes are comprised of a name, the name of the operator they invoke (e.g. Conv, Relu), a list of named inputs, a list of named outputs, and a list of attributes.
- All node output names MUST be unique within a graph.

Structure of onnx.proto3
onnx.proto3 is a general network definition protobuf:

```proto
message ModelProto {
  GraphProto graph = 7;
  repeated OperatorSetIdProto opset_import = 8;
}

message GraphProto {
  repeated NodeProto node = 1;
  string name = 2;
  repeated TensorProto initializer = 5;
  repeated ValueInfoProto input = 11;
  repeated ValueInfoProto output = 12;
}

message NodeProto {
  repeated string input = 1;
  repeated string output = 2;
  string name = 3;
  string op_type = 4;   // namespace: Operator
  string domain = 7;    // namespace: Domain
  repeated AttributeProto attribute = 5;
}
```

TF2ONNX INTRODUCTION
Convert TensorFlow models to ONNX
- Requires Python >= 3.6.
- tf2onnx supports ONNX opset-6 to opset-12.
- tf2onnx supports nearly 200 kinds of op. For unsupported ops, --continue_on_error is recommended so that an ONNX model file is still produced.
- A higher opset such as 11 or 12 is recommended.

Experience Share
- Usage example:

```shell
python3 -m tf2onnx.convert --input onnx_model/model.pb --output onnx_model/model.onnx \
    --verbose --opset=11 --continue_on_error \
    --inputs GRAPH_INPUTS --outputs GRAPH_OUTPUTS
```

- --inputs and --outputs are the lists of input/output node names in the graph, in the format node_name:port_id, typically like input0:0. If some input nodes are not actually used, remove them from --inputs. All output names must be unique within a graph, so prepare your graph input and output node names carefully.
- --verbose summarizes abundant information, such as the type and number of operators and the graph optimization results.
- --continue_on_error allows unsupported operators and preserves all the information.
- The tf2onnx optimizer might change the graph undesirably, e.g. removing an Add node whose bias is zero-initialized. A fully trained pb model is therefore preferred.
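The --verbose summary reports operator types and counts; doing the same tally yourself makes it easy to spot ops a downstream parser cannot import before you try to build an engine. A minimal sketch in plain Python — the op list and the supported set are made up for illustration (in practice the op types come from model.graph.node via the onnx package, and the real supported list lives in the parser):

```python
from collections import Counter

# Hypothetical op_type list as it might be read from model.graph.node;
# hard-coded here to stay self-contained.
op_types = ["Conv", "Relu", "Conv", "Relu", "OneHot", "Resize", "Conv"]

# Hypothetical subset of ops the target parser handles.
SUPPORTED = {"Conv", "Relu", "Resize"}

counts = Counter(op_types)
unsupported = {op: n for op, n in counts.items() if op not in SUPPORTED}

print(counts.most_common())  # [('Conv', 3), ('Relu', 2), ('OneHot', 1), ('Resize', 1)]
print(unsupported)           # {'OneHot': 1}
```

Anything that lands in `unsupported` is a candidate for a plugin or a graph rewrite, as discussed in the following sections.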
TENSORRT ONNX PARSER
Convert ONNX model to TensorRT
- The parser sources include builtin_op_importers.cpp/.hpp, ModelImporter.cpp/.hpp, NvOnnxParser.cpp/.h, onnx2trt.hpp, onnx2trt_runtime.hpp, onnx2trt_utils.cpp/.hpp, OnnxAttrs.cpp/.hpp, onnx_trt_backend.cpp, onnx_utils.hpp, ShapedWeights.cpp/.hpp, ShapeTensor.cpp/.hpp, Status.hpp, TensorOrWeights.hpp, toposort.hpp, trt_utils.hpp, utils.hpp.
- The parser supports more than one hundred OPs. Unsupported OPs can be implemented with TensorRT plugins.
- TensorRT also imports models from TensorFlow and Caffe; the network definition is built into an optimized engine (via C++ API, Python API or MATLAB) that can be serialized and deserialized at runtime.

Parse the ONNX model file and populate the TensorRT network
1. Create builder: createInferBuilder(gLogger) creates an IBuilder with gLogger as the input argument.
2. Create network: createNetwork() of IBuilder creates the INetworkDefinition.
3. Create parser and parse the imported model: parser = nvonnxparser::createParser(network, gLogger); parse() reads the model file and populates the TensorRT network, with the model as input and the network as output.
4. Build engine: in the optimization phase, buildCudaEngine() of IBuilder is called to create the ICudaEngine from the network.

Usage of Python API
- Usage example:

```python
# Create builder, network and parser
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(EXPLICIT_BATCH) as network, \
     trt.OnnxParser(network, TRT_LOGGER) as parser:
    # Parse the ONNX model and build the engine
    with open("model.onnx", "rb") as model:
        parser.parse(model.read())
    engine = builder.build_cuda_engine(network)
```

- Inside the parser, every built-in op has an importer function in builtin_op_importers.cpp:

```cpp
#define DEFINE_BUILTIN_OP_IMPORTER(op)                                 \
    NodeImportResult import##op(IImporterContext* ctx,                 \
                                ONNX_NAMESPACE::NodeProto const& node, \
                                std::vector<TensorOrWeights>& inputs)

// e.g. Relu maps onto a TensorRT activation layer:
//   auto* layer = ctx->network()->addActivation(input, nvinfer1::ActivationType::kRELU);
//   return {{layer->getOutput(0)}};
```

- ONNX type constraints for Relu: T: tensor(float16), tensor(float), tensor(double) — input and output types are constrained to float tensors.

How to Support New Layers/OPs?
Solution 1: modify TensorRT OSS (parsers and plugins)
- Implement TensorRT plugins.
- Add the plugins to the main TensorRT repository.
- Add a specific importer function DEFINE_BUILTIN_OP_IMPORTER(plugin) in builtin_op_importers.cpp.
- Build TensorRT-OSS.

OneHot in ONNX
- Produces a one-hot tensor based on inputs.
- Inputs:
  - indices: input tensor containing indices.
  - depth: scalar specifying the number of classes in the one-hot tensor.
  - values: rank-1 tensor of [off_value, on_value], like [0, 1].
- Attributes:
  - axis: the axis along which the one-hot representation is added; default -1.

[Figure: OneHot node properties as shown in a graph visualizer — type OneHot, with the depth input fed by an int32 initializer.]
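Before wiring up a plugin, it helps to pin down the semantics the kernel must reproduce. A plain-Python sketch of the OneHot behaviour described above, for a flat index list with the default axis=-1 (handling of negative or out-of-range indices is omitted; this is an illustration, not the plugin kernel):

```python
def one_hot(indices, depth, values=(0, 1)):
    """One-hot encode a flat list of indices along a new last axis
    (the ONNX default axis=-1), using values = (off_value, on_value)."""
    off, on = values
    return [[on if j == i else off for j in range(depth)] for i in indices]

print(one_hot([0, 2, 1], depth=3))
# [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

The CUDA kernel in the plugin below has to compute exactly this mapping, with depth passed in as a plugin field.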
Develop TensorRT Plugin (OneHot)
- A custom layer is implemented by extending the class IPluginCreator and one of TensorRT's base classes for plugins:

Table 1. Base classes, ordered from least expressive to most expressive

| Base class          | Introduced in TensorRT version | Mixed input/output formats/types | Dynamic shapes? |
|---------------------|--------------------------------|----------------------------------|-----------------|
| IPluginV2Ext        | 5.1                            | Limited                          | No              |
| IPluginV2IOExt      | 6.0.1                          | General                          | No              |
| IPluginV2DynamicExt | 6.0.1                          | General                          | Yes             |

- Of these interfaces, we recommend IPluginV2IOExt if you do not need to support dynamic shapes; otherwise use IPluginV2DynamicExt.

- ./OnehotPlugin/OnehotPlugin.h:

```cpp
class onehotEncoder : public IPluginV2IOExt
{
    // ...implement all plugin methods here
};

class onehotEncoderCreator : public IPluginCreator
{
    // ...implement all creator methods here
};
```

- ./OnehotPlugin/OnehotPlugin.cu:

```cpp
int onehotEncoder::enqueue(int batchSize, const void* const* inputs, void** outputs,
                           void* workspace, cudaStream_t stream)
{
    // provide parameters and call onehotEncoderKernel
}

REGISTER_TENSORRT_PLUGIN(onehotEncoderCreator);
```

- ./OnehotPlugin/CMakeLists.txt:

```cmake
file(GLOB SRCS *.cu)
```

Add plugin to the main TensorRT repository (OneHot)
- Add a new folder OnehotPlugin with the source code under the $TRT/plugins directory: OnehotPlugin/CMakeLists.txt, OnehotPlugin.cu, OnehotPlugin.h.
- Add the folder to $TRT/plugins/CMakeLists.txt:

```cmake
set(PLUGIN_LISTS
    ...
    OnehotPlugin)
```

Add specific importer function DEFINE_BUILTIN_OP_IMPORTER(OneHot)

```cpp
DEFINE_BUILTIN_OP_IMPORTER(OneHot)
{
    nvinfer1::ITensor* indices = &convertToTensor(inputs.at(0), ctx);
    auto weight = inputs.at(1).weights();
    int depth = static_cast<int*>(weight.values)[0];
    OnnxAttrs attrs(node, ctx);

    // Populate OneHot plugin properties.
    const std::string pluginName = "onehotEncoder";
    const std::string pluginVersion = "1";
    std::vector<nvinfer1::PluginField> f;
    f.emplace_back("depth", &depth, nvinfer1::PluginFieldType::kINT32, 1);

    nvinfer1::IPluginV2* plugin = createPlugin(
        node.name(), importPluginCreator(pluginName, pluginVersion), f);
    ASSERT(plugin != nullptr && "OneHot plugin was not found in the plugin registry!",
           ErrorCode::kUNSUPPORTED_NODE);

    nvinfer1::ITensor* input_data[] = {indices};
    RETURN_FIRST_OUTPUT(ctx->network()->addPluginV2(input_data, 1, plugin));
}
```

Build TensorRT-OSS
- Steps:

```shell
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TRT/TensorRT-7.2.1.6/lib
cd $TRT/build
cmake .. -DTRT_LIB_DIR=$TRT/TensorRT-7.2.1.6/lib -DTRT_BIN_DIR=`pwd`/out
make -j$(nproc)
make install
```

- Verify that the new plugins have been integrated into TensorRT successfully.

How to Support New Layers/OPs?
Solution 2: utilize the fallback mechanism
- Implement TensorRT plugins.
- Build a standalone library for the individual plugins.
- Pre-load the library; the ONNX parser will then automatically attempt to import unsupported layers/OPs as plugins (FallbackPluginImporter).

Tips
- Implement TensorRT plugins:
  - The inputs/outputs of the plugin layer in the ONNX graph should be the same as those of your TensorRT plugin.
  - The name/version of the plugin layer in the ONNX graph should be the same as the name/version returned by the getPluginName/getPluginVersion functions of the IPluginCreator class.
  - The attributes set for the custom layer in ONNX must match the plugin attributes of the IPluginCreator class.
  - Remember to implement IPluginCreator::getFieldNames(); groupNormalizationPlugin is a good example to learn from.
- Build a standalone library for individual plugins:
  - Use a makefile to build a standalone library (./lib/onehot.so).
- Preload the plugin library and parse:
  - Python API: ctypes.cdll.LoadLibrary("./lib/onehot.so")
  - Command line tool: trtexec --onnx=model.onnx --plugins=./lib/onehot.so
Align ONNX Layer/OP with TensorRT Plugin
- OneHot in ONNX — inputs: indices, depth, values; attributes: axis.
- onehotEncoder plugin in TensorRT — inputs: indices; attributes: depth. Assume axis is -1 (the last dimension) and values are [0, 1].
- Alignment: modify OneHot to the customized OP in the ONNX graph:

```python
for node in graph.node:
    if node.op_type == "OneHot":
        onehot = onnx.helper.make_node(
            "onehotEncoder",
            name=node.name,
            depth=depth,
            inputs=[node.input[0]],
            outputs=node.output[:])
        nodes_remove.append(node)
        nodes_extend.append(onehot)
```

Case 1: unsupported OPs, implemented by plugins
- Example: OneHot.
- Solution 1: modify TensorRT OSS
  - Very complicated, since we need to set up the build environment and modify the parser and plugin folders.
  - No strict restrictions on plugins as long as the importer function is well written; very flexible.
- Solution 2: utilize the fallback mechanism
  - Easy, since we only build a standalone library.
  - Suitable for importing unsupported OPs as plugins, but restricted.
  - Requires modifying the definition of ONNX OPs when their inputs, outputs or attributes don't match the plugin.

Case 2: unsupported OPs, implemented by TensorRT layers
- Example: Sign, CudnnRNNV3 — cases where we want to import these OPs with TensorRT layers, since a plugin implementation is complicated and error-prone for beginners.
- Solution 1: modify TensorRT OSS
  - Add a specific importer function for the unsupported OPs; for example, use IRNNv2Layer to parse the CudnnRNNV3 op.
- Solution 2: utilize the fallback mechanism — not suitable.

Case 3: supported OPs, but inefficient
- Example: Resize.
- Solution 1: modify TensorRT OSS
  - Modify the importer function to call plugins instead of the original TRT layers.
- Solution 2: utilize the fallback mechanism
  - Requires modifying the op_type of the ONNX OPs, e.g. appending a "Plugin" tag (from Resize to ResizePlugin), since the fallback mechanism is only used for unsupported OPs.

Case 4: required to select TRT layers or plugins based on conditions
- Example: Reduce — sometimes we only implement a plugin for a specific case.
- Solution 1: modify TensorRT OSS
  - Add a specific importer function that selects based on conditions such as input shape or axis.
- Solution 2: utilize the fallback mechanism
  - Modify the op_type of the ONNX OPs based on conditions.

Case 5: required to fix issues of the ONNX parser
- Example: ONNX parser v6.0 doesn't support bool weights.
- Solution 1: modify TensorRT OSS
  - Fix the issues of the ONNX parser; for example, add support for importing bool weights.
- Solution 2: utilize the fallback mechanism — not suitable.

Comparison based on my own experience

| Case | Solution 1: modify TensorRT OSS | Solution 2: fallback mechanism |
|---|---|---|
| 1. Unsupported OPs, implemented by plugins | Complicated | Preferred, much simpler |
| 2. Unsupported OPs, implemented by TensorRT layers | Modify parser | Not suitable |
| 3. Supported OPs but inefficient, implemented by plugins | Modify parser to call plugins instead of original TRT layers | Modify op_type to utilize the fallback plugin importer |
| 4. Required to select TRT layers or plugins based on conditions | Modify parser | Modify op_type based on conditions |
| 5. Required to fix issues of the ONNX parser | Modify parser | Not suitable |

OPTIMIZATION
Introduction
- Kernel optimization
- Graph fusion
  - ONNX graph level: the ONNX Python API. ONNX GraphSurgeon is a tool that allows you to easily generate new ONNX graphs or modify existing ones; refer to the NVIDIA developer blog for more details.
  - TensorFlow graph level: graphsurgeon-tf allows you to transform TensorFlow graphs. Its capabilities are broadly divided into two categories: search and manipulation. Search functions allow you to find nodes in a TensorFlow graph; manipulation functions allow you to modify, add, or remove nodes.

Kernel Optimization
[Figure: CUDA kernel profile on Tesla V100-PCIE-32GB. ResizeBilinearKernel alone accounts for 18.3% of kernel time on the default stream — more than any single cuDNN convolution kernel — making Resize a worthwhile optimization target.]

- Solution 1:
  - Implement your resize plugin.
  - Add the plugin to TensorRT.
  - Modify the ONNX parser to call the plugin instead of the default resize layer.
  - Build TensorRT-OSS.
- Solution 2 (preferred):
  - Implement your resize plugin, named ResizePlugin.
  - Modify the op_type of the original Resize in the ONNX graph to ResizePlugin.
  - Build a standalone library.

Graph Fusion: dilated conv
- In the TensorFlow graph, dilated conv is implemented by 5 small OPs (SpaceToBatchND, Conv2D, BatchToSpaceND, BiasAdd, ...). In the ONNX graph, dilated conv is converted to as many as 12 OPs.
- However, both ONNX and TRT support dilated conv directly.
- Optimization plan: merge these small OPs into a dilated conv at the ONNX-graph level.

Graph Fusion with ONNX Python API

```python
# Slide a five-node window over the topologically sorted nodes and
# match the op chain that TF dilated conv was expanded into.
for node in graph.node:
    if (first and first.op_type == "Pad"
            and second and second.op_type == "Transpose"
            and third and third.op_type == "SpaceToDepth"
            and fourth and fourth.op_type == "Transpose"
            and fifth and fifth.op_type == "Conv"):
        x = first.input[0]
        dilations = third.attribute[0].i
        kernel_shape = fifth.attribute[2].ints
        strides = fifth.attribute[1].ints
        weights = fifth.input[1]
        biases = "/".join(fifth.output[0].split("/")[:-1]) + "/biases/read:0"
        y = "/".join(fifth.output[0].split("/")[:-1]) + "/BatchToSpaceND:0"
        dilated_conv = onnx.helper.make_node(
            "Conv", name=fifth.name,
            inputs=[x, weights, biases], outputs=[y],
            kernel_shape=kernel_shape, strides=strides,
            dilations=[dilations, dilations])
        # Default values for other attributes: strides=[1,1],
        # dilations=[1,1], groups=1, auto_pad="SAME_UPPER"
        nodes_remove.extend([first, second, third, fourth, fifth])
        nodes_extend.append(dilated_conv)
    first = second
    second = third
    third = fourth
    fourth = fifth
    fifth = node
```

```python
for node in graph.node:
    # Search for the pattern to modify in the ONNX graph.
    if (first and first.op_type == "DepthToSpace"
            and second and second.op_type == "Transpose"
            and third and third.op_type == "Slice"
            and fourth and fourth.op_type == "Add"
            and fifth and fifth.op_type == "Relu"
            and node.op_type == "Transpose"):
        relu = onnx.helper.make_node(
            "Relu", inputs=fourth.input, outputs=node.output)
        nodes_remove.extend([first, second, third, fourth, fifth, node])
        nodes_extend.append(relu)
    first = second
    second = third
    third = fourth
    fourth = fifth
    fifth = node

for node in nodes_remove:
    graph.node.remove(node)
for node in nodes_extend:
    graph.node.extend([node])

model_def = onnx.helper.make_model(graph)
onnx.save(model_def, "/onnx_model/model_dilated_conv.onnx")
```

REFIT
Introduction
- TensorRT can refit an engine with new weights without having to rebuild it.
- The engine must be built as "refittable": set builder.refittable = True.
- Create a refitter object: with trt.Refitter(engine, TRT_LOGGER) as refitter:
- Use refitter.get_all() to get a list of all refittable layers (layer names, strings) and the associated weight roles in the network.
- Update the weights that you want to update: refitter.set_weights("MyLayer", trt.WeightsRole.KERNEL, trt.Weights(tf_weight))
- Update the engine with all the weights that are provided: refitter.refit_cuda_engine()
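Routing new framework weights to the right set_weights call is just a lookup over the parallel lists that the parser's refit map returns. A plain-Python sketch with made-up names (the real lists come from parser.get_refit_map(), shown in the next section):

```python
# Made-up parallel lists in the shape parser.get_refit_map() returns:
# ONNX weight names, TRT network layer names, and weight roles.
weight_names = ["conv1/kernel:0", "conv1/bias:0"]
layer_names = ["conv1", "conv1"]
roles = ["KERNEL", "BIAS"]

# Index by weight name so each newly trained weight can be routed
# to refitter.set_weights(layer, role, value).
refit_map = {w: (l, r) for w, l, r in zip(weight_names, layer_names, roles)}

print(refit_map["conv1/bias:0"])  # ('conv1', 'BIAS')
```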
Refit Map
- Target: update TRT refittable weights with newly trained TF weight values.
- Difficulty: how to map a TRT network layer name to a TF weight name.
- Solution: the ONNX parser maintains a refit map that records the mapping between ONNX weight names and TRT network layer names.
- Refit map:

```python
weightNames, layerNames, roles = parser.get_refit_map()
for w, l, r in zip(weightNames, layerNames, roles):
    print("layerName: {}, weightName: {}, weightRole: {}".format(l, w, r))
```

- TF -> ONNX -> TRT: TF weight name -> ONNX weight name -> TRT network layer name.

Remaining Issues
- TensorRT does not support refitting together with dynamic shapes right now.
- Although the ONNX parser provides the refit map for easier name mapping, sometimes we still have to determine the mapping rules manually, because tf2onnx might change layer or weight names.
- The refitting time might be too long when one weight is repeatedly refit. For example, if one weight is used by five layers, we have to do the refitting five times with this weight.
- We have some workarounds for these issues and are actively working to solve them.

SUMMARY
- The ONNX parser can parse an ONNX model and populate a TensorRT network in a nearly automatic way.
- The ONNX parser only supports full-dimensions mode, meaning that your network definition must be created with the EXPLICIT_BATCH flag set. Refer to the dynamic shapes documentation for more details.
- The ONNX parser might cause some performance drop compared with a well-built TensorRT network created through the TensorRT API, but the gap can be reduced with different kinds of optimization, like kernel optimization and graph fusion.
- For further optimization, you are recommended to try mixed precision, CUDA graphs and multi-streams.
- Welcome to contribute to the ONNX parser!