《黃興勃-基于FFI的PyFlink下一代Python運行時介紹.pdf》由會員分享,可在線閱讀,更多相關《黃興勃-基于FFI的PyFlink下一代Python運行時介紹.pdf(20頁珍藏版)》請在三個皮匠報告上搜索。
1、黃興勃(斷塵)-Apache Flink Committer 阿里巴巴高級開發工程師基于基于 FFI FFI 的的 PyFlinkPyFlink 下一代下一代 Python Python 運行時介紹運行時介紹PyFlinkPyFlink最新功能最新功能PyFlinkPyFlinkRuntimeRuntime基于基于FFIFFI的的JCPJCPPyFlinkPyFlinkRuntimeRuntime 2.02.0FutureFutureWorkWork#1#2#3#4#5#1#1PyFlinkPyFlink最新功能最新功能PyFlinkPyFlink 1.141.14的新功能的新功能#1性能性能
2、Operator FusionState 序列化/反序列化優化Finish Bundle優化功能功能State TTL config易用性易用性支持上傳tar.gz依賴包ProfilePrintLocal Debug#2#3#2 2PyFlinkPyFlink RuntimeRuntimePyFlink Architecture OverviewPython Table API&SQLPython DataStreamAPIPy4JTable API&SQL(Declarative)DataStream API(Imperative)CommonRulesOptimizerPythonRule
3、sJobGraphJavaOperatorsPythonOperatorsRuntimeJavaOperatorsPythonOperatorsDataServiceStateServiceDataServiceStateServiceUDFPyFlink RuntimeJava OperatorPython Workercheckpoint handlingwatermarkhandlingstaterequesthandlingJVMPVMPyFlink Runtime WorkFlow性能瓶頸性能瓶頸1.計算(Call UDF 環節的耗時)3.通信(JVM和PVM的進程間通信)2.序列化
4、/反序列化(輸入數據和UDF返回結果)codegen functioncython自定義序列化器generatorcythonJava/PythonJava/Python互相調用的問題互相調用的問題#3 3基于基于FFIFFI的的JCPJCPJava/PythonJava/Python互相調用的方案互相調用的方案#1基于基于FFIFFI的方案的方案IPCIPC通信方案通信方案PythonPython運行在運行在JVMJVM的方案的方案#2#32.共享內存 Shared MemoryPySpark Runtime py4jPyFlink&PySpark ClientAlink Runtime s
5、ocketTensorflow On FlinkPyArrow Plasma1.將Python 轉成 Java p2j voc2.基于Java實現的Python解釋器 Jython Graalvm grpcPyFlink On BeamIPCIPC性能問題性能問題1.網絡通信兼容性問題兼容性問題FFIFFI#1什么是什么是FFIFFIA foreign function interface(FFI)is a mechanism by which a program written in one programming language can call routines or make use
6、 of services written in another.This can be done in several ways:Requiring that guest-language functions which are to be host-language callable be specified or implemented in a particular way,often using a compatibility library of some sort.Use of a tool to automatically wrap guest-language function
7、s with appropriate glue code,which performs any necessary translation.Use of wrapper libraries對應的解決方案對應的解決方案JNI(Java Native Interface)Python/C API(CPython)CythonCtypes#2基于基于FFIFFI的方案的方案利用JNI和Python/C API,打通Java和PythonJavaCPythonJNIPython/C API各種實現對比各種實現對比Project描述缺點JPype通過PVM啟動JVM的方式,實現了Python調用Java
8、的功能。不支持Java調用PythonJEP通過JVM啟動PVM的方式,實現了Java調用Python的功能。1.不支持Python 調用 Java2.只能源碼包安裝(對環境有要求)3.不支持插拔使用(只能啟動時配置)4.框架層的性能開銷比較大JCP通過JVM啟動PVM的方式,實現了Java調用Python的功能,同時也支持Python對象回調Java的功能。幾種方案性能性能對比幾種方案性能性能對比278023906000050000120000042000032000003500000500000100000015000002000000250000030000003500000100byt
9、es String1k bytes StringPerformance of UDF(String Upper)JythonJepJcpJavaThroughput(QPS)JCPJCP架構架構JVMDamonThreadJNIPython/C APIThread 1Thread 2PythonSub InterpreterPythonSub InterpreterPythonMain InterpreterJNIJNIUDFUDFPython/C APIPython/C APIPVMJCP C LibraryProcess#4 4PyFlinkPyFlink RuntimeRuntime 2
10、.02.0PyFlinkPyFlink RuntimeRuntime 2.02.0GrpcServiceGrpcServiceGrpcServiceGrpcServiceUDFPyFlink RuntimeJava OperatorPython Workercheckpoint handlingwatermarkhandlingstaterequesthandlingJVMPVMJcp LibJcp LibJcp LibJcp LibUDFPyFlink Runtime 2.0Java OperatorPython Workercheckpoint handlingwatermarkhandl
11、ingstaterequesthandlingJVMPVMProcess 1Process 2One ProcessUDFUDF性能性能對比性能性能對比220000028000060000011000090000038000005000001000000150000020000002500000100bytes String1k bytes StringPerformance of UDF(String Upper)Java UDFPython UDFPython UDF On JcpThroughput(QPS)#5 5FutureFuture WorkWorkFutureFuture WorkWork#1JCPJCPJCP開源,作為第一個版本支持Java調用Python的功能JCP支持Numpy原生數據結構JCP支持Python調用Java的功能讓更多的項目使用JCPPyFlinkPyFlink OnOn JcpJcpPyFlink 1.15將會依賴JCP#220212021-1212-0505THANKS