《Databricks 上的 SQL 編程.pdf》由會員分享,可在線閱讀,更多相關《Databricks 上的 SQL 編程.pdf(42頁珍藏版)》請在三個皮匠報告上搜索。
1、2024 Databricks Inc.All rights reservedSQL SQL PROGRAMMING IN PROGRAMMING IN DATABRICKSDATABRICKSSerge Rielau&Milan StefanovicSerge Rielau&Milan StefanovicJune 2024June 202412024 Databricks Inc.All rights reserved Sorting Sorting-your wayyour waySorting points by distance using LAMBDA and SQL UDFSor
2、ting strings,properly(Sneak peek!)Short handsShort handsGROUP BY,ORDER BYSELECT*-unleashed VariablesVariablesSQL Session VariablesWhat about identifiers?ScriptingScriptingEXECUTE IMMEDIATESQL/PSM:Its like SQL,but scripted(Sneak peek!)2AgendaAgenda2024 Databricks Inc.All rights reserved Task“Sort an
3、array of points by distance from(0,0)”Need a custom sort orderarray_sort()for sortinglambda function for the math Hide complexity in a SQL UDF3Sorting Sorting-Your wayYour wayQuicksort and custom sort expressionsQuicksort and custom sort expressions2024 Databricks Inc.All rights reserved Can be pass
4、ed to a number of builtin map/array functions Operates on each element,value of the map/arrayp-expr(p)(p,.)-expr(p,.)p:One or more identifiers,as required by the host function.expr(p,.):A simple(no subqueries,or SQL UDF)expression using p.Result must comply with expectations of the host function.4LA
5、MBDA functionsLAMBDA functionsAnonymous function with one or more named parameters2024 Databricks Inc.All rights reserved SELECTSELECT array_sort(array(5,2,8,1,3),(a,b)-a-b)ASAS sorted;=1,2,3,5,8Sorting Distancesd=sqrt(x*x+y*y)d1 d2 x1*x1+y1*y1 0=a blambda(a,b)0=a bdxy2024 Databricks Inc.All rights
6、reserved SELECTSELECT array_sort(points,(p1,p2)-(p1.x*p1.x+p1.y*p1.y)-(p2.x*p2.x+p2.y*p2.y)ASAS pointsFROMFROM point_arrays;=,6Sorting by distanceSorting by distancelambda(p1,p2)0 (x1*x1+y1*y1)(x2*x2+y2*y2)lambda(p1,p2)0 (x1*x1+y1*y1)(x2*x2+y2*y2)dxy2024 Databricks Inc.All rights reserved Stored in
7、UC as a reusable asset Support named parameter invocation and defaultingScalarScalar Encapsulate(complex)expressions,including subqueries May contain subqueries Return a scalar value Can be used in most places builtin functions go.TableTable Encapsulate(complex)correlated subqueries aka a parameteri
8、zed view Can be used in the FROM clause7SQL UDFSQL UDFScalar and Table UDFs written in SQLScalar and Table UDFs written in SQL2024 Databricks Inc.All rights reservedCREATECREATE FUNCTIONFUNCTION points_sort(points arraystruct)RETURNRETURN array_sort(points,(a,b)-(a.x*a.x+a.y*a.y)-(b.x*b.x+b.y*b.y);S
9、ELECTSELECT points_sort(points)ASAS pointsFROMFROM point_arrays;=,8SQL UDF sorting by distance SQL UDF sorting by distance Hiding all that complexityHiding all that complexity2024 Databricks Inc.All rights reserved2024 Databricks Inc.All rights reserved9AnnouncingAnnouncingCollation Collation Suppor
10、tSupport2024 Databricks Inc.All rights reserved Associate columns,fields,array elements with a collation of choice Case insensitive Accent insensitive Locale aware Supported by many string functions such as lower()/upper()substr()locate()like Supported by Delta and Photon GROUP BY,ORDER BY,compariso
11、ns,.10ANSI SQL COLLATE(private preview)ANSI SQL COLLATE(private preview)Sorting and comparing strings according to localeSorting and comparing strings according to localePrivate Preview2024 Databricks Inc.All rights reservedSELECTSELECT name FROMFROM names ORDERORDER BYBY name;Name AnthonyBerthaanth
12、onyberthanthnIs this really what we want here?Is this really what we want here?11A look at the default collation A look at the default collation A Z a z A Z a z 2024 Databricks Inc.All rights reservedSELECTSELECT name FROMFROM namesORDERORDER BYBY name COLLATECOLLATE unicode;Name nthnanthonyAnthonyb
13、erthaBerthaRoot collation with decent sort order for most locales12COLLATE UNICODE COLLATE UNICODE One size,fits mostOne size,fits most2024 Databricks Inc.All rights reservedSELECTSELECT nameFROMFROM namesWHEREWHERE startswith(name COLLATECOLLATE unicode_ci,a)ORDERORDER BYBY name COLLATECOLLATE unic
14、ode_ci;Name anthonyAnthonyCase insensitive is not accent insensitive:We lost nthn13COLLATE UNICODE_CI COLLATE UNICODE_CI Case insensitive comparisons have entered the chatCase insensitive comparisons have entered the chat2024 Databricks Inc.All rights reservedSELECTSELECT nameFROMFROM namesWHEREWHER
15、E startswith(name COLLATECOLLATE unicode_ci_ai,a)ORDERORDER BYBY name COLLATECOLLATE unicode_ci_ai;Name nthnanthonyAnthony100s of supported predefined collations across many locales14COLLATE UNICODE_CI_AI COLLATE UNICODE_CI_AI Equality from a to Equality from a to 2024 Databricks Inc.All rights rese
16、rved2024 Databricks Inc.All rights reserved15SQL ShorthandsSQL Shorthands2024 Databricks Inc.All rights reserved BeforeSELECTSELECT last,first,id,mgr,extract(year FROMFROM workday),sum(hours),sum(pay)FROMFROM empsGROUPGROUP BYBY last,first,id,mgr,extract(year FROMFROM workday)ORDERORDER BYBY last,fi
17、rst,id,mgr,extract(year FROMFROM workday)AfterSELECTSELECT last,first,id,mgr,extract(year FROMFROM workday),sum(hours),sum(pay)FROMFROM empsGROUPGROUP BYBY 1,2,3,4ORDERORDER BYBY 1,2,3Is that the best we can do?!16GROUP BY and ORDER BYGROUP BY and ORDER BYHumble beginnings Humble beginnings 2024 Dat
18、abricks Inc.All rights reserved ExpectationGROUP BY all column in the select list which are not aggregated!ORDER BY all columns left to right(or enough to guarantee uniqueness)Let Databricks figure it out SELECTSELECT last,first,id,mgr,extract(year FROMFROM workday),sum(hours),sum(pay)FROMFROM empsG
19、ROUPGROUP BYBY ALLALLORDERORDER BYBY ALLALL Better,.17GROUP BY and ORDER BYGROUP BY and ORDER BYJust do it!Just do it!2024 Databricks Inc.All rights reserved SELECT*is the bad boy of SQL!What if the schema changes?No one knows what the SQL is doing!We all hate it But we all use it why?Carry over fro
20、m OLTP where schema evolution is tightly controlled.In the Lakehouse,schema evolution is expected!18SELECT*to ALLSELECT*to ALLHold my beer!Hold my beer!2024 Databricks Inc.All rights reservedSELECTSELECT*FROMFROM t,s;Select all columns available in FROMSELECTSELECT t.*FROMFROM t,s;Select all columns
21、 available in tSELECTSELECT*EXCEPTEXCEPT(col1,col2)FROMFROM t,s;Select all column except col1 and col2 Can also exclude fields in a struct!19SELECT*SELECT*The early years The early years 2024 Databricks Inc.All rights reservedWITHWITH person(name,first,address)AS(VALUESVALUES(Coyote,Wiley,named_stru
22、ct(street,123 Canyon Rd,city,Grand Canyon,zip,12345)SELECTSELECT*EXCEPTEXCEPT(address),address.*EXCEPTEXCEPT(street)FROMFROM person;namefirst cityzipCoyote Wiley Grand Canyon 1234520*unleashed*unleashedUnnesting fields in a structUnnesting fields in a struct2024 Databricks Inc.All rights reservedWIT
23、HWITH person(name,first,street,city,zip)AS(VALUESVALUES(Coyote,Wiley,123 Canyon Rd,Grand Canyon,12345)SELECTSELECT name,first,struct(*EXCEPTEXCEPT(name,first)AS addressFROMFROM person;namefirst addressCoyote Wiley street:123 Canyon Rd,city:Grand Canyon,zip:1234521*unleashed*unleashedNesting columns
24、into a structNesting columns into a struct2024 Databricks Inc.All rights reservedWITHWITH contact(name,work,home)AS(VALUESVALUES(Coyote,905-555-1234,408-555-1234)SELECTSELECT name,array(*EXCEPT(name)AS numbersFROMFROM contact;namenumbersCoyote 905-555-1234,408-555-123422*unleashed*unleashedTransposi
25、ng columns into an arrayTransposing columns into an array2024 Databricks Inc.All rights reservedFor fixed&variable length argument functions LEAST(*)GREATEST(*)COALESCE(*)CONCAT(*),CONCAT_WS(*)and with UDFs,tooEven in the WHERE clause e.g.IN(*)23*unleashed*unleashedCan be used just about anywhere.Ca
26、n be used just about anywhere.2024 Databricks Inc.All rights reservedWITHWITH person(name,first,street,city,zip)AS(VALUESVALUES(Coyote,Wiley,123 Canyon Rd,Grand Canyon,12345)SELECTSELECT concat_ws(,*)AS resultFROMFROM person;resultCoyote,Wiley,123 Canyon Rd,Grand Canyon,1234524*unleashed*unleashedFi
27、nally:Serializing a row into a stringFinally:Serializing a row into a string2024 Databricks Inc.All rights reserved2024 Databricks Inc.All rights reserved25Variables et al.Variables et al.2024 Databricks Inc.All rights reserved Uniquely named placeholder for a typed literal Safe from SQL injection A
28、djust type and input through automatically generated notebook widget Reference most places literals go.Cannot fill value from SQL 26Named parametersNamed parametersThe mustache is dead,long live the colon:!The mustache is dead,long live the colon:!2024 Databricks Inc.All rights reserved Declarative,
29、with type and default.DECLAREDECLARE VARIABLEVARIABLE name STRINGSTRING DEFAULTDEFAULT anthony;orDECLAREDECLARE name=anthony;Reference anywhere a query literal can go.SELECTSELECT*FROMFROM namesWHEREWHERE name=session.name COLLATECOLLATE unicode_ci;nameanthonyAnthony27Session variablesSession variab
30、lesFlowing data through a session,using SQL only.Flowing data through a session,using SQL only.2024 Databricks Inc.All rights reservedSet using SQL expressionsSETSET VARVAR name=(SELECTSELECT min(name)FROMFROM names);Set multiple variables at onceSETSET VARVAR(first,last)=(SELECTSELECT first,last FR
31、OMFROM person WHEREWHERE id=:id)Reset to defaultSETSET VARVAR name=DEFAULTDEFAULT;Private to the session(like a temp view)28Session variablesSession variablesFlowing data through a session,using SQL only.Flowing data through a session,using SQL only.2024 Databricks Inc.All rights reserved29How about
32、 table parameters/variables?How about table parameters/variables?Can I pass a table name as a parameter?Can I pass a table name as a parameter?Values are not names Need to teach Databricks to Evaluate value during parsing Turn values into a name2024 Databricks Inc.All rights reservedIDENTIFIER(const
33、Str)constStr:an expression that can be evaluated as a string before query runs.Applies to Most identifiers in a query,or DML statementfunction/column/table name Subject of many DDL statementsALTER/CREATE/DROP Subject of auxiliary statements USE30Parameterizing namesParameterizing namesUsing the IDEN
34、TIFIER clauseUsing the IDENTIFIER clause2024 Databricks Inc.All rights reservedDECLAREDECLARE table_name=names;DECLAREDECLARE col_name =name;DECLAREDECLARE func_name =count;SELECTSELECT IDENTIFIERIDENTIFIER(func_name)(IDENTIFIERIDENTIFIER(col_name)FROMFROM IDENTIFIERIDENTIFIER(table_name)ASAS tWHERE
35、WHERE IDENTIFIERIDENTIFIER(t.|col_name)=Anthony;result131Parameterizing namesParameterizing namesUsing session variablesUsing session variables2024 Databricks Inc.All rights reservedSELECTSELECT IDENTIFIERIDENTIFIER(:func_name)(IDENTIFIERIDENTIFIER(:col_name)FROMFROM IDENTIFIERIDENTIFIER(:table_name
36、)ASAS tWHEREWHERE IDENTIFIERIDENTIFIER(t.|:col_name)=Anthony;result132Parameterizing namesParameterizing namesUsing named parametersUsing named parameters2024 Databricks Inc.All rights reserved2024 Databricks Inc.All rights reserved33SQL ScriptingSQL Scripting2024 Databricks Inc.All rights reservedS
37、QL is nice but I need python for procedural stuffNot anymoreNot anymoreSupport for control flow,iterators&error handlingNatively in SQLControl flow IF/ELSE,CASELooping WHILE,REPEAT,ITERATEResultset iterator FORException handling CONTINUE/EXITParameterized queries EXECUTE IMMEDIATEFollowing the SQL/P
38、SM standard34SQL ScriptingSQL ScriptingIts SQL,but with control flow!Its SQL,but with control flow!Private Preview2024 Databricks Inc.All rights reservedI have a common column in many tables that has a spelling error and I want to rename it in all tables.35SQL Scripting SQL Scripting-real world exam
39、plereal world exampleRenaming all columns with spelling errorsRenaming all columns with spelling errorscolour2024 Databricks Inc.All rights reserved Use the information schema-Fetch all tables in desired catalog and schema-and store them into arraySELECTSELECTarray_agg(table_name)FROMFROM INFORMATIO
40、N_SCHEMA.columnsWHEREWHERE column_name=oldColName36SQL Scripting SQL Scripting-real world examplereal world exampleHow do I find all my tables?How do I find all my tables?2024 Databricks Inc.All rights reservedIterate with WHILE loopWHILE WHILE i array_size(tableArray)DODO.END WHILE;END WHILE;SQL Sc
41、ripting SQL Scripting-real world examplereal world example37Loop through the tablesLoop through the tables2024 Databricks Inc.All rights reservedBut you cannot rename column in VIEWsSolution:NEED IF branchIF IF tableType!=VIEWTHENTHENELSE ELSE-its a viewEND IF;END IF;SQL Scripting SQL Scripting-real
42、 world examplereal world example38Conditional logic to special case viewsConditional logic to special case views2024 Databricks Inc.All rights reservedFinally we need to construct alter statement based on table and column names.Solution:EXECUTE IMMEDIATEEXECUTE IMMEDIATEEXECUTE IMMEDIATEALTER TABLE|
43、tableName|RENAME COLUMN|oldColName|TO|newColNameSQL Scripting SQL Scripting-real world examplereal world example39Dynamically generate SQLDynamically generate SQL2024 Databricks Inc.All rights reserved-parametersDECLAREDECLARE oldColName=ColoUr;DECLAREDECLARE newColName=color;BEGINBEGINDECLAREDECLAR
44、E tableArray ArrayArray;DECLAREDECLARE tableType STRINGSTRING;DECLAREDECLARE i INTINT=0;DECLAREDECLARE alterQuery STRINGSTRING;SETSET tableArray=(SELECTSELECT array_agg(table_name)FROMFROM INFORMATION_SCHEMA.columnsWHEREWHERE column_name COLLATECOLLATE UNICODE_CI=oldColName);SQL Scripting SQL Script
45、ing-real world examplereal world example40Glueing it all together!Glueing it all together!WHILEWHILE i array_size(tableArray)DODOSETSET tableType=(SELECTSELECT table_typeFROMFROM INFORMATION_SCHEMA.tablesWHEREWHERE table_name=tableArrayi);IFIF tableType!=VIEW COLLATECOLLATE UNICODE_CITHENTHENSETSET
46、alterQuery=ALTER TABLE|tableArrayi|RENAME COLUMN|oldColName|TO|newColName;EXECUTEEXECUTE IMMEDIATEIMMEDIATE alterQuery;ENDEND IFIF;SETSET i=i+1;END WHILEEND WHILE;ENDEND;2024 Databricks Inc.All rights reservedLambda functionsSQL UDFString collationIn Private PreviewNamed Parameter MarkersSQL Session
47、 variablesIDENTIFIER clauseEXECUTE IMMEDIATESQL ScriptingIn Private PreviewSummarySummary41Databricks supports interesting SQL features with many more to comeDatabricks supports interesting SQL features with many more to come2024 Databricks Inc.All rights reserved42Q&AQ&APrivate Preview Signup formPrivate Preview Signup formhttps:/forms.gle/qXMG2NKj3DbHg1Lh7