Unlocking Insights: Exploring Similarity Search with Python and Oracle Database 23ai [LRN3407]

Oracle CloudWorld, September 2024
Copyright 2024, Oracle and/or its affiliates

Speakers
- Veronica Dumitriu, Principal Product Manager
- Anthony Tuininga, Technical Advisor

Today's Agenda
- Introduction to AI
- Vector Embeddings
- Vector Databases
- Similarity Search Techniques
- Real-world Applications
- Demo

Introduction: AI Concepts

LLM & GenAI Timeline
- 2018: GPT-1
- Nov 2022: ChatGPT
- Jan-Mar 2023: Llama, GPT-4, Bard
- Apr 2023: Bedrock
- Jun 2023: Cohere
- Dec 2023: PaLM & Gemini
- 2024: Stable Diffusion 3, Vlogger, Claude 3, Devin AI
- 2024: Oracle 23ai and GenDev

GenAI
AI systems capable of creating content, including text, images, and video, by learning from existing data.
- Users input prompts to guide the AI in tailoring the output
- AI tools can integrate with existing presentation or document software

LLM (Large Language Models)
- Conversational AI with data-driven responses; billions of parameters for engaging content, trained with extensive public vocabularies
- Specialized use cases: responses are not exact in the context of specialized use cases
- Build domain-specific models and data, or fine-tune models and train them on business-specific data
- Import LLMs to enrich users' data prompts

Concepts
- Vectors = numerical representations of unstructured data
- Vector Embeddings = the vectors that embedding models produce when they transform data into numerical form
- Vector Database = stores vectors and feeds vector data to language models
- Vector Search = enables large-scale similarity searches rather than only searches for exact matches
  - based on semantics, not on precise match/keywords
  - a fast and effective way to discover the needed data for LLMs

Vector Embeddings

Vectors in AI
A vector is:
- a sequence of numbers, called dimensions, used to capture the important "characteristics" of the data
- a representation of the semantic content of data
- generated using embedding models
[Figure: an example vector drawn as a row of numbers]

Vectors in AI
[Figure: 2-D embedding plot with axes d1 and d2 in which related words sit close together: vegetables (broccoli, eggplant, garlic, celery, zucchini, onion, lettuce), animals (cat, kitten, dog, puppy, wolf, lion, tiger, dinosaur), and US states (Maine, Vermont, New York)]

Embeddings for Images
- Convert an image to a vector embedding and search the vector database
- Compare it against other vectors in the database to find similar content
- Example embedding: 0.5, 1.5, 2.6, -1.1, ...

Generating Embeddings
Use an embedding model:
- Text documents -> Natural Language Embedding Model -> Text Vector Table
- Images -> Image Embedding Model -> Image Vector Table

Image Vector Table
    id  vector                      image
    1   0.5, 1.5, 2.6, -1.1, ...    [image]
    2   1.0, 0.9, 1.6, -1.3, ...    [image]
    3   0.6, 1.1, 1.3, -0.9, ...    [image]

Text Vector Table
    id  vector                      text
    1   0.3, 0.2, 1.8, -6.7, ...    "Delaware is a state in the United States"
    2   1.3, 5.3, 0.4, -8.3, ...    "Effects of the immune system and common anti-inflammatory drugs on emotions"
    3   1.8, 0.3, 0.1, -1.1, ...    "The VECTOR data type is a homogeneous array of 8-bit signed integers."

Vector Databases
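
To make the idea of a vector table concrete, the sketch below holds the three rows of the Text Vector Table in a plain Python list and returns the row closest to a query vector by cosine distance. This is only an in-memory illustration, not how a vector database is implemented; the query vector is a made-up value, and the four numbers per row are the truncated values shown on the slide rather than full embeddings.

```python
import math

# Rows copied from the Text Vector Table slide (only the first four values
# of each embedding are shown there).
TEXT_VECTORS = [
    (1, [0.3, 0.2, 1.8, -6.7], "Delaware is a state in the United States"),
    (2, [1.3, 5.3, 0.4, -8.3],
     "Effects of the immune system and common anti-inflammatory drugs on emotions"),
    (3, [1.8, 0.3, 0.1, -1.1],
     "The VECTOR data type is a homogeneous array of 8-bit signed integers."),
]


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_rows(query, rows, k=1):
    """Brute-force search: sort every row by its distance to the query."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


# Hypothetical query embedding, chosen to point in nearly the same
# direction as row 1.
query_vector = [0.4, 0.1, 1.9, -6.5]
best_id, _, best_text = nearest_rows(query_vector, TEXT_VECTORS, k=1)[0]
print(best_id, best_text)
```

A brute-force scan like this compares the query with every stored vector, which is why large collections rely on the vector indexes and approximate search described later in the deck.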

Role of Vector Databases with LLMs
- Help address the hallucination problem inherent in LLM responses
- Augment the prompt with enterprise-specific content to produce better responses
- Avoid exceeding LLM token limits by using the most relevant content
[Diagram: the LLM brings a broad range of data from the internet, a snapshot from a point in time; the vector database brings private enterprise data that is current and frequently updated; together they lead to better business outcomes]

Oracle as a Vector Database
AI Vector Search capabilities are supported in 23ai:
- Handle vector and other workloads in the same database
- Query business data alongside AI vector search
- Designed to be simple to use and easy to understand
- New VECTOR data type for storing vector embeddings
- New SQL syntax and functions express similarity search with ease
- New approximate search indexes packaged and tuned for high performance and quality

Embedding Models with Oracle Database
Embedding models within Oracle Database:
- Import pre-trained or your own embedding models into Oracle Database if they are compatible with the Open Neural Network Exchange (ONNX) standard
- Use the DBMS_DATA_MINING package, which has been augmented to import ONNX embedding models
- Use the VECTOR_EMBEDDING() operator to leverage ONNX models to generate vector embeddings for unstructured data within the database
Embedding models outside of the Oracle Database:
- pre-trained open-source embedding models
- proprietary embedding models

Oracle Vector Data Type
New VECTOR data type: VECTOR(,), where the two arguments give the number of dimensions and the dimension format, as in this example:

    create table ecomm_catalog(
        id number,
        image BLOB,
        img_vec VECTOR(235, FLOAT64));

The format for dimension values can be INT8, BINARY, FLOAT32, or FLOAT64.

Other examples of vector declaration (flexible number of dimensions and format type):

    create table ecomm_catalog(
        id number,
        image BLOB,
        img_vec VECTOR);

Why is this useful? Embedding models are changing with technology, but your schema can stay as is.

Oracle Vector Operations
- VECTOR_DISTANCE(VECTOR1, VECTOR2, ): the third argument selects the metric, such as Euclidean, cosine similarity, Manhattan, etc.
- VECTOR_AVG(VECTOR): e.g., a sentence's vector is computed as the average vector of all its words
- VECTOR_DIMENSION_COUNT(VECTOR): count the number of dimensions in the vector
- VECTOR_NORM(VECTOR): compute the Euclidean norm/length of a vector
- Many other vector operations

Similarity Search Techniques
Euclidean distance, Manhattan distance, Hamming, Cosine, Dot product
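
As a rough illustration of the metrics named above, here is a minimal pure-Python sketch of each one. The function names are made up for this example and are not Oracle or python-oracledb APIs, and the database's own definitions may differ in sign or scaling (for instance, Hamming is shown here element by element rather than bit by bit).

```python
import math


def euclidean_squared(a, b):
    """Sum of squared per-dimension differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(euclidean_squared(a, b))


def manhattan(a, b):
    """Sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dot_product(a, b):
    """Larger values mean the vectors point in more similar directions."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector length, compares direction only."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)


def hamming(a, b):
    """Number of positions at which the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print("euclidean        ", euclidean(a, b))          # 5.0
print("euclidean squared", euclidean_squared(a, b))  # 25.0
print("manhattan        ", manhattan(a, b))          # 7.0
print("dot product      ", dot_product(a, b))        # 25.0
print("hamming          ", hamming(a, b))            # 2
print("cosine distance  ", cosine_distance(a, b))
```

Which metric fits best usually depends on the embedding model: cosine compares direction only, while Euclidean and Manhattan are also sensitive to vector length.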

Vector Search
A critical component of GenAI development:
- ability to make unstructured data discoverable
- locate similar data points among potentially billions
- enables large-scale similarity searches rather than only searches for exact matches
- no need to define any item characteristics

Keyword Search vs. Vector Search
- Search approach
  - Keyword search: match exact keywords
  - Vector search: find related objects sharing similar characteristics
- Ambiguity
  - Keyword search: struggles with synonyms/ambiguous language
  - Vector search: uses ML to handle synonyms/ambiguous language
- Relevance
  - Keyword search: great for precise queries
  - Vector search: better for fuzzy queries
- Speed/Scalability
  - Keyword search: fast, using indexes
  - Vector search: slower and less scalable; struggles due to complex query calculations
- Search quality
  - Keyword search: depends on exact keyword matching
  - Vector search: uses semantic relationships; better for broad queries

Vector Distance Functions and Operations

Vector Distance Metrics
Vector distance: F(, distance_metric)
Distance metrics, Euclidean and non-Euclidean:
- Euclidean and Euclidean squared distances
- Manhattan distance
- Cosine similarity
- Dot product similarity
- Hamming similarity

Simple Vector Search: Pipeline
[Diagram: input data object (what we're looking for: text description, image, ...) -> embedding model -> embedding -> vector search (compare) -> vector ID matches -> data object retrieval -> relevant content]

AI Vector Search: SQL Syntax
Simple syntax for exact search (optional EXACT keyword in the FETCH clause):

    select elementID
    from vector_table
    order by vector_distance(data, :query_vector)
    fetch exact first 20 rows only

Simple syntax for approximate search (requires the APPROXIMATE keyword in the FETCH clause):

    select elementID
    from vector_table
    order by vector_distance(data, :query_vector)
    fetch approximate first 20 rows only

- Uses vector indexes for high speed, but trades off accuracy:
  - in-memory neighbor graph vector index
  - neighbor partition vector index
- Multi-vector similarity search (using grouping criteria as PARTITIONS)

AI Vector Search
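
The two SQL forms above can be run from Python roughly as follows. This is a sketch under stated assumptions: `connection` is taken to be an open python-oracledb connection created elsewhere, the table `vector_table` with columns `elementID` and `data` comes from the slide, the helper name `top_matches` is made up for illustration, and the approximate form only pays off when a vector index exists on the column.

```python
import array

# Exact search: every stored vector is compared with the query vector.
EXACT_SQL = """
    select elementID
    from vector_table
    order by vector_distance(data, :query_vector)
    fetch exact first 20 rows only"""

# Approximate search: may use a vector index, trading accuracy for speed.
APPROXIMATE_SQL = """
    select elementID
    from vector_table
    order by vector_distance(data, :query_vector)
    fetch approximate first 20 rows only"""


def top_matches(connection, query_values, approximate=False):
    """Return the elementID values of the closest rows.

    `connection` is assumed to be a python-oracledb connection;
    `query_values` is a sequence of floats from an embedding model.
    """
    # array.array with type code "f" is bound as a FLOAT32 vector.
    query_vector = array.array("f", query_values)
    sql = APPROXIMATE_SQL if approximate else EXACT_SQL
    with connection.cursor() as cursor:
        cursor.execute(sql, query_vector=query_vector)
        return [row[0] for row in cursor]
```

A call such as `top_matches(connection, embedding, approximate=True)` would then return up to 20 IDs, nearest first.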

Similarity Search Over Joins
Tables: TEACHER, COURSE, COURSE MATERIAL
Return the top 5 courses containing material similar to "quality of LLM-generated texts" where the teacher is teaching Undergrad level at MIT University:

    SELECT course.courseID
    FROM teacher, course, course_material
    WHERE teacher.teacherID = course.teacherID
      AND course.courseID = course_material.courseID
      AND course.type = 'Undergrad'
      AND teacher.university = 'MIT'
    ORDER BY vector_distance(coursemat, :queryMat)
    FETCH APPROXIMATE FIRST 5 ROWS ONLY;

Real-World Applications

Use Cases
- Similarity Search: image and video retrieval, medical image analysis, e-commerce product recommendations
- Content-Based Filtering: personalized recommendation, find e-commerce items from an image
- Natural Language Processing: text classification and clustering, SQL generation
- Data Analytics: pattern recognition, anomaly detection
- Computer Vision: object detection, face recognition, facial expression analysis
- Biomedical Research: gene/DNA similarity analysis, molecular structure search
- Geographic Information Systems: spatial analysis, map rendering
- Industrial Applications: quality control, predictive maintenance, machinery malfunction

python-oracledb: Need to Know
Oracle Database Driver for Python
- Renamed, major upgrade of cx_Oracle
- Python 3.8 to 3.13
- New, default Thin mode: no Oracle Client libraries
- Runtime choice to use Thick mode
- Single-step install, 20 MB
- Supports new platforms: Alpine, Apple M1/M2, IoT
- Dual Apache 2 or UPL open source license
- Binary module for performance
- Python Database API V2 support
[Diagram: Python -> python-oracledb driver]

Vector support in python-oracledb
The python-oracledb 2.3 driver supports the VECTOR data type in both Thin and Thick modes:
- bind
- fetch

Vectors in python-oracledb: Vector Support
Vector data columns are represented as:
1. Python array.array() values, bound directly:
   - int8 (signed 8-bit integers, type code "b")
   - binary (unsigned 8-bit integers, type code "B", groups of 8 dimensions)
   - float32 (float, type code "f")
   - float64 (double, type code "d")
2. Python lists
3. NumPy ndarrays
4. Use the constant oracledb.DB_TYPE_VECTOR as the type

Querying a VECTOR column: the cursor metadata attribute FetchInfo.type_code contains the constant oracledb.DB_TYPE_VECTOR.

New attributes for VECTOR columns:
- FetchInfo.vector_dimensions
  - number of dimensions of the VECTOR column
  - None for non-VECTOR columns or for VECTOR columns with a flexible number of dimensions
- FetchInfo.vector_format
  - oracledb.VECTOR_FORMAT_INT8 for 8-bit signed integers
  - oracledb.VECTOR_FORMAT_BINARY for 8-bit unsigned integers
  - oracledb.VECTOR_FORMAT_FLOAT32 for 32-bit floating point numbers
  - oracledb.VECTOR_FORMAT_FLOAT64 for 64-bit floating point numbers
  - None for non-VECTOR columns or for a flexible storage format

Vectors in python-oracledb: Examples
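
Before touching the database, it can help to see how the `array.array` type codes listed above line up with the vector formats. The sketch below uses only the standard library; the mapping dictionary and the `describe_vector` helper are hypothetical names for this example, not part of python-oracledb, and the "8 dimensions per byte" rule for binary vectors is taken from the slide.

```python
import array

# Type code -> vector format, as listed on the "Vector Support" slide.
FORMAT_BY_TYPECODE = {
    "b": "INT8",     # signed 8-bit integers
    "B": "BINARY",   # unsigned 8-bit integers, each holding 8 dimensions
    "f": "FLOAT32",  # C float
    "d": "FLOAT64",  # C double
}


def describe_vector(values):
    """Return (format name, number of dimensions) for an array.array."""
    fmt = FORMAT_BY_TYPECODE[values.typecode]
    # A binary vector packs 8 one-bit dimensions into every byte.
    dimensions = len(values) * 8 if fmt == "BINARY" else len(values)
    return fmt, dimensions


print(describe_vector(array.array("f", [1.625, 1.5, 1.0])))       # ('FLOAT32', 3)
print(describe_vector(array.array("d", [11.25, 11.75, 11.5])))    # ('FLOAT64', 3)
print(describe_vector(array.array("b", [1, 2, 3])))               # ('INT8', 3)
print(describe_vector(array.array("B", [0b10110000, 0b00001111])))  # ('BINARY', 16)
```

Choosing the type code on the Python side is what decides which storage format the bound value is treated as.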

Table with FLOAT32, FLOAT64 and INT8 vector columns:

    CREATE TABLE vector_table(
        v32 vector(3, float32),
        v64 vector(3, float64),
        v8  vector(3, int8));

Insert FLOAT32, FLOAT64 and INT8 vectors:

    import array

    # The type code selects the vector format: "f" float32, "d" float64, "b" int8.
    vector_data_32 = array.array("f", [1.625, 1.5, 1.0])
    vector_data_64 = array.array("d", [11.25, 11.75, 11.5])
    vector_data_8 = array.array("b", [1, 2, 3])

    # cursor comes from an open python-oracledb connection.
    cursor.execute(
        "insert into vector_table (v32, v64, v8) values (:1, :2, :3)",
        [vector_data_32, vector_data_64, vector_data_8],
    )

Fetch FLOAT32, FLOAT64 and INT8 vectors:

    cursor.execute("select * from vector_table")
    for row in cursor:
        print(row)

Output:

    (array('f', [1.625, 1.5, 1.0]), array('d', [11.25, 11.75, 11.5]), array('b', [1, 2, 3]))

Demo
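
The demo in this section creates a table with a VECTOR column, inserts vectorized images, and runs a top-N similarity search for a query image. A hedged sketch of those three steps follows; it is not the presenters' demo code. The table and column names reuse the `ecomm_catalog` example from the VECTOR data type slide, `connection` is assumed to be an open python-oracledb connection, and `embed_image` is a hypothetical callable standing in for whatever image embedding model is used.

```python
import array

# Flexible VECTOR column, as in the ecomm_catalog example earlier in the deck.
CREATE_SQL = "create table ecomm_catalog (id number, image blob, img_vec vector)"

# The image BLOB is left out of the insert to keep the sketch short.
INSERT_SQL = "insert into ecomm_catalog (id, img_vec) values (:1, :2)"

SEARCH_SQL = """
    select id
    from ecomm_catalog
    order by vector_distance(img_vec, :query_vec)
    fetch first {top_n} rows only"""


def create_catalog(connection):
    """Create a table with a VECTOR column."""
    with connection.cursor() as cursor:
        cursor.execute(CREATE_SQL)


def insert_images(connection, embed_image, images):
    """Insert one embedding per catalog image.

    `images` is an iterable of (item_id, image_bytes) pairs and
    `embed_image` (hypothetical) returns a sequence of floats.
    """
    with connection.cursor() as cursor:
        for item_id, image_bytes in images:
            vector = array.array("f", embed_image(image_bytes))
            cursor.execute(INSERT_SQL, [item_id, vector])
    connection.commit()


def similar_items(connection, embed_image, query_image, top_n=6):
    """Return the ids of the top_n catalog items closest to the query image."""
    query_vec = array.array("f", embed_image(query_image))
    sql = SEARCH_SQL.format(top_n=int(top_n))
    with connection.cursor() as cursor:
        cursor.execute(sql, query_vec=query_vec)
        return [row[0] for row in cursor]
```

The default of six results mirrors the "top 6 matches" shown in the image similarity search diagram; the returned ids would then be used to look up item thumbnails.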

E-commerce Product Recommendations
Goals
- Connect customers to their perfect product based on an image of their desired item
Factors considered in the analysis
- Images of e-commerce catalog products (e.g., shorts, shoes, purses, t-shirts)
- Desired item picture upload
- The embedding model used to vectorize the e-catalog
- Similarity search distance metric for calculating the distance to the most similar products based on the desired item
Results
- More personalized product suggestions from the e-catalog

Image Similarity Search
[Diagram: query image -> embedding model -> embedding -> AI Vector Search (compare, with optional relational predicates) -> top 6 similar rows -> retrieve item thumbnails for the top 6 matches -> top 6 similar items displayed]

E-commerce Demo
1. Create an Oracle Database 23ai schema
2. Create a Python environment
3. Establish a connection from Python to Oracle Autonomous Database using the python-oracledb driver
4. Create a table with a VECTOR column
5. Insert vectorized images in the VECTOR column
6. Run a similarity search for the query image to get the TOP N matches

Actionable Insights to Expedite Decisions
- Enhance the shopping experience by simplifying the search process and increasing customer satisfaction

Helpful Resources / Additional Reading
- Oracle Database 23ai Vector Search
- python-oracledb: python-oracledb API, python-oracledb on GitHub
- Python: https://www.python.org/
- Other: NumPy, Pillow, Tensor

python-oracledb Learning Resources
Visit the sites below to learn more:
- python-oracledb Homepage
- python-oracledb Documentation
- GitHub Source Code Repository
- Oracle LiveLabs: getting started with Python and Oracle Database

Thank you