Copyright 2023 PingCAP. All Rights Reserved.

TiDB AI in Practice
Instructor: 董菲
WeChat: utopiadf001

What are we building today?
An intelligent chatbot!

What will we use?
TiDB Serverless
LlamaIndex
TiDB Vector Store
TiDB lab control
AWS EC2
Zhipu AI LLM
DSPy

Accessing TiDB lab control
Open your browser and go to:

Accessing TiDB lab control (registered via WeChat and provided an email address)
Note: if the part of your email address before the @ character consists entirely of digits, add a lowercase x in front of it.
The password for everyone is:

Accessing TiDB lab control (registered through the TUG community without providing an email address)
Note: use your community user name. If the user name contains Chinese characters, replace each character with the first letter of its pinyin and leave letters and symbols unchanged. For example:
    開源軟件好 -> KYRJH
    癡迷_TiDB -> CM_TiDB
    七夕_快樂 -> QX_KL
The password for everyone is: tidbworkshop810

Accessing TiDB lab control
Select: Enhance Your AI Application with TiDB Vector Search
Click: Create Lab

Accessing AWS EC2
Note: choose the option that matches your operating system.

We start with project 1
Select: Demo 1

Accessing AWS EC2
Click: copy the command line, and that is all!

Accessing TiDB Serverless
Click: copy the command line, and that is all!

Thanks to TiDB Serverless Branching
TiDB Serverless
    branch for user 1
    branch for user 2
    ...
    branch for user N

Vectors in TiDB Serverless

About vector embedding
Converts objects (text, image, video, ...) into numbers.
Reference: https:/

Vector distances in TiDB Serverless
Manhattan, Euclidean, Cosine

    tidb> select VEC_L1_DISTANCE('[0,0]', '[3,4]');
    +-----------------------------------+
    | VEC_L1_DISTANCE('[0,0]', '[3,4]') |
    +-----------------------------------+
    |                                 7 |
    +-----------------------------------+

    tidb> select VEC_COSINE_DISTANCE('[1,1]', '[-1,-1]');
    +-----------------------------------------+
    | VEC_COSINE_DISTANCE('[1,1]', '[-1,-1]') |
    +-----------------------------------------+
    |                                       2 |
    +-----------------------------------------+

    tidb> select VEC_L2_DISTANCE('[0,3]', '[4,0]');
    +-----------------------------------+
    | VEC_L2_DISTANCE('[0,3]', '[4,0]') |
    +-----------------------------------+
    |                                 5 |
    +-----------------------------------+

Vector search in TiDB Serverless

What we need to understand
Diagram labels: content (text / audio / image / video ...), application, embedding model, vector embedding [1.0, 1.0, 1.0], search input, returned results.

Please move on to project 2
Select: Demo 2

Vector indexing in TiDB Serverless

Vector index
Improves the efficiency of vector search.
Supports approximate vector queries: Approximate Nearest Neighbor (ANN).
The vector index must be chosen when the table is created.

HNSW in TiDB Serverless
HNSW derives from the probabilistic skip list and NSW (Navigable Small World).
https:/
Figure labels: entry point, query vector, nearest neighbor; layer 2, layer 1, layer 0.

Please move on to project 3
Select: Demo 3
Reference: https:/

First-generation chatbot development
https:/
Diagram labels: TiDB Serverless, chatbot.

Using LlamaIndex
LlamaIndex serves as the framework for developing our chatbot.
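
The three distance functions shown earlier (VEC_L1_DISTANCE, VEC_L2_DISTANCE, VEC_COSINE_DISTANCE) can be reproduced in a few lines of plain Python. The sketch below is not TiDB code; the function names and the brute-force `exact_nearest` helper are assumptions made up for illustration. It only shows what each distance measures and what an exact nearest-neighbor search does, which is the work an ANN index such as HNSW approximates without scanning every row.

```python
import math


def l1_distance(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: straight-line length between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_nearest(query, vectors, k=1, distance=cosine_distance):
    """Hypothetical brute-force search: score every vector, keep the k closest."""
    return sorted(vectors, key=lambda v: distance(query, v))[:k]


# Same inputs as the SQL examples.
print(l1_distance([0, 0], [3, 4]))        # 7
print(l2_distance([0, 3], [4, 0]))        # 5.0
print(cosine_distance([1, 1], [-1, -1]))  # approximately 2 (floating point)

# Exact search over a tiny in-memory collection.
stored = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(exact_nearest([0.9, 0.1], stored, k=1))
```

The brute-force search cost grows linearly with the number of stored vectors, which is why the slides introduce a vector index for larger tables.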

First-generation chatbot development - TiDB vector store
Initialize the TiDB vector store:

    from llama_index.vector_stores.tidbvector import TiDBVectorStore

    # tidb_connection_url holds the TiDB Serverless connection string
    tidbvec = TiDBVectorStore(
        connection_string=tidb_connection_url,
        table_name="llama_index_rag_test",
        distance_strategy="cosine",
        vector_dimension=1536,
        drop_existing_table=False,
    )

tidb-chat.py

First-generation chatbot development - RAG framework
Building the chatbot framework:

    import click
    from llama_index.core import VectorStoreIndex

    # Connect to the TiDB Serverless database and obtain the TiDB vector store index
    tidb_vec_index = VectorStoreIndex.from_vector_store(tidbvec)
    # Obtain the query engine
    query_engine = tidb_vec_index.as_query_engine(streaming=True)

    @click.command()
    def start_chat():
        while True:
            # Prompt for input
            question = click.prompt("Enter your question")
            # RAG-based retrieval
            response = query_engine.query(question)
            # Return the result
            click.echo(response)

Reference: https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_guide/
tidb-chat.py

First-generation chatbot development - generating vector embeddings

    from llama_index.core import StorageContext
    from llama_index.readers.web import SimpleWebPageReader

    # Obtain the storage context, which may contain Document, Index, Vector, and KV stores
    storage_context = StorageContext.from_defaults(vector_store=tidbvec)

    def prepare_data(url):
        # Read the page configured in urls
        documents = SimpleWebPageReader(html_to_text=True).load_data([url,])
        # Convert the documents into the TiDB vector index
        tidb_vec_index.from_documents(documents, storage_context=storage_context, show_progress=True)

    urls = https:/

... vector embedding, and store them in the TiDB vector store.
tidb-chat-v2.py

Finally: rate the lab
After the lab, please rate the free lab-control platform. Thanks to the engineers supporting it behind the scenes.

Please move on to project 4
Select: Demo 4

Second-generation chatbot development
https:/
Diagram labels: TiDB Serverless, chatbot.

Second-generation chatbot development - building the knowledge graph
Use the DSPy library to define the nodes and edges:

    from typing import List
    from pydantic import BaseModel, Field

    class Entity(BaseModel):
        """List of entities extracted from the text to form the knowledge graph"""

        name: str = Field(
            description="Name of the entity, it should be a clear and concise term"
        )
        description: str = Field(
            description=(
                "Description of the entity, it should be a complete and comprehensive sentence, not few words. "
                "Sample description of entity 'TiDB in-place upgrade': "
                "'Upgrade TiDB component binary files to achieve upgrade, generally use rolling upgrade method'"
            )
        )

    class Relationship(BaseModel):
        """List of relationships extracted from the text to form the knowledge graph"""

        source_entity: str = Field(
            description="Source entity name of the relationship, it should be an existing entity in the Entity list"
        )
        target_entity: str = Field(
            description="Target entity name of the relationship, it should be an existing entity in the Entity list"
        )
        relationship_desc: str = Field(
            description=(
                "Description of the relationship, it should be a complete and comprehensive sentence, not few words. "
                "Sample relationship description: 'TiDB will release a new LTS version every 6 months.'"
            )
        )

    class KnowledgeGraph(BaseModel):
        """Graph representation of the knowledge for text."""

        entities: List[Entity] = Field(
            description="List of entities in the knowledge graph"
        )
        relationships: List[Relationship] = Field(
            description="List of relationships in the knowledge graph"
        )

This defines Entity, Relationship, and KnowledgeGraph.
build-graph.py

Second-generation chatbot development - building the knowledge graph
Diagram labels: Entity: TiDB, Entity: Apache 2.0, Entity: TiDB Data Migration (DM), Entity: PingCAP.
build-graph.py
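
The field descriptions above ask that every relationship name a source and target that already exist in the entity list. A minimal sketch of that constraint follows, using standard-library dataclasses as a stand-in for the Pydantic models so it runs without extra packages. The helper `drop_dangling_relationships` and the sample descriptions are assumptions written for illustration; they are not the workshop's `clean_knowledge_graph` and not output from the lab.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str
    description: str


@dataclass
class Relationship:
    source_entity: str
    target_entity: str
    relationship_desc: str


@dataclass
class KnowledgeGraph:
    entities: List[Entity] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


def drop_dangling_relationships(graph: KnowledgeGraph) -> KnowledgeGraph:
    """Hypothetical helper: keep only edges whose endpoints are known entities."""
    known = {entity.name for entity in graph.entities}
    kept = [
        rel
        for rel in graph.relationships
        if rel.source_entity in known and rel.target_entity in known
    ]
    return KnowledgeGraph(entities=list(graph.entities), relationships=kept)


# Illustrative sample data built from the entity names on the slide.
graph = KnowledgeGraph(
    entities=[
        Entity("TiDB", "TiDB is a distributed SQL database."),
        Entity("PingCAP", "PingCAP is the company that develops TiDB."),
        Entity("Apache 2.0", "Apache 2.0 is an open-source software license."),
        Entity("TiDB Data Migration (DM)", "DM is a data migration tool for TiDB."),
    ],
    relationships=[
        Relationship("PingCAP", "TiDB", "PingCAP develops TiDB."),
        Relationship("TiDB", "Apache 2.0", "TiDB is released under the Apache 2.0 license."),
        # Dangling edge: "TiKV" is not in the entity list, so it is dropped.
        Relationship("TiDB", "TiKV", "TiDB stores its data in TiKV."),
    ],
)

cleaned = drop_dangling_relationships(graph)
for rel in cleaned.relationships:
    print(f"{rel.source_entity} -> {rel.target_entity}: {rel.relationship_desc}")
```

Filtering after extraction is one possible design; an LLM-based extractor can return edges to entities it never listed, so a check of this kind keeps the stored graph consistent.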

Second-generation chatbot development - building the knowledge graph
Diagram labels: Entity: TiDB, Entity: Apache 2.0, Entity: TiDB Data Migration (DM), Entity: PingCAP, connected by relationship edges.
build-graph.py

Second-generation chatbot development - loading the knowledge graph (Graph)

    # Initialize Zhipu AI
    zhipu_ai_client = dspy.OpenAI(model="glm-4-0520", api_base="https:/ ...
    # Configure DSPy
    dspy.settings.configure(lm=zhipu_ai_client)
    # Load the raw data from Wiki
    wiki = WikipediaLoader(query="TiDB").load()
    # Extract the raw Wiki data into a knowledge graph
    pred = extractor(text=wiki[0].page_content)
    # Generate the knowledge graph object (clean knowledge_graph)
    knowledge_graph = clean_knowledge_graph(pred.knowledge)
    # Render the knowledge graph as an HTML file
    interactive_graph(knowledge_graph)
    # Store the graph in TiDB Serverless
    save_knowledge_graph(knowledge_graph)

build-graph.py

Second-generation chatbot development - using Zhipu AI

    @click.command()
    def start_chat():
        while True:
            question = click.prompt("Enter your question")
            # Convert the question text into a vector embedding
            question_embedding = get_query_embedding(question)
            # Fetch the knowledge graph from TiDB Serverless
            entities, relationships = retrieve_entities_relationships(question_embedding)
            # Query the Zhipu model together with the knowledge graph to get the result
            result = generate_result(question, entities, relationships)
            click.echo(result)

test-graph.py

Second-generation chatbot development - using Zhipu AI

    # Convert the question text into a vector embedding
    def get_query_embedding(query: str):
        # Prepare the Zhipu AI client
        zhipu_ai_client = ZhipuAI(api_key=os.getenv("ZHIPUAI_API_KEY"))
        # Call the Zhipu model to convert the question text into a vector embedding
        response = zhipu_ai_client.embeddings.create(
            model="embedding-2",
            input=query,
        )
        return response.data[0].embedding

test-graph.py
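
The start_chat loop above chains three steps: embed the question, retrieve the closest graph rows, and assemble a prompt for the model. The sketch below walks through those steps in memory so the data flow is visible without a database or an API key. Everything in it is an assumption for illustration: `toy_embedding` is a crude character-count stand-in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical helpers, not the workshop's functions.

```python
import math


def toy_embedding(text, dim=8):
    """Hypothetical stand-in for an embedding model: bucket character codes."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec


def cosine_distance(a, b):
    """Cosine distance; returns 1.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def retrieve(question_embedding, rows, k=2):
    """Return the k rows whose embeddings are closest to the question."""
    return sorted(
        rows, key=lambda row: cosine_distance(question_embedding, row["embedding"])
    )[:k]


def build_prompt(question, rows):
    """Join the retrieved rows into a context block placed before the question."""
    context = "\n".join(
        f"(Name: {row['name']}, Description: {row['description']})" for row in rows
    )
    return f"Context:\n{context}\n\nQuestion: {question}"


# Illustrative rows; in the workshop these would come from TiDB Serverless.
rows = [
    {"name": "TiDB", "description": "TiDB is a distributed SQL database developed by PingCAP."},
    {"name": "Apache 2.0", "description": "Apache 2.0 is the open-source license TiDB is released under."},
    {"name": "TiDB Data Migration (DM)", "description": "DM is a tool that migrates data into TiDB."},
]
for row in rows:
    row["embedding"] = toy_embedding(row["description"])

question = "Which license does TiDB use?"
top_rows = retrieve(toy_embedding(question), rows, k=2)
prompt = build_prompt(question, top_rows)
print(prompt)
```

With a real embedding model the ranking reflects meaning rather than character counts, but the shape of the pipeline stays the same: the prompt sent to the LLM is the retrieved context followed by the user's question.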

Second-generation chatbot development - reading the knowledge graph

    # Fetch the knowledge graph from TiDB Serverless
    def retrieve_entities_relationships(
        question_embedding,
    ) -> (List[DatabaseEntity], List[DatabaseRelationship]):
        with Session(engine) as session:
            entity = (session.query(DatabaseEntity) ...
            relationships = (session.query(DatabaseRelationship) ...
            for r in relationships:
                entities.update(
                    {
                        r.source_entity.id: r.source_entity,
                        r.target_entity.id: r.target_entity,
                    }
                )
        return entities.values(), relationships

test-graph.py

Second-generation chatbot development - using Zhipu AI

    # Query the Zhipu model together with the knowledge graph to get the result
    def generate_result(query: str, entities, relationships):
        # Prepare the Zhipu AI client
        zhipu_ai_client = ZhipuAI(api_key=os.getenv("ZHIPUAI_API_KEY"))
        entities_prompt = "\n".join(
            map(lambda e: f"(Name: {e.name}, Description: {e.description})", entities)
        )
        relationships_prompt = "\n".join(
            map(lambda r: f"{r.relationship_desc}", relationships)
        )
        # Ask the Zhipu model
        response = zhipu_ai_client.chat.completions.cr