《AI基礎設施聯盟:AI代理、大語言模型和智能應用指南:深度探索下一代新興AI堆棧、提示工程、開源與閉源模型等領域(2023)(英文版)(54頁).pdf》由會員分享,可在線閱讀,更多相關《AI基礎設施聯盟:AI代理、大語言模型和智能應用指南:深度探索下一代新興AI堆棧、提示工程、開源與閉源模型等領域(2023)(英文版)(54頁).pdf(54頁珍藏版)》請在三個皮匠報告上搜索。
1、Agents,Large Language Models,and Smart AppsAI Infrastructure ReportTable of contentsAgents,Large Language Models,and the New Wave of Smart Apps03The New Stack for Intelligent Apps08Vector Databases35Advanced Design of AI-Driven Apps48The Future and Where It All Ends Up52Building AI-Driven Apps13The
2、Fine Art of Prompt Engineering14Agent Frameworks28LangChain29The LLMs Themselves41Open Source Models43Challenges with Running Open Source Models in Production44LlamaIndex30Haystack31Semantic Kernel32The Future of Frameworks34From Prompting to Tuning213Agents,Large Language Models,and Smart Apps01Age
3、nts,Large Language Models,and the New Wave of Smart AppsWhen you hear the word agent,you might think of 007 or Jason Bourne.They can fight crime with one hand and down a Martini with the other,and always look stylish doing it.But since the release of ChatGPT,weve seen an explosion of a new kind of a
4、gent.AI agents are intelligent programs that can interact autonomously or semi-autonomously with their environment.Actually,the definition of agents is still evolving at the moment.Traditionally,an agent is defined as that tries to achieve its goals in the digital or physical world,or both.Its got s
5、ensors that“see,”“hear,”and“sense”its environment.It has“actuators,”a fancy word for the tools it uses to interact with the world,whether thats an LLM using an API the way we use our hands and fingers,a robotic gripper picking up trash,or a self-driving car sensing the environment with LIDAR.But Lar
6、ge language models(LLMs)like ChatGPT and GPT-4,based on the ultra-popular,changed what is possible with Agent capabilities.For the first time they give us little“brains”that are capable of performing a wide range of tasks,from planning and reasoning to answering questions and making decisions,which
7、were impossible with earlier models.However,an LLM has a number of well known flaws,such as,which essentially boils down to making things up,ingesting the biases of the dataset it was trained on,all the way to having confidence in wrong answers because of a lack of grounding.Grounding means that the
8、 model cant link the text its generating to real-world knowledge.For example,it may not know for a fact that the world is round and so occasionally hallucinates that its flat.Despite these imperfections,LLMs remain powerful tools.We asked GPT-4 a logic teaser question and it gave us the right answer
9、 out of the gate,something that smaller LLMs struggle with badly and that no handwritten code can deal with on its own without knowing the question in advance.any autonomous softwareTransformer architecturehallucinations4Agents,Large Language Models,and Smart AppsA recent report from on emerging LLM
10、 stacks sees agents as purely autonomous pieces of software.This means that they can plan and make decisions totally independent of human intervention.At the (AIIA),we define agents a bit differently.We see them as both semi-autonomous software,with humans making some of the decisions(aka humans in
11、the loop),and fully autonomous systems too.We also think its essential for people to understand that an agent is not usually a singular,self-contained piece of software,such as an LLM itself.We hear the word agent,and it calls to mind a complete entity that is self-contained,mostly because we anthro
12、pomorphize them and think of them as human,since people are the only benchmark we have for true intelligence.Andreessen HorowitzAI Infrastructure AllianceUsually,agents are a system of interconnected software pieces.The from a Microsoft research team outlines a common and practical approach to moder
13、n agents where an LLM uses other models,like an image diffuser(e.g.,or a coding model,like,to do more advanced tasks.It may also use APIs the way we use our hands and legs.It uses these tools as an extension to control outside software or interact with the world.To achieve this,an LLM might train on
14、 its own API knowledge as part of its dataset or a fine-tuned dataset,or it might use another external model explicitly trained on APIs,like.At the AIIA,we see an agent as any software system that interacts with the physical or digital world and can make decisions that usually fell in the realm of h
15、uman cognition in the past.HuggingGPT paperStable Diffusion XL)WizardCoderGorillaWe call semi-autonomous agents,Centaurs.These are intelligent pieces of software with a human in the loop.Agents are fully autonomous or almost fully autonomous pieces of software that can plan and make complex decision
16、s without human intervention.We can think of a centaur as“Agents on rails,”a precursor to fully autonomous agents.Centaurs can accomplish complex tasks as long as theyre well-defined with clear guardrails and as long as someone is checking their work or intervening at various steps along the way.Age
17、nts are fully autonomous and can do their jobs with no human intervention.5 Agents,Large Language Models,and Smart AppsA good example of the levels of autonomy in agentic systems comes from the world of self-driving cars and is beautifully laid out in the book by Kai-Fu Lee and Chen Quifan.Autonomou
18、s systems are classified by the Society of Automotive Engineers into Level 0(L0)to Level 5(L5):AI 2041L0(zero automation)means that the person does all the driving,but the AI watches the road and alerts the driver against potential problems,such as following another car too closely.L1(hands on)means
19、 that the AI can perform a specific task,like steering,as long as the driver is paying close attention.L2(hands off)means that the AI can perform multiple tasks,like braking,steering,accelerating,and turning,but the system still expects the human to supervise and take over when needed.L3(eyes off)me
20、ans that the AI can take over all aspects of driving but still needs the human to be ready to take over if something goes wrong or when the AI makes a mistake.L4(mind off)is where the AI can take over driving completely for an entire trip,but only on well-defined roads and in well-developed environm
21、ents that the AI understands very well,like highways and city streets that have been extensively mapped and surveyed in high definition.L5(steering wheel optional)means that no human intervention is required at all,for any roads or environment and you dont need to have a way for humans to take over,
22、hence the steering wheel optional”.We can think of L0 to L3 as nothing but an extra option on a new car,like air conditioning,leather seats,or cruise control.They still need humans at the wheel.These are centaurs in that they need humans in the loop,like most agents today.For example,most people wou
23、ld be reluctant to let an agent compose an email to their boss or their mother without reading it before sending it.However,by the time we get to L4,the intelligence behind the car starts to feel like a true intelligence with a mind of its own,and it will have a massive impact on society.L4 cars or
24、buses might be public transports that take specific public routes confidently,while an L5 car or truck might do deliveries at all hours of the day or be a robot taxi like Uber that can take you anywhere.Since the release of GPT-3 and GPT-4,weve seen a number of attempts to build fully autonomous L5-
25、style agents for the digital world,such as and.Programmers have looked to leverage LLMs to take actions like planning complex software requirements,booking plane tickets based on user requests,picking up presents for a birthday party,or planning recruitment in a company.Unfortunately,they mostly don
26、t work yet for long-term planning,reasoning,and execution of complex tasks.We imagine AI systems that can come up with a comprehensive marketing plan for a new product,write and create a website,craft all outreach messages,get the list of people to reach out to,and then send the emails to get new cu
27、stomers.We are not there yet,but that doesnt mean we wont at some point.With so many traditional programmers getting into machine learning and applying ideas that data scientists and data engineers wouldnt think of,because these are outside their domain knowledge,were seeing constant improvements co
28、nstantly in these systems.Fully autonomous agents could be a ubiquitous facet of all our lives in the near future or over the next decade.Many of these fully autonomous projects have sparked tremendous public interest.AutoGPT racked up GitHub stars faster than almost any other project in history,but
29、 we can no longer take GitHub stars as a true measure of software prowess.The tremendous public interest in AI is often driven by sci-fi novels and Hollywood blockbusters,rather than the actual current state of the technology.Such outside interest sometimes drives brand new projects to the GitHub st
30、ar stratosphere,only to see actual developer interest in these projects crumble soon after.This happens when cognitive dissonance sets in and the software doesnt end up matching peoples expectations of a super-intelligent software avatar,like the AI in the fantastic movie.BabyAGIAutoGPTHer6Agents,La
31、rge Language Models,and Smart AppsStill,some of those projects continue to attract ardent followers who continue to add new capabilities with AI software such as BabyAGI.Not only that but reasoning and planning for agents continues to evolve with software and research projects that incorporate new t
32、echniques to help LLMs think better,such as (CoT)prompting,or giving them a history that they can recall with projects like the from a Stanford team,which.chain-of-thoughtGenerative Simulacra“.extends a large language model to store a complete record of the agents experiences using natural language,
33、synthesize those memories over time into higher-level reflections,and retrieve them dynamically to plan behavior.”Stanford Team“.extends a large language model to store a complete record of the agents experiences using natural language,synthesize those memories over time into higher-level reflection
34、s,and retrieve them dynamically to plan behavior.”Stanford TeamDespite all these techniques,agents still struggle with going off the rails and hallucinating and making major mistakes in their thinking,especially as the time horizon for independent decision-making increases.Short-term,on-the-rails re
35、asoning is often sound,but the longer the agents have to act and make decisions on their own,the larger are their chances of breaking down.Even with all these limitations and caveats,why have agents suddenly gotten more powerful?The answer is simple.ChatGPT was a watershed moment in computing and AI
36、 history that shocked outsiders and insiders alike.Suddenly,we had a system that delivered realistic and free-flowing conversations on any subject at any time.Thats a radical departure from the past where chatbots were brittle and not even vaguely human.The first chatbot,was created in the 1960s at
37、MIT.Weve had Clippy,the famous paperclip in Microsoft Office products in the late 90s and early 2000s,which was known for being slow and virtually useless at answering any questions at all.Weve had Alexa and Siri,which can do things like play songs or answer questions by doing lookups in a database.
38、But none of these have really worked all that well.ChatGPT and GPT-4 just feel different.ELIZAThats because most of these bots of the past were often brittle rule-based systems.They were glorified scripts that were triggered based on what you said or wrote.They couldnt adapt to you or your tone or w
39、riting style.They had no real context about the larger conversation youd had with them.They felt static and unintelligent.Nobody would mistake them for humans.The architecture of GPT-4 is a secret,although we know it is based on transformers.Wed had speculation that its a massive transformer with a
40、trillion parameters or that its not a big model at all but 8 smaller models,known as a,which leverages a suite of smaller expert models to do different tasks.Whatever the actual architecture of the model is,which we will only know when it is officially made public,it is more powerful and capable tha
41、n any other model on the market and remains the highest watermark as of the time of this writing.Even models like Metas open source marvel,which was launched a year later,cant replicate its performance,though they approach it.That said,its only a matter of time before other teams create a more power
42、ful model.By the time you read this report,the arms race to create ever more powerful models by open source teams like and or any of the proprietary companies piling up GPUs to build their own models,like,and,may already have produced such a model.With more powerful software brains powering todays a
43、gents,we have much more powerful systems at hand.Theyre the engine that drives centaurs and agents to much more useful capabilities.Unlike the relatively limited capabilities of enterprise (RPA)of the past,which were typically limited to well-defined processes and structured data,we have agents and
44、AI-driven applications that can work in the unstructured world of websites,documents,and software APIs.These agents can summarize websites with ease,understand whats going on the text,offer an opinion,act as language tutors and research assistants,and much more.Mixture of Experts(MoE)Llama 2Eleuther
45、AIMetas AI research divisionGoogle Anthropic Cohere Inflection Aleph Alpha MistralAdeptrobotic process automation7Agents,Large Language Models,and Smart AppsIts really only the beginning.ChatGPT was the starting point but not the end game.Since GPT,weve had a surge of capable open source models.Hugg
46、ing Face tracks these models with an open suite of tests on a leaderboard for models.It seems like every week a new open source model takes the crown.Weve seen Metas and,along with,and not to mention specialized models like,which specializes in working with APIs.Venture capital is pouring into found
47、ation model companies so that they can spin up massive supercomputers of GPUs.OpenAI attracted over 10B USD in investments,and recently Inflection AI announced 1.3B USD in funding to create a 22,000 strong Nvidia H100 cluster to train their latest models.With all this capital,OpenAI will not remain
48、the only game in town.At the AIIA,we expect a massive flurry of capable models to power the intelligence apps of today and tomorrow.Agents offer a potentially new kind of software thats beyond the capabilities of traditional hand-coded software written by expert programmers.The power of these LLMs a
49、nd the middleware thats rising up around them makes it possible for very small teams to build highly capable AI-driven applications with one to ten people.Its an extension of the,where a small team of 50 developers was able to reach 300M people with their application because they could leverage an e
50、ver-increasing stack of sophisticated prebaked software to build their platform,everything from readymade UIs to secure encryption libraries.The power of LLMsalong with a suite of models that do a particular task very well,like,and,coupled with a new generation of middlewareis making it possible for
51、 even smaller teams to reach a wider audience.The bar to building great software has lowered again,and history shows that whenever that happens,we see a flurry of new applications.Its also possible to build smaller,more focused apps now,like a bot that can ingest a series of legal documents and answ
52、er questions about what jurisdictions a company might face lawsuits in,an app that can research a massive number of companies and tell you which ones are good for your marketing team to contact,or an app that can ingest news articles,write summaries of them and create a newsletter.Stacking these age
53、nts together holds the potential of creating intelligent microservices that can deliver new kinds of functionalities.With state-of-the-art LLMs behind the scenes that are broadly capable of powering agents whose sensory input from keystrokes,web pages,code,external models,and knowledge repositories,
54、we now have agents that can do things we only saw in the movies,like automatically upscaling photos and adding hidden or missing details to them or reasoning about what is on a web page or in a PDF document and making complex decisions.An old writers trope in every detective show is where the police
55、 officers find some grainy VHS footage and their computer team enhances that footage to get the next big clue in the case.That was impossible before,but now we have systems that are a lot like where Detective Deckard,played by Harrison Ford,takes an old photo,puts it into an analysis machine,talks t
56、o the machine and tells it what to do,and the machine enhances the photo to bring out the hidden spots.Weve gone from only robotics researchers and data scientists building agents to traditional programmers building agents to do complex tasks that were,only a short time ago,impossible with handwritt
57、en code and heuristics.Despite all these amazing new capabilities,none of this is without its challenges.LLMs are nondeterministic systems,and they dont always behave or act in a way thats predictable.A traditional handwritten piece of software can only fail in so many ways.If we have a subroutine t
58、hat logs a user into a website,there are only so many ways it can go wrong.But LLMs and other models can produce wildly unpredictable results from task to task.A diffusion model like Stable Diffusion XL might excel at creating photorealistic portraits but fail miserably at making a cartoon-style pai
59、nting of a cute robot.Even worse,because these systems are so open-ended,there is no real way to test all of the possibilities that someone might use them for on a given day.One user might ask an LLM simple questions about how to make a good dinner for their wife,another might try to trick it into r
60、evealing security information,while still another might ask it to do complex math.Wrangling these systems to create useful software is an ongoing challenge.So lets dive in and look at the promises and perils of LLMs,generative AI,and agents for both small business and enterprises.Well start with the
61、 new emerging stack thats enabling people to create these applications,and then move on to challenges every company or person looking to adopt or build these systems might face.open sourceLLaMALlama 2Vicuna Orca FalconGorillaWhatsApp effectSAM(segment anything model)Stable Diffusion Gen1Gen2the Blad
62、erunner sceneUp Next:The New Stack for Intelligent Apps8Agents,Large Language Models,and Smart Apps02The New Stack for Intelligent AppsIn 2022,the AIIA published a report on,which functions at the data science level of AI development.It was created for data scientists,data engineers,and systems admi
63、nistrators to handle the complex challenges of gathering and cleaning data,labeling it,training models,and deploying them into production.the state of MLOps softwareThe MLOps industry was created by various engineers and data scientists whod done machine learning projects at big software companies l
64、ike Google,Meta,Airbnb,and Amazon.These companies helped take machine learning out of the universities and bring it into the commercial enterprise.Because these teams were working at the cutting edge of a new kind of software development,they had to build all their software infrastructure from scrat
65、ch to support those efforts.Many of those engineers learned those lessons and then spun out companies of their own to solve those problems for traditional enterprises that wanted to leverage the power of machine learning in their own businesses.The basic premise of the MLOps revolution was the under
66、lying assumption that every business would have a fleet of 100 or 1000 data scientists and data engineers and be doing advanced machine learning,training their own models from scratch and deploying these models to production.That world is looking less and less likely now.While many advanced companie
67、s do train their own proprietary models,we are increasingly seeing people move“up the stack”to deal with machine learning at a higher level of abstraction.This fits well with the pattern of history and technology where we“abstract up the stack,”which means we hide away the complexity of something pr
68、eviously complex,which allows more people to do that thing well.Hammers and nails make it easier to build houses of multiple stories,as do precut boards of standard sizes.The LAMP stack,where we had Linux,Apache,MySQL,and PHP,made it easier for people to build complex websites.But it still wasnt eas
69、y and required a tremendous amount of programming and design expertise.Later,we had WordPress built on top of that stack,which brought in an even wider array of people who could build simple websites much more easily without any programming expertise.Later we had complex themes like,which made desig
70、ning a beautiful website incredibly easy.The same thing is now happening in the world of AI.Were moving up the stackwhere most teams needed to gather a dataset,clean it and label it,and then train a model from scratch,test it,deploy it,and run itto a world where most teams will take a base model or
71、foundation model created by an outside team and deploy it as in,fine tune it,or simply call to it via API as it runs on a cloud service.Divi9Agents,Large Language Models,and Smart AppsWere moving from a world where data scientists train their own models from scratch for every use case to one where f
72、oundation models and base models are becoming the default.More teams are connecting via API to proprietary models like and GPT-4,or are using open source models as a base rather than building their own models.If these teams can use the models with no retraining whatsoever,all the better,but if not,t
73、hen with some instruct tuning or fine tuning,the model is ready to do the job.At the moment,were seeing a flood of fine tuning companies that help speed up the process of rapidly sculpting a prebaked model to your needs,many of which are quite good.However,its likely that in only a few years time,we
74、ll abstract up the stack even further and fine tuning wont be an essential step for most teams either.As soon as a team can pick between teaching a prebaked medical model with a few custom examples versus the more complex and time-consuming task of fine tuning,they will jump at the chance to take th
75、e path of least resistance.The release of ChatGPT marked a sea change,shifting us from pure data science to the dawn of AI-driven applications.Increasingly were seeing many traditional coders and applied AI engineers build these apps either without data science teams or with smaller supporting data
76、science teams.Supporting these new applications are traditional and nontraditional infrastructure companies,GPU cloud providers,foundation model providers,open source model makers,NoSQL,vector and traditional databases,model hosting and serving companies,fine tuners,and more.So lets take a look at t
77、his emerging stack.But before we go further,we need to keep one thing clearly in mind.The keyword here is“emerging”stack.At the beginning of any technological shift,we see a massive eruption of new tools,frameworks,and ideas.Most of them die on vines.Others grow and become incredibly popular,only to
78、 get swept away later by better emerging technologies.Lets take the example of Docker.As Docker caught on,multiple companies and projects rushed to build large-scale Docker management engines.Docker itself created the ill-fated application and other management projects.VMWare built a proprietary man
79、ager which also worked with their virtual machines.Mesos grew very popular,very fast.What do all these applications have in common?Theyre all dead or mostly dead.All of them were replaced by,which became the default way to manage large clusters of containers.Even VMWare,which vowed never to adopt th
80、e platform and continue pushing forward with its own proprietary engine,eventually jettisoned its offering and went all in on Kubernetes.Kubernetes itself was a.It came from inside Google,which had a decade of using pre-Docker containers and managing them at scale.Google built Borg in 20032004 and t
81、hen followed that up with Omega,its successor,in 2013.Finally,they created Kubernetes,an open source successor to Omega that learned from all the hard lessons of a decade of managing containers.This is how technology evolves.You cant solve problems before they happen,and when you solve one problem,n
82、ew,unexpected problems emerge.The most robust frameworks are the ones that learn the lessons of the past,deliver incredibly meaningful abstracts of common tasks,are tremendously scalable,and attract a constellation of plugins and support tools around themselves.At the AIIA,we fully expect much,if no
83、t all,of the stack to evolve over the coming decade,as developers work more and more with AI and learn the problems and pitfalls they need to overcome.When it comes to generative AI and agents,several major components are emerging as part of the stack and a secondary set of components that are still
84、 coming to fruition.ClaudeSwarmKubernetesthird-generation softwareLLMsTask-specific ModelsFrameworks/Libraries/AbstractionsExternal Knowledge RepositoriesDatabases(Vector,NoSQL,Traditional)Front Ends(APIs,UI/UX)10Agents,Large Language Models,and Smart AppsBeyond the basics of the software stack,ther
85、e are several critical components that this stack gets built on top of and cant live without:AI ChipsApplication Infrastructure HostingWere also seeing the rise of a secondary set of components that will likely become more important over time.AI App HostingFine Tuning PlatformsMonitoring and Managem
86、entMiddleware(Security)DeploymentFinally,we see some missing components that simply dont exist yet that well talk about later.Lets start with the key components.First up are LLMs,which are the key to this new kind of application.Theyve emerged as the brains of the applications,and they use tools to
87、do tasks and make decisions in a semi-autonomous or fully autonomous way.They are the only truly general-purpose models,capable of a wide range of tasks,from question answering,to summarizing,to text generation and multi-media generation,to logic,reasoning,and more.Its crucial to note that they almo
88、st certainly wont remain the brains of AI forever as researchers discover new kinds of architectures and ways to model or mimic intelligence better.For now,theyre the workhorses and the best generalized form of intelligence weve ever created.But they are not perfect and need external tools to really
89、 get the job done.These tools come in many forms.Now lets look at the constellation of frameworks,tools,code,and infrastructure you need to make LLMs work in production.The first is the code itself.It could be traditional,handwritten code by programmers,automatically generated code created by the LL
90、M on the fly,code generated collaboratively by the programmer and the model,or any combination of the above.The second major component is frameworks,like,or,which abstract common tasks like fetching and loading data,chunking the data into smaller bites so that they can fit the LLMs context window,fe
91、tching URLs,reading text,receiving prompts from the user,and more.These libraries might also be more focused and niche,like from Metas AI research lab,which speeds up vector and semantic searches.Most of these are Python libraries that have evolved over time to be more comprehensive in their capabil
92、ities and turned into“frameworks,”which are a more comprehensive set of libraries or tools.These frameworks do the heavy manual lifting for an agent,such as fetching and streaming data back and forth and taking in and outputting prompts and answers.Of course,not everyone uses these frameworks and li
93、braries.and developers often either love it or hate it.Its very popular with coders who are figuring out how to work with language models.However,many advanced developers still prefer to write their own abstractions or libraries at this point.We expect that to change over time as AI applications bec
94、ome more ubiquitous and we get better abstractions in many different libraries.Today,it would be almost absurd to write your own Python web scraping library when you can use or to write your own scientific and numerical calculator instead of using.It is incredibly rare for anyone not to use well-wri
95、tten and performant libraries to save time and avoid reinventing the wheel.The third major component is task-specific models.LLMs are not good at everything,but app designers can augment their capabilities by calling other models that are built to do a specific task really well.These are models that
96、 do one thing and one thing only well.was trained from the ground up on APIs,so it interacts with APIs incredibly well.is a well-known open source image generator.delivers automatic speech recognition.excels at code interpretation and development.LangChain LlamaIndex HaystackSemantic KernelFaissLang
97、Chain itself is controversialScrapyNumPyGorillaStable DiffusionWhisperWizardCoder11Agents,Large Language Models,and Smart AppsThe fourth major component is external knowledge repositories,like,which uses symbolic logic and various algorithms to give clear,structured answers to specific kinds of ques
98、tions,like doing math or giving the correct population of Papua New Guinea.LLMs are notorious for making up information.They basically work by predicting what the next word should be when responding to someone,but that prediction may change if they have a low threshold of confidence in their answer.
99、They also dont have the ability to say“I dont know”or“I dont feel confident,”so they just make up something that sounds plausible but might be total nonsense.External knowledge repositories ground the model in the real world,giving them facts and data they can pull from to give clear,crisp,precise a
100、nswers.The fifth major component is databases:traditional SQL-style databases,like Postgres;NoSQL databases,like;and vector databases,like Pinecone.A program may use any or all three types of databases.Vector databases are new for many organizations and developers.There are a flurry of them hitting
101、the market,like,(which combines the concept of a data lake and vector database in one),and.They dont store information directly but encode it as vectors.This delivers some unique advantages,like the ability to cluster data that are roughly similar and find them through semantic searches.That lets de
102、velopers to use it as a kind of long-term memory of LLMs,because it can retrieve similar prompts and answers without finding exact matches.It could also be used to cluster all similar functions in a code repository for easy search and retrieval.NoSQL databases excel at large-scale document managemen
103、t that would be hard or impossible to store in traditional databases.That lets developers load up huge unstructured data repositories of documents,like legal archives or web articles.Finally,we have the old workhorses of the database world,row-and column-based databases like Postgres,which store sim
104、pler information that can be extracted during the application work.An app might read lots of documents in a NoSQL database,only to extract the key learnings and put those bite-size learnings into a vector database.Of course,its not as cut and dried as Postgres,which can be adapted to store vectors w
105、ith projects like.We expect that many teams will simply opt for a database that can store traditional data and vectors as long as this overlay becomes highly performant and well developed.Time will tell whether we need specialized vector databases or a database that combines functions on a single di
106、stributed and scalable platform.Next up,we have front ends.These come in two traditional flavors and are already a mainstay of web applications and mobile apps.The first is APIs,and frameworks like and have rapidly gained mindshare as a way to build quick and responsive APIs to interact with any int
107、elligent applications that developers put together.APIs maintain a consistent way for third-party programs to interface with a program or platform without having to go through an ever changing front web page.UI frameworks are intended to help developers rapidly put together a usable front-end GUI fo
108、r an application.Here,were seeing a repurposing of popular and powerful web app frameworks,as there is not much need to reinvent the wheel.Developers are building with,and,to name a few.Without AI chips,none of this can run.By far,the most dominant player is,with their A100 and H100 lines.Rivals lik
109、e recently rolled out their Instinct MI200 series to compete with H100s from Nvida.Weve also seen big techs roll out their own custom training and inference chips,with Google delivering their advanced or tensor processing units and Azure and Amazon rolling out their own chips as well.Weve also seen
110、new chip creators focused totally on AI with brand new architectures,like and,and various edge computing engines.Traditional infrastructure providers and cloud providers,like AWS,Google,and Azure,have raced to build architectures that allow people to quickly spin up GPUs/TPUs/ASICs.Theyre also spinn
111、ing up new software infrastructure to support AI,plus they already have key infrastructure in place for web applications that end up serving AI apps as well,like Docker and Kubernetes,software load balancers,highly available database clusters,and routing.Were also seeing datacenter providers focused
112、 completely on GPU infrastructure at scale,like.Together,all these components make up the core stack of next-generation applications.But the story doesnt end there.Some of the components,particularly AI Middleware,are still emerging at the time of this reports writing.We expect this new suite of add
113、itional components to take off in the coming years,as more and more AI applications come to market and face new challenges,from scalability to security to compliance.Lets turn to these now.AI App hosting platforms run inference for AI-driven applications that have their own models and host the vario
114、us databases and frameworks needed to run those applications.Typically,these are containerized workloads.We see traditional cloud providers in the mix here,like Azure,Amazon,and Google,as well as GPU specialization companies,like and.Wolfram AlphaRedisPinecone Chroma DB Weaviate ActiveloopFeatureBas
115、e Qdrant MilvusVectarapgvectorFastAPIExpress.js React Next.js Vue.jsFlutterNvidiaAMDTPUsCerebrasGraphcoreCoreWeaveCoreWeaveRunpod12Agents,Large Language Models,and Smart AppsMiddleware is any software that helps the program function or that manages some aspect of the applications functioning.Its a b
116、road category of applications that sits adjacent to or in the middle of applications to give them additional powers,protect the applications,or keep them on the rails.It could be a security and logging system,prompt versioning,authentication,security,document ingestion and processing,or more.An exam
117、ple is,which can enforce correctness on LLM outputs.Security deserves its own subcategory of middleware here.As these apps grow and proliferate,they will become one of the primary attack surfaces.A demonstrated automated attacks against LLMs that jailbreak them and cause them to ignore their safegua
118、rds to give out information like how to make a bomb.A great example of security middleware is,which hardens an application against prompt injection attacks.The Bosch team has AIShield and a secondary product that focuses on protecting LLMs from attacks called,which can act as a wrapper around LLMs t
119、o prevent disclosure of PII or other jailbreaks and attacks.We expect there to be a robust suite of middleware tools that mirror the capabilities of the enterprise and cloud era of computing.For instance,these days,its easy to find fantastic antivirus software thats highly advanced and good at stopp
120、ing new and emerging threats.Software like ESET antivirus uses rules,heuristics,and neural nets to stop viruses dead in their tracks.We expect to see the emergence of a robust set of agent-specific protection systems that stop prompt injections and other emerging attacks,as well as tools that monito
121、r logic errors and more.Fine tuning platforms help people take base models,feed them additional training data,and tune them to a specific use case.We see traditional platforms for training and creating models from scratch,like,(purchased by),(purchased by HPE),along with big cloud MLOps stacks like
122、Amazons suite,Googles,and Microsofts running fine tuning pipelines,as well as newer companies like,and,which specialize in fine tuning LLMs.We expect more and more companies to offer foundation models that can be easily fine-tuned and for the process to become much more automated and swift.Monitorin
123、g and management tools have been with us from the dawn of MLOps and are starting to pivot to include meaningful metrics for LLMs(such as hallucination detection or conversation logging),and we have companies like,and (which include an LLM benchmarking tool and building tools to monitor smart apps be
124、tter),(for data quality management),and.Were also starting to see the beginning of benchmarks coming from the open source community.The to test agent capabilities on various tasks.We expect to see many more in the coming months and years.Deployers are the last category,which include any software tha
125、t deploys prototype or production apps to the cloud or to serverless backend providers.A good example is,which deploys baked LLM apps to any major cloud and looks to stay agnostic to the cloud backend.It could also be an agent-based builder platform like,which looks to wrap together app development
126、and deployment in one place.Model deployers like,and fit in here,too,as many teams use a suite of custom models,in addition to LLMs,to get the job done.Now that weve provided a high-level overview of the various parts of the emerging stack for AI-driven applications and agents,lets zero in on some o
127、f the unique aspects of this layer of the stack.Guardrailsrecent research reportRebuffAIShield.GuArdIanClearML HPEs Ezmeral MosaicMLDatabricksPachydermAnyscaleSageMaker VertexAzure Machine LearningHumanloop Entry Point AIScales LLM fine tunerArize WhyLabs TruEraArthurInfuse AIs PipeRiderManot AIAuto
128、GPT team just released a benchmarkSkyPilotSteamshipSeldon MakinaRocks RunwayIguazioModzyUp Next:Building AI-Driven Apps13Agents,Large Language Models,and Smart Apps03Building AI-driven apps is more akin to traditional programming versus the pure data science that we see in the MLOps world.While MLOp
129、s software is built for data scientists and data engineers to develop and train models from scratch,the AI-driven app layer is a higher level of abstraction where prompt engineering,traditional coding and systems deployment,and monitoring and management all play outsized roles.Perhaps the biggest di
130、fference at this layer of the stack is that most teams will never train a model from scratch.Instead,theyll take an existing model or models and try to use them as is or attempt to instruct them or fine tune to their needs.Only if there is no model that does exactly what they want or if they have a
131、highly unique dataset will teams train their own models.These new models tend to be used in concert with existing base or foundation models to augment their capabilities.For instance,a company like,which generates beautiful blog posts from educational videos with perfectly formatted code,uses both G
132、PT models and their own proprietary model to complete the task.We can look at the ways of building AI-driven applications anchored by an LLM in order of complexity and cost as outlined by machine learning pioneer.ContendaAndrew Ng in his blog at the Batch“Prompting.Giving pretrained LLM instructions
133、 lets you build a prototype in minutes or hours.Earlier this year,I saw a lot of people start experimenting with prompting,and that momentum continues unabated.Several of our teach best practices for this approach.without a training setshort coursesOne-shot or few-shot prompting.In addition to a pro
134、mpt,giving the LLM a handful of examples of how to carry out a task the input and the desired output sometimes yields better results.Fine-tuning.An LLM that has been pre-trained on a lot of text can be fine-tuned to your task by training it further on a small dataset of your own.The tools for fine-t
135、uning are maturing,making it accessible to more developers.Pretraining.Pretraining your own LLM from scratch takes a lot of resources,so very few teams do it.In addition to general-purpose models pretrained on diverse topics,this approach has led to specialized models like BloombergGPT,which knows a
136、bout finance,and Med-PaLM 2,which is focused on medicine.”Building AI-Driven AppsBuilding AI-driven apps is not a matter of simply picking the right stack.There are a number of key differences between AI-driven applications and traditional applications,such as prompt engineering,not to mention the u
137、nique parts of the AI app stack that dont usually exist in classical applications,such as vector databases.14Agents,Large Language Models,and Smart AppsThe Fine Art of Prompt EngineeringMost teams will simply try to prompt an existing model to get what they want out of it.Its the fastest way to get
138、an application up and running.The quality of the prompts,understanding of what you want,and asking for it correctly all go into the art and science of prompting.Many people are under the mistaken belief that prompting is easy and anyone can do it.At times,it is a challenge to get what you want from
139、these models.You only need to look at the wide range of outputs from diffusion models like Midjourney and Stable Diffusion to see that some people are much stronger at prompting than others.One of the key aspects of prompting is knowing what you want and how to ask for it.A great example comes from
140、the AIIA itself,when we were building an app to.Initially,the prompts were written by our lead coder,but the resulting articles were stilted,stiff,and essay-like.The problem was one of domain knowledge.Our coder is not a professional writer who can sling words as easily as she can code.We rewrote th
141、e prompts with the help of a writer who knew what to ask for because of his domain knowledge about writing.He included things like making the writing colloquial and using contractions,as well as varying the paragraph sizes so that they dont resemble a college essay with big blocks of text,which make
142、s the eye tired and makes the reader drift off.These prompts delivered much better writing,which was easier to massage into a workable newsletter,whereas the original prompts would have delivered a baseline that required extensive rewriting.There are two variants of basic prompting:write our newslet
143、terZero-shot prompting(aka basic prompting)Few-shot promptingLarge-scale language models,such as GPT-4,have the ability to achieve zero-shot learning.It sounds fancier than it is,but it basically means that the model is smart enough to understand examples or tasks beyond its original training so tha
144、t you dont need to do anything special to make it understand what you want or need.In essence,the model can understand examples that are not part of its training data.Zero-shot prompting means that you just give the model a descriptive prompt or context that provides guidance on the task.Its really
145、nothing more than vanilla prompting.To understand this concept better,lets break down the terminology:Zero-shot:This term originates from the classification domain of machine learning.In zero-shot classification,a model is expected to correctly classify new,unseen classes without having been explici
146、tly trained on them.The term“zero-shot”in the context of models like generative pretrained transformers(GPT)means that youre trying to get the model to perform a task it hasnt been directly fine-tuned on.Prompting:This refers to the use of a carefully crafted input or series of inputs to guide a mod
147、el into producing a desired output.In the context of language models like GPT,this often means providing a clear,descriptive sentence or paragraph that instructs the model on what is expected in its response.When it comes to LLMs,zero-shot prompting can allow someone to present the model with tasks
148、or questions without any prior fine tuning on that specific task,relying solely on the general pretrained knowledge of the model and the guiding influence of the prompt.For instance,if you want to use GPT-3 for a math problem it is not specifically trained on,you might prompt it with a clear stateme
149、nt like“Solve the following algebraic equation.”and provide the equation.Even though GPT-3 is not specifically tuned for this math problem,its vast training data and understanding of language and context enable it to attempt the problem based on the given prompt.15Agents,Large Language Models,and Sm
150、art AppsIn practice,achieving the desired results can sometimes require an iterative refinement of the prompts to better guide the models outputs.This has led to research and techniques on effective prompting and few-shot learning,where a few examples are provided to help steer the models behavior.N
151、ow lets say that you want the model to do sentiment analysis,where the model labels text as positive,negative,or neutral.In traditional machine learning,you might take a recurrent neural net(RNN)and train it to take an input paragraph and classify its output.However,if you add a new class to the cla
152、ssification,such as asking the model not only to classify the outputs but also to summarize them,youd need to retrain the model or youd need a new model.In contrast,an LLM doesnt need retraining to do a new class of tasks.You could ask the model to classify the sentiment of a paragraph and summarize
153、 it.You could ask it to tell you whether the sentiment of the paragraph is positive or negative,because the LLM has learned the meaning of those words.Take this example from GPT-4:We instructed GPT-4 with exactly what we wanted,leveraging its deep understanding of words like negative and positive,an
154、d got great results.We even gave it a challenging example that many classical models struggled to understand:“The player is on fire.”Thats a hard one to understand because if the model takes it to mean that the player is actually burning,then its most definitely negative.Instead,the model recognizes
155、 the common sports idiom“the player is on fire,”meaning that the player is hot and doing extremely well,as positive.16Agents,Large Language Models,and Smart AppsHowever,zero-shot prompting may fail as app designers run up against the limits of the models pretrained capabilities.For example,a model m
156、ay be able to do some basic reasoning consistently but fail badly at more complex and long-range reasoning.To get better results,people may move up to few-shot prompting,where they provide a number of key examples to the model.It still relies on the models pretrained knowledge but guides the model t
157、o better understand what you want from it.Few-shot prompting gives the model a few examples(or“shots”)of a task to guide its response for a new instance of that task.The idea is that,by seeing a small number of examples,the model can better understand and generalize the task at hand.For instance,con
158、sider the task of converting a sentence from the present tense to the past tense.Heres an example of few-shot prompting with GPT:Prompt:Convert the following sentences from present tense to past tense:“I play basketball.”-“I played basketball.”“She watches a movie.”-“She watched a movie.”“They swim
159、quickly.”-“They swam quickly.”Convert:“He plays video games.”Expected Response:He played video games.In the above example,the three provided conversions serve as the“few shots”to help the model understand the desired transformation.By seeing these examples,we expect the model to infer the right patt
160、ern and correctly transform the new sentence from present to past tense.Few-shot prompting can be useful in situations whereA specific task or domain might not have been a large focus during the models original training,so a few examples help nudge the model in the right direction.Youre unsure of ho
161、w to phrase a prompt for best results in zero-shot mode.Providing examples can clarify the task for the model.There are also a slew of more advanced prompting examples that continue to come to us from researchers trying different methodologies and putting out papers to teach them to others.of some o
162、f these concepts,but more are coming out seemingly every week as more and more people figuring out how to get LLMs to reason much better.We wont detail them all here,but lets look at a few examples,like Chain of Thought(CoT)prompting.OpenGenus has a good breakdown17Agents,Large Language Models,and S
163、mart AppsCoT prompting comes to us from Wei et al.s(2022)paper,titled“.”It delivers more complex reasoning capabilities through prompting via intermediate steps.This yields better results for more difficult tasks that require reasoning before spitting out an answer.Chain-of-Thought Prompting Elicits
164、 Reasoning in Large Language Models(Image Source:Wei et al.,2022)(Image Source:)Wei et al.,2022In the CoT example,the logical steps to get to the answer are explained to the model(in the blue highlighted section on the right)so that it can follow that logic and then use similar logic to solve a simi
165、lar type of problem better.Self-consistency from aims to improve the reasoning of the CoT technique.The paper notes that it“replaces the greedy decoding strategy used in chain-of-thought prompting,”which is a convoluted way of saying that it basically prompts the model multiple times and then picks
166、the answer that is most frequentin essence,averaging out wrong answers.This seems to work better for math and common-sense reasoning.The site has a good example.a paper by Wang et al.(2022)Learn Prompting(Source:Learning Prompting)(Source:)Learning Prompting18Agents,Large Language Models,and Smart A
167、pps(Source:Learning Prompting)(Source:)Learning PromptingAveraging out the wrong answer(NOT IMPORTANT)produces more consistent results.Unfortunately,it also slows down the response time and creates more round-trip time to cloud-based models and more cost to the developer,so we hope that such techniq
168、ues wont be necessary in more advanced models in the coming years.Some more advanced models are detailed in the by AIIA COO Mariya Davydova.Weve reproduced portions of her analysis here,with her permission,but you are encouraged to read the entire blog for more detailed information and a wider range
169、 of examples.Lets have a closer look at them here.Centaur Life blogMulti-Model ApproachesSome teams have replaced a single LLM with multiple models or LLMs,so that they can reason together or share information.These techniques are promising,but of course,raise the cost and round-trip time for the de
170、velopers once more,so you need a strong reason to pursue these approaches.The first approach uses multiple agents to work together on a complex problem.A number of papers have detailed this approach:Improving Factuality and Reasoning in Language Models through Multiagent DebateLM vs LM:Detecting Fac
171、tual Errors via Cross-ExaminationDERA:Enhancing Large Language Model Completions with Dialog-Enabled Resolving AgentsThe basic idea is to get multiple LLMs into a dialogue to overcome the shortcomings of reasoning or giving truthful or fact-based answers in one of them.19Agents,Large Language Models
172、,and Smart AppsThe most straightforward technique to grasp is the multi-agent debate(MAD)from the paper.The idea is as follows:get two or more equivalent agents to address the same question,exchange information,and polish their answers based on the insights from their peers.Each round has all agents
173、 answers shared among themselves,nudging them to revise their prior responses until they eventually hit a consensus.So,essentially,each agent puts in the same effort here.Liang et al.(2023)(Source:Improving Factuality and Reasoning in Language Models through Multiagent Debate)(Source:)Improving Fact
174、uality and Reasoning in Language Models through Multiagent Debate20Agents,Large Language Models,and Smart AppsAnother approach uses the cross-examination technique;we have two types of LLMs:the examiner and the examinee.The examinee poses a question,and the examiner follows up with additional questi
175、ons,eventually deciding if the examinees response is correct.from the Cohen et al.(2023)paper(Source:LM vs LM:Detecting Factual Errors via Cross Examination)(Source:)LM vs LM:Detecting Factual Errors via Cross Examination21Agents,Large Language Models,and Smart AppsFrom Prompting to TuningIf the mod
176、el isnt delivering the kinds of results you want,you may need to move to more advanced techniques,such as fine tuning a model.The base model may have a tremendous range of capabilities,but fine tuning the model on a finely curated dataset that provides a wealth of new examples can dramatically impro
177、ve the models performance on a specific task.is a repository of fine-tuned models for the open source Stable Diffusion model family.The model is fine-tuned on images of Legos,and it does remarkably better at creating good Lego models versus the baseline Stable Diffusion 1.5 model.It allows people to
178、 output complex Lego designs,whereas the base model struggles with those designs.Fine tuning can apply to any kind of base model.For LLMs,we might want to train the model to write in a specific style,so we create or curate a dataset that fits that style to teach the model to better emulate it.We mig
179、ht want to build stronger medical or legal knowledge in a model,and so we train it on a well-designed dataset of the specific examples were interested in to get much stronger results in that domain.CivitaiLegoAIThen theres the dialog-enabled resolving agents(DERA)approach,from where we distribute sl
180、ightly different roles.In this case,we have a decider LLM,whose mission is to complete the task(in this medically oriented paper,it is making a diagnosis based on patient data),and a researcher LLM,which debates with the decider LLM to tweak the solution.Their dialog resembles less of an exam and mo
181、re of a thoughtful exchange between two professionals.Nair et al.(2023),(Source:DERA:Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents)(Source:)DERA:Enhancing Large Language Model Completions with Dialog-Enabled Resolving AgentsWhats so appealing about the multi-agent m
182、ethod is that its easy to program and highly adaptable to domain-specific applications.The last paper uses it for medical advice,but we could easily see these agents engaging in spirited discussions on a host of subjects,from legal cases and historical contexts to marketing studies and system archit
183、ecture.At the AIIA,we encourage teams to start simply and only move to more advanced techniques if their application requires it.Its also essential that teams take into consideration the round-trip time and costs for hosting a model or token-based pricing.A multi-model approach,or an approach that a
184、verages response,can really drive up the response time and token-based costs.If your team is hosting a model that you created yourself or an open source foundation model,then you may have a fixed cost,and it might then be worth maximizing this fixed cost by using the multi-model or multi-prompt appr
185、oach.22Agents,Large Language Models,and Smart AppsFine tuning requires significantly more skills and time.There is an art to fine tuning.Go too far,and you can kill the original model or cause it to overfit your specific examples.It may require many iterations of labeling data,curating examples,and
186、training and testing a model before you start to get results that can be generalized to the real world.There are numerous approaches to fine tuning a model,as detailed in this blog by:Label StudioTransfer Learning-Transfer learning is a widely used methodology in fine-tuning,where the knowledge gain
187、ed from one task is utilized to solve a different but related task.This approach reduces the need for extensive data and computational power,as the model can leverage the pre-existing understanding of language and patterns.Transfer learning is particularly effective when the new task shares similari
188、ties with the task the model was initially trained on,allowing for efficient adaptation and improved performance.Sequential Fine-Tuning-Sequential fine-tuning involves training a model on multiple related tasks one after the other.This approach enables the model to understand nuanced language patter
189、ns across various tasks,enhancing performance and adaptability.Sequential fine-tuning is advantageous when there are multiple related tasks that the model needs to learn,as it allows for accumulating knowledge and fine-tuning specific aspects of language understanding.Task-Specific Fine-Tuning-Task-
190、specific fine-tuning aims at adapting the pre-trained model to excel at a particular task.Although this approach requires more data and time,it can lead to high performance on the task.Task-specific fine-tuning focuses on optimizing the models parameters and architecture to enhance its capabilities
191、in a targeted manner.This methodology is particularly valuable when a specific tasks performance is paramount.Multi-Task Learning-Multi-task learning involves simultaneously training a model on multiple tasks.This approach improves generalization and performance by leveraging shared representations
192、across different tasks.The model learns to capture common features and patterns,leading to a more comprehensive language understanding.Multi-task learning is most effective when the tasks are related,and the shared knowledge enhances the models learning and adaptability.Adapter Training-Adapter trai
193、ning is a methodology that enables fine-tuning a specific task without disrupting the original models performance on other tasks.This approach involves training lightweight modules that can be integrated into the pre-trained model,allowing for targeted adjustments.Adapter training is a great option
194、when the need to preserve the original performance of the pre-trained model is high,providing flexibility and efficiency in adapting to task-specific requirements.However,when it comes to AI-driven applications,teams tend to focus on a smaller subset of fine tuning methods that are geared toward LLM
195、s in particular.These include three major branches of fine tuning:Instruct TuningAlignment TuningAdapter Training,aka Parameter Efficient Fine TuningLets take a look at each in detail.Instruct TuningThis is perhaps the most popular method of fine tuning a base LLM.It is often used right out of the g
196、ate to get the model to respond to questions in a more human-like way or to perform specific tasks much better.For instance,Meta released their Llama 2 model as both a and an instruct-tuned“”version that makes it much better at holding conversations,answering questions,and the like.This approach has
197、 been widely applied to state-of-the-art LLMs,such as and,and continues to gain traction with new providers.base modelchatInstructGPTGPT-423Agents,Large Language Models,and Smart AppsThe concept is simple in theory but challenging in practice because it requires gathering a number of high-quality ex
198、amples.Model-tuning teams create or source a large dataset of example question-and-answer responses.A great example is the,which was authored by more than 5,000 Databricks employees between March and April 2023.It was based on the ideas from OpenAI and their that outlined the method for creating the
199、 InstructGPT model that was trained on a dataset comprising 13,000 demonstrations of instruction-following behavior.They wanted to train the model to be better at Q&A,extracting information from Wikipedia,brainstorming,classification,and creative writing,among other things.describing the project,the
200、y listed the following instructions for their employees:Databricks Dolly 15K datasetpaper On their blogOpen Q&A:For instance,Why do people like comedy movies?or What is the capital of France?In some cases,theres not a correct answer,and in others,it requires drawing on knowledge of the world at larg
201、e.Closed Q&A:These are questions that can be answered using only the information contained in a passage of reference text.For instance,given a paragraph from Wikipedia on the atom,one might ask,What is the ratio between protons and neutrons in the nucleus?Extract information from Wikipedia:Here an a
202、nnotator would copy a paragraph from Wikipedia and extract entities or other factual information such as weights or measurements from the passage.Summarize information from Wikipedia:For this,annotators provided a passage from Wikipedia and were asked to distill it into a short summary.Brainstorming
203、:This task asked for open-ended ideation and an associated list of possible options.For instance,What are some fun activities I can do with my friends this weekend?Classification:For this task,annotators were asked to make judgments about class membership(e.g.are the items in a list animals,minerals
204、 or vegetables)or to judge the properties of a short passage of text,such as the sentiment of a movie review.Creative writing:This task would include things like writing a poem or a love letter.They created a series of examples,such as the following:Example of Open QA in Databricks-dolly-15kHow do I
205、 build a campfire?Safety should always come first when starting a campfire.Ensure you have water or another way to extinguish your fire readily available.Use a campfire ring or pit,or clear a large area around your campfire location.Also,make sure campfires are permitted in your area.With safety out
206、 of the way,all a fire needs are heat,fuel,and oxygen.Any combustible material will start a fire,but good campfire fuel sources include tinder,kindling,and dry logs.Use stuff like dry pine needles,dry leaves,grasses,and wood shavings for tinder.You could also use newspaper(or any paper),dryer lint,o
207、r cardboard.For kindling,use smaller pieces of wood,such as twigs and sticks no thicker than 1 inch in diameter(about the thickness of your thumb).Dry or well-seasoned logs will ensure a good burn.Stack the wood in a way that guarantees oxygen can easily reach your flame.Many prefer a teepee or log
208、cabin stacking design.Now,use a heat source to create a flame or spark and apply this to your tinder.Once the tinder combusts,it should light the kindling,and the kindling should eventually ignite your logs.If all goes well,you should now be enjoying your campfire.24Agents,Large Language Models,and
209、Smart AppsExample of Brainstorming in Databricks-dolly-15kInstead of making a peanut butter and jelly sandwich,what else could I combine peanut butter with in a sandwich?Give five ideas.Instead of jelly,try one of the following with peanut butter in a sandwich:1.Honey2.Raisins3.Fruit preserves4.Bana
210、nas5.Sliced applesInstruction tuning enhances or unlocks latent capabilities in LLMs in a specific domain or area or with a specific task.This is now common for LLMs immediately after training the base model,and teams will usually release an instruct-tuned model alongside a base model,as we saw with
211、 Metas Llama 2,which includes a base model and an instruct-tuned version.Alignment TuningAlignment tuning is a kind of reinforcement learning,specifically (RLHF).LLMs show tremendous capabilities across a wide and diverse set of tasks.But typically,these models are not aligned to their creators pref
212、erences.An LLM generally doesnt align to any human preferences because of the way its trained.Todays LLMs involve a pretraining step,where the model is fed large corpuses of text and then taught to do word prediction without any consideration of human values or preferences.The model may exhibit unde
213、sirable behaviors like giving false information,making wrong observations,or creating dangerous answers,such as detailing how to make a pipe bomb or methamphetamines.Alignment attempts to fix this by fine tuning the model to be more,to use the words of the alignment-focused company Anthropic.This ki
214、nd of tuning may prove essential for large enterprises who can face lawsuits,public blowback,and real-world problems such as monetary losses when their models misbehave.If your LLM is advising kids to commit suicide,then you could have a major lawsuit on your hands.Thats where alignment tuning comes
215、 into the picture.reinforcement learning through human feedbackhelpful,honest,and harmlessEssentially,this boils down to taking outputs from the model and having humans rate them along a set of criteria and then training the model with a reward function to tune its answers to be closer to those the
216、human wants from it.Alignment tuning is particularly challenging and generally outside the range of most teams that dont have a very strong data science team with experience in RLHF.In particular,alignment tuning may end up hurting the overall ability of the LLM,which is often called the“alignment t
217、ax.”Thats when the model may refuse to give an answer for something it considers negative when its actually essential for the task at hand.In this example from,the Llama 2 models chat version,which is both instruct-and alignment-tuned,refuses to answer how to kill a Linux process because it doesnt u
218、nderstand that kill is the correct terminology in the world of command line Linux to stop a malfunctioning process.Instead,the model assumes that the user wants to hurt someone or something and refuses to answer.Reddit25Agents,Large Language Models,and Smart AppsRLHF is notoriously tricky to get rig
219、ht.It combines reinforcement learning with human feedback.The goal is to train agents who not only optimize a given reward function but also behave in a way that aligns with human values or intentions.Traditional RL involves an agent who interacts with an environment and learns to take actions that
220、maximize a cumulative reward over time.The agent starts with little or no knowledge about the environment and learns through trial and error.The environment gives the agent rewards(or penalties)based on the actions it takes,and the agent uses this feedback to adjust its behavior.The problem with tra
221、ditional RL is that specifying the reward function can be very challenging.Small oversights can lead to unwanted behaviors.For instance,if a robotic vacuum cleaner is rewarded only for picking up dirt and not penalized for knocking things over,it might be overly aggressive in its cleaning and break
222、items in its path.RLHF looks to address these limitations by incorporating human preferences into the mix.This is done in a number of ways,such asDemonstrations:A human demonstrates the correct behavior,and the agent learns from observing these demonstrations.Comparisons:Given two or more trajectori
223、es(sequences of actions),a human can rank or compare them based on their desirability.Corrective Feedback:While an agent is acting or after it has acted,a human can provide feedback by telling the agent what it did right or wrong.Typically,human feedback is used to create a reward model.The agent th
224、en optimizes its behavior based on this model.There can be several iterations of this process:the agent acts based on the current reward model,receives more feedback,updates the model,and so on.Dataset Examples:Atari Games with Human Feedback:Traditional RL has been applied to Atari games.Researcher
225、s can use human feedback by having humans rank different game trajectories or provide corrective feedback during gameplay.Dexterous Manipulation Datasets:Tasks that involve manipulating objects with robotic hands are notoriously difficult.Human demonstrations or feedback can help train agents to per
226、form these tasks with more finesse.Autonomous Driving Datasets:While many of these datasets focus on supervised learning,they can be adapted for RLHF.Human drivers can provide demonstrations of correct driving or feedback on simulated driving trajectories.When it comes to fine tuning LLMs,model tune
227、rs typically use comparison-based feedback to fine tune models like ChatGPT.Here,different model responses are ranked by humans based on their appropriateness.There are essentially four core steps to RLHF:Pretraining the modelGathering data and labeling themTraining the reward modelFine tuning the L
228、M26Agents,Large Language Models,and Smart AppsAs noted here,most teams are not going to do the first step themselves.They will take an existing base model or foundation model and tune it to their needs.Training advanced LLMs from scratch is incredibly challenging and expensive,from a people,compute
229、and time perspective.Thats why most teams will start from the second step,gathering the data,typically as output from the LLM itself.Alternatively,they will pull from an existing RHLF dataset,such as that they can use in their approach.This is then used for the next step,which is training a reward m
230、odel.The underlying goal of creating the reward model is to train a model to take a sequence of text and then return a scalar reward that numerically represents the human preference.The output of the scalar reward is crucial for the current state-of-the-art RLHF processes.The style of the reward mod
231、el varies.It could be another fine-tuned LM or an LM trained from scratch.The training dataset of prompt generation pairs for the reward model comes from sampling a set of prompts from the dataset the team collected or downloaded.The prompts are fed to the language model to generate new text.Human a
232、nnotators then rank the generated text outputs from the LM.You might think that you could have humans just assign a scalar reward score to each piece of text directly,but it doesnt work in practice,as each person brings different values and judgments to the task.The differing values of human labeler
233、s end up causing the scores to be all over the place and noisy.Instead,the rankings are used to compare the output of multiple models,which creates a much more regularized dataset.There are a number of ways to rank the outputs.A popular method thats worked well to fine tune models is to compare the
234、output from two different models on the same prompt.By comparing the outputs in head-to-head matchup,it becomes a simpler,binary choice of this one or that one for the human scorer.An system can be used to generate a ranking of the models and outputs in relation to each other.In general,training a r
235、eward model can be as challenging and cost-sensitive as training an LLM from scratch,so most teams will not do this unless they are a large enterprise with significant penalties for harmful output to their business and bottom line or to real-world safety.The successful reward models to date are all
236、very large,with Anthropic using models as big as 52B parameters and DeepMind using a variant of the 70B Chinchilla model as both the LLM and the reward model.The most likely reason for these big models is that the reward model needs to have as good a grasp of the text as the LLM itself to effectivel
237、y evaluate whether the output meets the preferences.Its likely that many of the reward models at the large proprietary LLM foundation model providers,such as OpenAI,Anthropic,Cohere,and Inflection,are using some variant of their most advanced model trained to be a reward model.We are starting to see
238、 the beginning of alignment-focused companies and platforms right now.recently moved into rapid alignment for models,and we expect to see more.We need an order of magnitude speed-up in alignment tuning.For instance,well need models rapidly tuned to get rid of undesirable behavior,and the process can
239、t take days or weeks or months as it does now.At the AIIA,we see a process developing over the next few years,where alignment tuning is almost entirely automated and then completely automated.Imagine a model that advises a young person to commit suicide.This is harmful behavior that we want to be tu
240、ned out of the model.One way is to build in external middleware safeguards but better to have the model aligned to what we want,so we dont need to anticipate every possible challenging question we dont want the model answering.A rapid fine tuner might allow a bug fix team to ask the model itself for
241、 synthetic data or generate the synthetic data with dedicated platforms like.For instance,imagine again that the model told the young person why its a good idea to commit suicide.In this case,the tuning platform would ask the opposite question of the model:“why is it never a good idea to commit suic
242、ide?”Then,they would check the response and pair it with the original questions and have a fine tuning platform rapidly iterate on variations of both the question and answers,generating thousands of more variants.A small percentage of the variants would get surfaces to a human in the loop tester to
243、check for accuracy,clarity,and correctness,and then the fine tuner would go to work in the background,producing a new model or adaptor(detailed in the next section).While weve looked at two very specific types of fine tuning,lets look at a tuning method thats much broader and more about efficiency v
244、ersus a specific kind of output in the way that alignment tuning is specific to aligning to human preferences.Anthropics RLHF dataset,for building helpful and harmless modelsconstitutional AIEloKognicYData27Agents,Large Language Models,and Smart AppsAdapters and Parameter Efficient Fine TuningFine t
245、uning large pretrained models is an effective transfer learning method for natural language processing(NLP)tasks,but as you begin to add new downstream tasks,you may start destroying the original model because you are adjusting its weights.This is known as.In essence,the model forgets what it previo
246、usly understood as it learns new information because its neural network is adjusted away from the original weights toward a new configuration.A small amount of fine tuning is only moderately destructive for the original model,but as we continue to train the model,it can cause the model to collapse a
247、nd its performance to suffer on tasks it was originally good at before the fine tuning process.Fine tuning is also tremendously inefficient,as it requires loading up the entire model and making changes across all its weights with new training.As an alternative,researchers have started developing mor
248、e efficient fine tuning methods that freeze most or all of the weights of the original model and attach an adapter with an additional set of weights that modifies the output of the original model.The paper titled outlined the methodology in 2019.Since then,weve had a wealth of new parameter efficien
249、t adapter methods,such as low-rank adaptation of large language models and,which have been adapted by open source and proprietary research teams alike.Theyve also been adapted beyond LLMs to diffusion models like Stable Diffusion.Adapter modules are compact and extensible.They only add a few trainab
250、le parameters per task,and new tasks can be added without destroying proficiency in early tasks.Since the parameters of the original neural net remain fixed,adapters yield a high degree of parameter sharing.In addition,adapters can be stacked or swapped,and they are much more memory efficient than l
251、oading up another copy of the entire model,since they contain only a smaller subset of parameters.Full fine tuning is incredibly expensive,especially if you have lots of different tasks that you want the model to be good at doing.Adaptor fine-tuned models are the same size as the original model plus
252、 the additional size of the adaptor versus a fully fine-tuned model,which might be much larger than the original model.The term was popularized by Hugging Face and has gained traction as the common method of referring to adapter-based approaches to fine tuning.PEFT,aka an adapter-based approach,only
253、 trains a small number of additional model parameters,while freezing most or all of the original LLM parameters.This effectively overcomes the issue of catastrophic forgetting,in which a model trained on a new task begins to forget how to do its original tasks.For instance,a model trained to recogni
254、ze dogs is over-fine-tuned on cats and begins to misidentify dogs.PEFT/Adapters can be applied to various models,not just LLMs,and theyve been readily adapted by community programmers and researchers,particularly in the Stable Diffusion community,for diffusion models.Its much easier to have a 200MB
255、adapter than a fully fine-tuned 8 GB base model.Most teams are better off training an adapter for an existing model,if they can,versus fine tuning the entire model.Unfortunately,this generally only applies to open source models,as teams need access to the model weights in order to create an adapter.
256、If you are fine tuning a commercial model such as GPT-4,you are restricted to one of their models,and the magic of how they do it exists behind the curtain.They also charge extra for using a fine-tuned version of the model versus running the base model,as much as 3 per token,as of the time of this w
257、riting,although some providers allow fine-tuned versions to run for the same token pricing.Adapters can be applied for just about any use case,such as adding medical or legal knowledge to an existing LLM,getting it to create a specific art style with something like Stable Diffusion,or with teaching
258、it a brand new task,such as classifying a new kind of hate speech that you want to filter on your corporate forums.As we noted earlier,there are a number of platforms for doing fine tuning.You have traditional MLOps-based platforms,which can be easily adopted to training a base model that you have c
259、omplete access to,such as Llama 2.These are platforms like,(purchased by),(purchased by HPE),and,along with the big cloud MLOps stacks like Amazons suite,Googles,and Microsofts running fine tuning pipelines,as well as newer companies like,and,which specialize in fine tuning LLMs.We expect more and m
260、ore companies to offer foundation models that can be easily fine-tuned and the process to become much more automated and swift.In addition,your team will likely need the help of labeling platforms like,(for multi-modal image data),(which uses Label Studio Enterprise),and especially companies that sp
261、ecialize in RHLF work,like.catastrophic forgettingParameter Efficient Transfer Learning for NLP(LoRA)QLoRAparameter efficient fine tuning(PEFT)LoRAOpenAIs platform for fine tuning ClearML HPEs Ezmeral MosaicMLDatabricksPachydermAnyscaleWeights and BiasesSageMakerVertexAzure Machine Learning Lamini H
262、umanloop Entry PointScales LLM fine tunerArgilla V7,Scale Label Studio Superb AI Human SignalToloka Enlabeler SnorkelSurge AIUp Next:Agent Frameworks28Agents,Large Language Models,and Smart Apps04Agent FrameworksThough this paper has laid out a number of major components for building successful AI-d
263、riven applications and agents,the vast majority of the work falls to four different parts of the stack,which we call the“big four”:LLMs,which act as the brains of the appThe code,which augments the LLM and connects it to the real worldDatabases,which act as the memory and knowledge repository of the
264、 modelOther models,which provide additional capabilities to the LLMs and the codeOur thesis is that,over time,some of the other parts of the stack,like middleware and security tooling,will become more important as these apps mature,but for now,the big four are what are driving AI-driven applications
265、 like,which can do localization in 60 languages in the voice of the original content creator,and,which can do research on any topic and provide a detailed report.In addition,we expect the testing software to become much more important.With traditional software,its easy to write unit tests and regres
266、sion tests to ensure that new features or bug fixes dont break the software in unexpected ways.This is currently challenging with agents and AI-driven applications because it often involves a human being looking at the results and deciding if its any good,which is just not scalable.Today,the vast ma
267、jority of production-grade application developers that weve spoken to are using their own custom written code and frameworks as opposed to one of the major agent frameworks that currently exist,such as,and.While these frameworks have a tremendous following for new developers in the space and when pe
268、ople are in the prototyping phase,weve discovered that many sophisticated teams end up writing their own frameworks that are specific to their applications as they get further along.RaskAomniLangChainHaystack Semantic KernelLlamaIndex29Agents,Large Language Models,and Smart AppsThats not as much of
269、a surprise as it seems.Agents,generative AI,and AI-driven applications are still incredibly new.Developers and designers are simply figuring out how to build them effectively.At the AIIA,we believe that modularity and true abstraction will be the key to whoever wins in the long run and becomes the d
270、efault way to write the code that powers the agents of tomorrow.This tracks with developers weve connected with in the community and with public posts on places like Reddit,.with people talking through their experiences“My workflow primarily involved querying text from Pinecone and then using either
271、 models on Hugging Face or Lllama.cpp versions of the models.I had metadata stored along with the embeddings in Pinecone and wanted to filter based on that.As the number of filters and conditions increased,it just became very cumbersome to be able to manage the text retrieval using Pinecone.Eventual
272、ly,I rewrote the entire code without the LLM chains by breaking up the code into Query/Retriever Classes,Prompt Creator functions and Text Generation Classes.This way,the code was modular.The prompts and text generation performance could be checked without modifying the complete chain and passing al
273、l the metadata filters every time.”“My workflow primarily involved querying text from Pinecone and then using either models on Hugging Face or Lllama.cpp versions of the models.I had metadata stored along with the embeddings in Pinecone and wanted to filter based on that.As the number of filters and
274、 conditions increased,it just became very cumbersome to be able to manage the text retrieval using Pinecone.Eventually,I rewrote the entire code without the LLM chains by breaking up the code into Query/Retriever Classes,Prompt Creator functions and Text Generation Classes.This way,the code was modu
275、lar.The prompts and text generation performance could be checked without modifying the complete chain and passing all the metadata filters every time.”They are very different kinds of applications than traditional deterministic applications.It will take time for the best design patterns,abstractions
276、,and solutions to present themselves.This is typical of any early ecosystem.It took ten years for the industry to get to the ideal way to manage containers,with many solutions vying for supremacy,and along the way,Google developer Borg,then Omega,and finally Kubernetes building on what they learned
277、along the way.Over time,these frameworks will mature and get better at delivering the right abstractions to developers,saving them time and energy,but at this point,it is way too early to call it a winner.In addition,many of these frameworks have shown good enough traction to attract venture money,s
278、o expect their code bases to evolve rapidly in the coming years.For now,lets look at each of the most well-known frameworks briefly in turn,with the understanding that we wont be able to cover them comprehensively here but that we will outline their basic baseline capabilities.LangChainLangChain is
279、currently one of the most popular frameworks.The team behind it recently raised a and another.,originally developed by Harrison Chase,is a Python and JavaScript library for interacting with OpenAIs GPT APIs,and the framework was later extended to include more models.The idea for LangChain came from
280、the paper written by researchers at DeepMind,Google Brain et al.,which is generally called the ReAct paper.The paper showcases a prompting technique that lets the model do better reasoning and then take better actions by using predefined sets of tools,such as searching the internet.The one-two punch
281、 of reasoning and action has turned into a popular workflow that often improves the output and lets LLMs solve problems more effectively.$10 million seed round$20$25 million at a$200 million valuation Series ALangChainReAct:Synergizing Reasoning and Acting in Language Models30Agents,Large Language M
282、odels,and Smart Apps(Source:ReAct paper)(Source:)ReAct paperThe ReAct workflow was effective for the InstructGPT/text-davinci-003 model,aka GPT-3,but it hasnt proven as effective or necessary for GPT-4.Time will tell if the surge of funding helps the ecosystem develop in a smart and balanced approac
283、h that is of real value to agents and smart app developers.For now,the LangChain community is massive,and the network effect can often steer a project to greatness if its able to attract sound developers who are experts in abstraction.At its core,LangChain allows developers to create a“chained”appli
284、cation,a sequence of calls to components,which can include other chains.LangChain continues to develop and may change tremendously over the next few years now that they have extensive backing and funding,but the basic components as they stand today are as follows:Model I/O=The primary interface with
285、 language models for input/outputIndexes=The interface for retrieving application-specific data like PDFs,web pages,databases and moreChains=Allows developers to construct a sequence of hard-coded calls to external tools and LLMsAgents=Gives the LLM more autonomy to choose the best way to accomplish
286、 a high-level objective,like using an API,as opposed to hard coding the chainMemory=Allows the developer to keep the application state between chain runsCallbacks=Log and stream intermediate steps of any chain31Agents,Large Language Models,and Smart AppsLangChain was one of the first to embrace agen
287、t-style approaches to building models,meaning that the LLM does much of the logic and planning or it figures out the right sequences of events when interacting with an API.Now most frameworks are pivoting toward that rapidly.Semantic Kernel from Microsoft already has these capabilities,and the team
288、is leaning into it more and more.The main idea of agents versus chains is that instead of the programmer picking the sequence of actions,the LLM chooses the sequence of actions.In chains,a sequence of actions is hard-coded(in code).In agents,a language model is used as a reasoning engine to determin
289、e which actions to take and in which order.The agent component continues to evolve beyond the original ReAct concepts.This is basically a prompting strategy at this point and can include the following:The personality of the agent(useful for having it respond in a certain way)Background context for t
290、he agent(useful for giving it more context on the types of tasks its being asked to do)Prompting strategies to invoke better reasoning(the most famous/widely used being)ReActLlamaIndexLlamaIndex is another popular framework that developers have experimented with or woven into their projects.LlamaInd
291、ex is more firmly focused on acting as a“data framework”to help you build LLM apps.Its GitHub summarizes it nicely:ReadmeOffers data connectors to ingest your existing data sources and data formats(APIs,PDFs,docs,SQL,etc.)Provides ways to structure your data(indices,graphs)so that this data can be e
292、asily used with LLMsProvides an advanced retrieval/query interface over your data:Feed in any LLM input prompt,get back retrieved context and knowledge-augmented outputAllows easy integrations with your outer application framework(e.g.,with LangChain,Flask,Docker,ChatGPT,anything else).HaystackHayst
293、ack has been around for a bit longer than the others,and it has mostly specialized in extractive QA,as noted by.Much of the early development went into question answering and retrieval,whereas LangChain went more into agents and put their energy there early on.Haystacks focus was originally on makin
294、g the best use of local transformer models for app builders.It allows people to build elaborate NLP pipelines for summarization,document similarity,semantic search,etc.Theyve also recently added more agentic capabilities,allowing the agent to use prompt-defined controls to find the best underlying p
295、ipeline or tool for the task.one of the developers32Agents,Large Language Models,and Smart AppsCurrently,Haystack supports the following,per:its documentationEffortless deployment of models from Hugging Face or other providers into your NLP pipelineCreate dynamic templates for LLM promptingCleaning
296、and preprocessing functions for various formats and sourcesSeamless integrations with your preferred document store(including many popular vector databases like Faiss,Pinecone,Qdrant,or Weaviate):keep your NLP-driven apps up-to-date with Haystacks indexing pipelines that help you prepare and maintai
297、n your dataThe for a faster and more structured annotation processfree annotation toolTooling for fine tuning a pretrained language modelSpecialized that use different metrics to evaluate the entire system or its individual componentsevaluation pipelinesHaystacks REST API to deploy your final system
298、 so that you can query it with a user-facing interfaceSemantic KernelSemantic KernelOpenAIAzure OpenAIHugging Facepluginsfew lines of codeplanners(SK),developed by a team at Microsoft,is“a lightweight SDK to enable integration of LLMs like,and with conventional programming languages like C#,Python,a
299、nd Java.Semantic Kernel achieves this by allowing you to define that can be chained together in just a.”It has a tighter focus on reasoning and planning than the other frameworks discussed here.For instance,it has that can generate a plan to reach a goal and then execute that plan.Such advanced abst
300、raction will likely become a cornerstone of all agent frameworks in the future,but currently its Semantic Kernel that leans most heavily onto it now.The main downside to it currently is uneven support across the three languages that the project supports,which you can see in this chart.In particular,
301、the Python code is lacking at the moment,which is a shame,as that is the most commonly used language in machine learning and in agent development and smart apps.(Source:SK documentation)(Source:)SK documentation33Agents,Large Language Models,and Smart AppsSK supports,as well as function chaining,vec
302、torized memory,and intelligent planning,though all of these are work in progress.One of the biggest goals of SK is to support design patterns from the latest AI research so that developers can infuse their apps with complex skills like recursive reasoning.Theyve also adopted the.With native function
303、s,you can have the kernel call C#or Python code directly so that you can manipulate data or perform other operations.In this way,native functions are like the hands of your AI app.They can be used to save data,retrieve data,and perform any other operation that you can do in code that is ill-suited f
304、or LLMs(e.g.,performing calculations).The SK framework allows you to create two kinds of functions:prompt templatingOpenAI plugin specification as their own standardSemantic functionsNative functionsSemantic functions allow your AI app to listen to users and responds back with a natural language.SK
305、uses connectors to get those asks and responses back and forth to the LLM.Native functions allow the kernel to call C#or Python code directly so that you can manipulate data or perform other operations.According to.the documentation“.in this way,native functions are a bit like the hands of your AI a
306、pp.They can be used to save data,retrieve data,and perform any other operation that you can do in code that is ill-suited for LLMs(e.g.,performing calculations).”“.in this way,native functions are a bit like the hands of your AI app.They can be used to save data,retrieve data,and perform any other o
307、peration that you can do in code that is ill-suited for LLMs(e.g.,performing calculations).”The planner is probably the most unique part of SK,and it will continue to receive considerable attention from the team,as they see it as the key to making their framework widely used and special.The planner
308、is a function that takes a users ask and returns a plan to them on how it will accomplish the request.It allows the LLM to mix and match plugins that are registered to the kernel so that it can create a series of steps,much like LangChains agents function.It allows developers to create atomic functi
309、ons that they might not have thought about yet.They use the following example:“If you had task and calendar event plugins,planner could combine them to create workflows like remind me to buy milk when I go to the store or remind me to call my mom tomorrow without you explicitly having to write code
310、for those scenarios.”“If you had task and calendar event plugins,planner could combine them to create workflows like remind me to buy milk when I go to the store or remind me to call my mom tomorrow without you explicitly having to write code for those scenarios.”34Agents,Large Language Models,and S
311、mart AppsThe planner is extensible.This means that there are several planners to choose from,and there will likely be more over time as new papers and concepts on how to elicit the best reasoning from LLMs come to light.Developers can also write their own custom planner.The documentation gives some
312、examples of:how to use planners here“Behind the scenes,a planner uses an LLM prompt to generate a plan.You can see the prompt that is used by Sequential Planner by navigating to the skprompt.txt file in the Semantic Kernel repository.You can also view the prompt used by the basic planner in Python.”
313、“Behind the scenes,a planner uses an LLM prompt to generate a plan.You can see the prompt that is used by Sequential Planner by navigating to the skprompt.txt file in the Semantic Kernel repository.You can also view the prompt used by the basic planner in Python.”The Future of FrameworksAt the AIIA,
314、we recommend that developers experiment with many different frameworks and not be afraid to go their own route at this stage in the ecosystem development but be prepared to potentially toss out homegrown solutions as the frameworks mature.Its important to realize that all of these frameworks are ver
315、y new and they may disappear,change dramatically,or get overtaken by a completely new framework.None of them can be considered completely production-ready at the moment,though they are developing in that direction rapidly.From talking extensively with developers and from an analysis of similar softw
316、are technological developments in history,we see these frameworks as the most likely to evolve or be replaced in the coming years.Learning how best to abstract solutions to well-known problems in a space is a complex problem that happens over time.It takes many engineers to code over an extended per
317、iod and learn from each other.As that happens,more and more accepted solutions and ways of dealing with similar problems solidify over time.These frameworks are primarily written in Python,but SK also supports C#and Java.We expect more languages to get better traction in the coming years,especially
318、flexible modern languages like Rust,though,for now,the dominant tooling in such languages is fine tuning or at the moment.inference Up Next:Vector Databases35Agents,Large Language Models,and Smart Apps05VectorDatabasesOne of the other big new tools in the arsenal of AI app designers is vector databa
319、ses,a category that previously did not get much fanfare.Thats probably because databases have been around since the early days of computing,and theyre well-known foundations to many kinds of applications.But the familiarity and longevity of databases obscure the fact that theyve evolved considerably
320、 over that time with the needs of new kinds of applications.Today,databases are evolving once again to handle the needs of machine learning with vector databases.Initially,databases were all about neat tables filled with rows and columns.They served the applications of the desktop and early enterpri
321、se era well and made tech titans out of companies like Oracle.With the rise of the cloud and big data came NoSQL databases like Cassandra or MongoDB with their JSON documents,which can scale better than traditional databases for certain kinds of workloads that started to crop up as hundreds of milli
322、ons and then billions of people came online.Vector databases are one of the latest iterations of the database family.They store vector embeddingsthe unique,critical data meant for AI and machine learning applications.Vector embeddings are simply numerical representations of data.They could be images
323、 or videos or the words/sentences used in NLP.Some datasets are more structured and have columns of numeric values,and others might have more unstructured text like an entire legal document,a novel,or an article online.But any data can be converted down to a vector,whether it is a whole document or
324、just a few words or the pixels in an image.Essentially,any other object can be reduced down to a vector easily.Even numerical data can be turned into vectors.In the world of LLMs and agents,vector databases are the hidden workhorses.They make it possible to sort and store and search embeddings by se
325、mantic similarity,represented by their proximity in a vector space.Thats a useful superpower when it comes to natural language queries.This means that a query doesnt need to be exact to be a match.36Agents,Large Language Models,and Smart AppsTraditional databases deliver exact matches to everything,
326、such as finding me this record of John Nash who lives at 125 Springfield Road and giving me his last orders.However,vector databases use similarity metrics to find any vectors that are close to the query.To do that,they use approximate nearest neighbor(ANN)search algorithms that are optimized for se
327、arch,like product quantization,hierarchical navigable small world,or random projection.Basically,these algorithms compress the original vector,making the query process much faster.They can also do other kinds of similarity comparisons,such as Euclidean distance and dot product comparisons,which iden
328、tify the most useful results.In a traditional database like Postgres,were usually querying for rows where the values are exact matches for our query.In vector databases,the database backend applies those similarity metrics to find a vector that is the most similar to our query.Its various algorithms
329、 are put together into a pipeline that provides fast retrieval of the neighbors of a queried vector.Since the vector database provides approximate results,as opposed to exact results,the main trade-offs we face are total accuracy versus speed.If we need more accuracy,it can make the query slower,so
330、there is always a tradeoff between accuracy and speed.However,a well-designed vector database can provide a very fast search with high-quality accuracy.How does this translate to the real world?All these similarity searches might let you easily find questions that are very similar to the question so
331、meone is asking,even though they used a different language to ask it.This means that the app designer can pull back prebaked responses rather than waiting for an answer from the cloud LLMs API and save on round-trip time and cost.These embeddings are like a compact snapshot of meaning and can functi
332、on as a filter for new data during inference.If youre just pulling up answers in a database by exact match,that works fine if the range of questions is highly structured and limited,but when the range of questions can be virtually infinite,that falls apart fast.Vector databases can also be useful to
333、 store the kind of fuzzy knowledge that were used to dealing with as human beings.Picture a pair coding LLM that understands the question youre asking and looks up similar code in previous answers,providing a shortcut to solving the same problem multiple times.Lets take a look at an example.One user might ask an LLM“Whats the best way to restore hair as I age?”and another might ask“How do I get my