2023-2-20

ChatGPT for Robotics: Design Principles and Model Abilities

Sai Vemprala*, Rogerio Bonatti*, Arthur Bucker, and Ashish Kapoor
Microsoft Autonomous Systems and Robotics Research

This paper presents an experimental study regarding the use of OpenAI's ChatGPT [1] for robotics applications. We outline a strategy that combines design principles for prompt engineering and the creation of a high-level function library which allows ChatGPT to adapt to different robotics tasks, simulators, and form factors. We focus our evaluations on the effectiveness of different prompt engineering techniques and dialog strategies towards the execution of various types of robotics tasks. We explore ChatGPT's ability to use free-form dialog, parse XML tags, and synthesize code, in addition to the use of task-specific prompting functions and closed-loop reasoning through dialogues. Our study encompasses a range of tasks within the robotics domain, from basic logical, geometrical, and mathematical reasoning all the way to complex domains such as aerial navigation, manipulation, and embodied agents. We show that ChatGPT can be effective at solving several of these tasks, while allowing users to interact with it primarily via natural language instructions. In addition to these studies, we introduce an open-sourced research tool called PromptCraft, which contains a platform where researchers can collaboratively upload and vote on examples of good prompting schemes for robotics applications, as well as a sample robotics simulator with ChatGPT integration, making it easier for users to get started with using ChatGPT for robotics.

Videos and blog: aka.ms/ChatGPT-Robotics
PromptCraft, AirSim-ChatGPT code: https:/

1. Introduction

The rapid advancement in natural language processing (NLP) has led to the development of large language models (LLMs), such as BERT [2], GPT-3 [3], and Codex [4], that are revolutionizing a wide range of applications.
These models have achieved remarkable results in various tasks such as text generation, machine translation, and code synthesis, among others. A recent addition to this collection of models was the OpenAI ChatGPT [1], a pretrained generative text model which was finetuned using human feedback. Unlike previous models, which operate mostly upon a single prompt, ChatGPT provides particularly impressive interaction skills through dialog, combining text generation with code synthesis. Our goal in this paper is to investigate if and how the abilities of ChatGPT can generalize to the domain of robotics.

Robotics systems, unlike text-only applications, require a deep understanding of real-world physics, environmental context, and the ability to perform physical actions. A generative robotics model needs to have robust commonsense knowledge, a sophisticated world model, and the ability to interact with users to interpret and execute commands in ways that are physically possible and that make sense in the real world. These challenges fall beyond the original scope of language models, as they must not only understand the meaning of a given text, but also translate the intent into a logical sequence of physical actions.

*Equal contribution, random order. © 2023 Microsoft. All rights reserved.

Figure 1: Current robotics pipelines require a specialized engineer in the loop to write code to improve the process. Our goal with ChatGPT is to have a (potentially non-technical) user on the loop, interacting with the language model through high-level language commands, and able to seamlessly deploy various platforms and tasks.
In recent years there have been different attempts to incorporate language into robotics systems. These efforts have largely focused on using language token embedding models, LLM features, and multi-modal model features for specific form factors or scenarios. Applications range from visual-language navigation [5, 6] and language-based human-robot interaction [7, 8] to visual-language manipulation control [9, 10, 11]. However, despite the potential advantages of using LLMs in robotics, most of the existing approaches are restricted by a rigid scope and limited set of functionalities, or by their open-loop nature that does not allow for fluid interactions and behavior corrections from user feedback.

Models such as GPT-3, LaMDA, and Codex also show promise in zero-shot robotics scenarios when tasked with high-level agent planning [12, 13] or code generation [14, 15]. These early demonstrations inspired us to investigate ChatGPT as a potentially more versatile tool for the robotics domain, as it incorporates the strengths of natural language and code generation models along with the flexibility of dialogue. ChatGPT's ability to engage in a free-form dialog and capture long context allows users to interact with the model in a more natural fashion, with flexible behavior correction.
In this paper, we aim to demonstrate the potential of ChatGPT for robotics applications. We outline a key concept that unlocks the ability to solve robotics applications with ChatGPT, which is the creation of a high-level function library. Given that robotics is a diverse field where several platforms, scenarios, and tools exist, there exists an extensive variety of libraries and APIs. Instead of asking LLMs to output code specific to a platform or a library, which might involve extensive finetuning, we instead create a simple high-level function library for ChatGPT to deal with, which can then be linked in the back-end to the actual APIs for the platforms of choice. Thus, we allow ChatGPT to parse user intent from natural dialog and convert it to a logical chaining of high-level function calls. We also outline several prompt engineering guidelines that help ChatGPT solve robotics tasks.

Our research shows that ChatGPT is capable of solving various robotics-related tasks in a zero-shot fashion, while adapting to multiple form factors and allowing for closed-loop reasoning through conversation. In addition, we aim to show current model limitations and provide ideas on how to overcome them. Our main contributions are listed below:
- We demonstrate a pipeline for applying ChatGPT to robotics tasks. The pipeline involves several prompting techniques such as free-form natural language dialogue, code prompting, XML tags, and closed-loop reasoning. We also show how users can leverage a high-level function library that allows the model to quickly parse human intent and generate code for solving the problem;
- We experimentally evaluate ChatGPT's ability to execute a variety of robotics tasks. We show the model's capabilities and limitations when solving mathematical, logical, and geometrical operations, and then explore more complex scenarios involving embodied agents, aerial navigation, and manipulation. We include both simulation and real-world experiments that result from ChatGPT's plans;
- We introduce a collaborative open-source platform, PromptCraft, where researchers can work together to provide examples of positive (and negative) prompting strategies when working with LLMs in the robotics context. Prompt engineering is a mostly empirical science, and we want to provide a simple interface for researchers to contribute knowledge as a community. Over time we aim to provide different environments where users can test their prompts, and welcome new contributions;
- We release a simulation tool that builds on Microsoft AirSim [16] combined with a ChatGPT integration. This AirSim-ChatGPT simulation contains a sample environment for drone navigation and aims to be a starting point for researchers to explore how ChatGPT can enable robotics scenarios.

With this work we hope to open up new opportunities and avenues for future research fusing LLMs and robotics. We believe that our findings will inspire and guide further research in this exciting field, paving the way for the development of new, innovative robotics systems that can interact with humans in a natural, intuitive manner. For more details, we encourage readers to view detailed videos of our experiments on the project webpage.
2. Robotics with ChatGPT

Prompting LLMs for robotics control poses several challenges, such as providing a complete and accurate description of the problem, identifying the right set of allowable function calls and APIs, and biasing the answer structure with special arguments. To make effective use of ChatGPT for robotics applications, we construct a pipeline composed of the following steps:

1. First, we define a high-level robot function library. This library can be specific to the form factor or scenario of interest, and should map to actual implementations on the robot platform while being named descriptively enough for ChatGPT to follow;
2. Next, we build a prompt for ChatGPT which describes the objective while also identifying the set of allowed high-level functions from the library. The prompt can also contain information about constraints, or how ChatGPT should structure its responses;
3. The user stays on the loop to evaluate code output by ChatGPT, either through direct analysis or through simulation, and provides feedback to ChatGPT on the quality and safety of the output code;
4. After iterating on the ChatGPT-generated implementations, the final code can be deployed onto the robot.

We show a visual depiction of this pipeline in Figure 2 for the example of a household robot.

Figure 2: Robotics pipeline employing ChatGPT with the user on the loop to evaluate the output's quality and safety.
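To make the flow of these steps concrete, the sketch below shows one minimal way such a pipeline could be organized. This is our illustration, not code released with the paper; the chat_complete and deploy callables stand in for whatever chat client and robot deployment mechanism a given setup provides.

# Rough sketch of steps 1-4 of the pipeline described above.
# The chat client and the deployment hook are hypothetical placeholders.

FUNCTION_LIBRARY = """\
get_position(object_name): returns the XYZ coordinates of an object.
move_to(x, y, z): moves the robot to the given XYZ coordinates.
"""

def build_prompt(objective: str, constraints: str) -> str:
    # Step 2: state the objective, the allowed functions, and any constraints.
    return (
        f"You are controlling a robot. Objective: {objective}\n"
        f"You may only call the following functions:\n{FUNCTION_LIBRARY}"
        f"Constraints: {constraints}\n"
        "Respond with Python code only."
    )

def run_pipeline(objective, constraints, chat_complete, deploy):
    # chat_complete: callable mapping a prompt string to the model's reply.
    # deploy: callable that pushes user-approved code to the robot platform.
    code = chat_complete(build_prompt(objective, constraints))
    print(code)  # Step 3: the user inspects the code or tries it in simulation
    if input("Deploy this code to the robot? [y/N] ").strip().lower() == "y":
        deploy(code)  # Step 4: the final code goes onto the robot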
2.1. Construction and description of the robotics API library

Robotics being a well-established field, there already exists a multitude of libraries, either black-box or open-source, that can be used for basic functionalities in both the perception and action domains (e.g. object detection and segmentation, mapping, motion planning, controls, grasping). If properly specified in the prompt, the LLM is able to use these pre-defined functions for robot reasoning and execution.

One important prompt design requirement is that all API names must be descriptive of the overall function behavior. Clear names are essential to allow the LLM to reason over functional connections between APIs and produce the desired outcome for the problem. Hence, we can define high-level functions, which act as wrappers over actual implementations from the respective libraries. For example, a function named detect_object(object_name) could internally link to an OpenCV function or a computer vision model, whereas something like move_to(x, y, z) could internally invoke a motion planning and obstacle avoidance pipeline along with the appropriate low-level motor commands for a drone. Listing such a collection of high-level functions in the prompt is key in allowing ChatGPT to create logical sequences of behavioral primitives, and in generalizing to different scenarios and platforms.
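As a schematic illustration of this wrapper pattern (a sketch under our own assumptions; vision_backend, planner, and controller are hypothetical back-end handles, not APIs from the paper), only the descriptive names below would be exposed to ChatGPT, while the bodies hide the platform-specific plumbing:

def detect_object(object_name):
    # Descriptive high-level name exposed to ChatGPT; internally this could
    # call an OpenCV routine or a learned computer vision model.
    return vision_backend.detect(object_name)  # hypothetical perception call

def move_to(x, y, z):
    # Internally invokes motion planning and obstacle avoidance, then issues
    # the appropriate low-level commands (e.g. motor or velocity setpoints).
    path = planner.plan_path(get_current_pose(), (x, y, z))  # hypothetical planner
    controller.execute(path)                                 # hypothetical controller

Swapping the bodies for a different simulator or form factor leaves the prompt, and hence ChatGPT's reasoning, unchanged.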
Depending on the context, we recommend explaining the function of APIs and, if needed, breaking them down into sub-components with clear inputs and outputs, similar to code documentation. In Figure 3 we present an example of a good API prompting strategy for a home cook robot scenario. The strategy presented allows ChatGPT to reason about the order and content of tasks according to the functions the robot is actually able to execute. In contrast, we refer the interested reader to Appendix A.1 for an example of how ChatGPT reasons when no API guidance is given, which leads to an unbounded text-based answer, and to Appendix A.2 for an example of API under-specification, which leads to hallucinations over function call parameters.

We note that unlike the brittle structure of classical symbolic AI, which required rigid pre-defined relationships between objects and functions, LLMs are capable of defining new functions and concepts altogether when relevant to a particular problem. This capability confers flexibility and robustness to LLMs when dealing with robotics applications. Figure 4 shows how ChatGPT can create new high-level concepts and even low-level code when needed to solve a problem, even fusing existing APIs. The user on the loop can take advantage of this ability as a design strategy, and iteratively define new APIs with the help of the LLM when the current ones are not sufficient to solve the task at hand.
2.2. Clear description of the task details in the prompt

By providing a clear and concise description of the desired robotics task and its context, ChatGPT can generate more accurate responses. A good context description should contain, besides robot APIs, the elements below (a hypothetical skeleton combining them follows the list):

- Constraints and requirements: specify constraints or requirements that are relevant to the task. If the task involves moving objects, you might specify the weight, size, and shape of the objects to be moved.
- Environment: describe the environment in which the robotics task is taking place. For example, if the task is to navigate a maze, you might describe the size and shape of the maze, as well as any obstacles or hazards that need to be avoided.
- Current state: describe the current state of the robotics system. For example, if the task is to pick up an object, you might describe the current position and orientation of the robot and the object.
- Goals and objectives: state the goals and objectives of the task. If the task is to assemble a puzzle, you might specify the number of pieces that need to be assembled and the desired completion time.
- Solution examples: demonstrate how similar tasks can be solved as a means to guide the LLM's solution strategy. For example, if a task involves interactions with the user, we can describe an example of how and when the robot should be asking for the user's input (see Fig. 5). Note that priming can also introduce biases, so we should provide a diverse range of examples and avoid overly prescriptive language.
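Putting these elements together, a task description following this checklist might be organized as in the hypothetical skeleton below; the bracketed items are placeholders to be filled per task, and the exact wording is ours rather than a template from the paper:

Imagine you are helping me operate a [robot / form factor].
Environment: [workspace layout, obstacles and hazards to avoid].
Current state: [pose of the robot; poses of the relevant objects].
Constraints and requirements: [object weights and sizes, speed limits].
Goals and objectives: [what success looks like, including any timing requirements].
You can only use the following functions: [high-level function library].
Example: [one short solved interaction, as in Fig. 5].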
Imagine we are working with a household robot. The job of this robot is to make an omelette. The objects available around are: fridge, bowl, pan, oil, stove.
The main functions you can use are:
locate_object(object_name): Returns the XYZ coordinates of an object of interest.
go_to_location(object_name): Moves robot to a location specified by XYZ coordinates. Returns nothing.
pick_up(object_name): Picks up the object of interest. Returns nothing.
use_item(object_name): Takes the name of an object as input. Performs an appropriate action on the object at hand (pour oil, crack eggs). Returns nothing.
Can you make use of these to write code to go to the kitchen and make an omelette?

Sure, here is an example Python code that utilizes the provided functions to prepare an omelette for an elderly person:

fridge_location = locate_object('fridge')
bowl_location = locate_object('bowl')
pan_location = locate_object('pan')
stove_location = locate_object('stove')
oil_location = locate_object('oil')

# Start the omelette-making process
go_to_location(fridge_location)
pick_up('eggs')
go_to_location(bowl_location)
use_item('eggs')  # Crack eggs into the bowl
go_to_location(pan_location)
pick_up('pan')
use_item('pan')  # Place the pan on the stove
go_to_location('oil')
pick_up('oil')
go_to_location(pan_location)
use_item('oil')  # Pour oil into the pan
go_to_location('bowl')
pick_up('bowl')
go_to_location(pan_location)
use_item('bowl')  # Pour the egg mixture into the pan
go_to_location(stove_location)
use_item('stove')  # Turn on the stove

print('Enjoy your omelette!')

Figure 3: Example of a good prompting practice when defining different APIs for a robotics problem. Clear function definitions that can be executed by the robot, and the task objectives are explicitly stated.
52、ded to solve the problem,or insome cases ChatGPT is not able to generate the correct response in a zero-shot fashion.In these cases,wefind that a simple and effective strategy a user can take is to send additional instructions to ChatGPT in thechat format describing the issue,and have it correct its
53、elf.Previous approaches that rely on GPT-3 or Codexmodels 15,14 require the user to re-engineer the input prompt and generate new outputs from scratch.The dialog ability of ChatGPT,however,is a surprisingly effective vehicle for behavior correction.Section 3.2and the supplementary videos show exampl
54、es of interactive behavior between the user and ChatGPT.2.3.Special arguments to bias the answers structureDifferent prompting methodologies can be used to force the output of the model to obey some specificpattern.For instance,a user might want to automatically parse ChatGPTs output so that it can
55、be used inother scripts for real-time execution.As shown in Figure 3,a simple strategy is to directly ask ChatGPT toproduce code in a specific language(e.g.Python,C+).What usually follows is a free-form text paragraphfollowed by a code block.A more structured response can be produced by requesting t
56、he model to use XMLtags to help us parse the output automatically,as shown in Figure 5.In other settings users might want to force the model to output answers following a list pattern as opposedto code or free-form text.Appendix A.3 displays an example of such approach,where the last line of theuser
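As a concrete illustration, a minimal post-processing step (our sketch, assuming the <code> tag convention shown in Figure 5) can pull the executable part out of a reply with a regular expression:

import re

def extract_code(reply: str) -> str:
    # Return the contents of the first <code>...</code> block, or "" if absent.
    match = re.search(r"<code>(.*?)</code>", reply, re.DOTALL)
    return match.group(1).strip() if match else ""

reply = ("<question>Which chair do you mean?</question>"
         "<code>move_to_location(get_location('chair_3'))</code>")
print(extract_code(reply))  # -> move_to_location(get_location('chair_3'))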
Imagine we are working with a household robot. The job of this robot is to make an omelette. The objects available around are: fridge, bowl, pan, oil, stove.
The main functions you can use are:
locate_object(object_name): Returns the XYZ coordinates of an object of interest.
go_to_location(object_name): Moves robot to a location specified by XYZ coordinates. Returns nothing.
pick_up(object_name): Picks up the object of interest. Returns nothing.
use_item(object_name): Takes the name of an object as input. Performs an appropriate action on the object at hand (pour oil, crack eggs). Returns nothing.
Can you make use of these to write code to go to the kitchen and make an omelette? Try to use these low level functions to construct more high level behaviors and then solve the task using those high level primitives.

Sure, here is an example Python code that utilizes the provided functions to prepare an omelette for an elderly person:

def get_ingredients():
    go_to_location(locate_object('fridge'))  # Go to the fridge and get the eggs.
    pick_up('eggs')
    go_to_location(locate_object('bowl'))  # Go to the counter and get the bowl.
    pick_up('bowl')
    use_item('eggs')  # Crack the eggs into the bowl.

def heat_pan():
    go_to_location(locate_object('pan'))
    pick_up('pan')
    use_item('pan')  # Put the pan on the stove.
    go_to_location(locate_object('stove'))
    use_item('stove')  # Turn on the stove.

(rest of the response omitted for brevity)

Figure 4: Example of new API development using the LLM's intrinsic reasoning capabilities. The model improves its answer in comparison to Figure 3, which contained the same task but with a fixed set of APIs.
Imagine you are helping me interact with a mobile ground robot. You are in a room that contains several objects: three chairs, a glass table, a mirror, and two couches.
All of your outputs need to be identified by one of the following tags:
<question> Always ask me a clarification question if you are unsure </question>
<reason> Explain why you did something the way you did it </reason>
<code> Output code command that achieves the desired goal </code>
For example:
Me: Go to the chair
You: <question> There are three chairs in the world, which one do you want me to go to? </question>
Me: Go to chair_3
You: <code> target_location = get_location('chair_3'); move_to_location(target_location); </code> <reason> I found the location of the chair with the get_location function, and moved the robot towards the desired location </reason>
Are you ready?

Yes, I am ready. Please give me your first command.

Go to the couch

<question> There are two couches in the room. Which couch do you want me to go to? </question>

Go to the second sofa

<code> target_location = get_location('second_sofa'); move_to_location(target_location); </code> <reason> I used the get_location function to find the location of the second sofa and then moved the robot towards that location. </reason>

Now move to the other one

<code> target_location = get_location('first_sofa'); move_to_location(target_location); </code> <reason> I used the get_location function to find the location of the first sofa and then moved the robot towards that location. </reason>

Figure 5: Example of how user prompting can bias the model's response. We demonstrate how and when to interact with the user to ask for help. In addition, we force responses to be framed into specific XML tags which can aid code parsing in a post-processing stage.
3. ChatGPT abilities for solving robotics problems

In this section we take a deeper look into ChatGPT's abilities for solving robotics problems. Specifically, we investigate how well ChatGPT performs when dealing with a variety of robotics-related tasks, from simple spatio-temporal reasoning problems all the way to real-world aerial agent and manipulation deployments. We highlight several interesting capabilities we observed during these experiments.

While ChatGPT's capabilities are impressive, practical deployment safety considerations should not be overlooked, especially in the case of physical robotics deployments. As shown in Figure 2, we find it necessary to have a human on the loop to monitor and intervene in case of unexpected behaviors generated by ChatGPT. Additionally, the use of simulators can be particularly helpful to evaluate the model's performance before deployment in the real world. We emphasize that the use of ChatGPT for robotics is not a fully automated process, but rather acts as a tool to augment human capacity.

We provide all initial prompts for the problems in this section in Appendix B. For the sake of brevity we only include brief excerpts from each task. The full conversations can be found in the following repository: https:/

3.1. Zero-shot task planning

First, we note that ChatGPT is able to solve several robotics tasks in a zero-shot fashion, with access only to the prompt and the description of a function library, and no requirement of actual code examples.
3.1.1. Spatio-temporal reasoning: catching a basketball with visual servoing

In this example, we ask ChatGPT to control a planar robot equipped with an upward-facing camera. The robot is expected to catch a basketball using a visual servoing method based on the appearance of a basketball. We see that ChatGPT is able to appropriately use the provided API functions, reason about the ball's appearance and call relevant OpenCV functions, and command the robot's velocity based on a proportional controller. Even more impressive is the fact that ChatGPT can estimate the appearance of the ball and the sky in the camera image using SVG code. This behavior hints at a possibility that the LLM keeps track of an implicit world model going beyond text-based probabilities.

Initial model prompt found in Appendix B.1
Write a python script that executes a visual servoing approach towards catching a basketball in a court. You can use opencv functions to detect the ball as an orange blob.

Here's a sample implementation of the script:

import cv2
import numpy as np

def detect_ball(frame):
    # Convert the image to HSV color space for better color detection
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Define the range of orange color in HSV
    lower_orange = np.array([10, 100, 100])
    upper_orange = np.array([30, 255, 255])
    # Threshold the image to get only the orange color
    mask = cv2.inRange(hsv, lower_orange, upper_orange)
    # Find the contours of the thresholded image
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # If a contour is found, return its center
    if len(contours) > 0:
        c = max(contours, key=cv2.contourArea)
        M = cv2.moments(c)
        if M['m00'] > 0:
            center = (int(M['m10'] / M['m00']), int(M['m01'] / M['m00']))
            return center
    return None

def catch_ball():
    # Continuously get images from the camera
    while True:
        frame = get_image()
        # Detect the ball in the image
        ball_center = detect_ball(frame)
        # If the ball is detected, move towards its location
        if ball_center is not None:
            x, y = ball_center
            robot_x, robot_y = get_location()
            vx = (x - robot_x) * 0.5
            vy = (y - robot_y) * 0.5
            move_by_velocity(vx, vy)

if __name__ == '__main__':
    catch_ball()

This implementation uses a simple visual servoing approach where the robot moves towards the ball at a constant velocity. The velocity is proportional to the difference between the ball's and robot's locations, so the robot will move faster as it gets closer to the ball. You can adjust the gain (i.e., the multiplication factor of 0.5) to control the speed of the robot.

Generate SVG code that depicts how the ball would look in the camera image exactly in the center

Here's an example of an SVG code that depicts a ball in the center of a camera image:

SVG image rendered directly from the code generated by ChatGPT
3.1.2. Aerial robotics: real-world drone flight with intuitive human-robot interface (Click for video)

We also used ChatGPT with a different API implementation to control a real drone. ChatGPT provided an extremely intuitive natural language interface between the user and the robot, writing robot code based on ambiguous and ill-defined instructions, and asking clarification questions when necessary. The model was also able to write complex code structures for drone navigation (circular and lawnmower inspection) based solely on the prompt's base APIs.

Initial model prompt found in Appendix B.2

I am thirsty. Find me a drink

Question: What type of drink are you looking for? There are several options in the scene, including coconut water and diet coke can.

I just came back from the gym and am looking for a healthy option.

In that case, you might like coconut water. Here is the code to fly to the coconut water object:

drink_position = self.dict_of_objects['coconut water']
self.tello.fly_to(drink_position)

Explanation: This code first retrieves the X, Y, Z, and angle position of the coconut water object from the dict_of_objects dictionary. The fly_to method is then used to fly the drone to the retrieved position. The drone will fly to the position of the coconut water object and look in the direction of the angle.

I am still thirsty. Can you find something with sugar and a red logo?

I'm assuming you are referring to a can of regular coke. Here is the code to fly to the regular coke can object:

drink_position = self.dict_of_objects['regular coke can']
self.tello.fly_to(drink_position)

Fantastic. Now I want to take a look at my orchid. However, the chair might be blocking its view, so I want to inspect it from 3 points on a half-circle from -60 deg to -120 deg, 1 m away. Make sure that the drone's heading points towards the orchid from each point.

Here's the code to inspect the orchid from 3 points on a half-circle:

import numpy as np

# Get the position of the orchid
orchid_position = self.dict_of_objects['orchid']

# Define the angle range for the half-circle
angles = np.linspace(-60, -120, num=3, endpoint=False) * np.pi / 180

# Define the distance from the orchid to inspect from
distance = 1.0

# Define the inspection points
inspection_points = [orchid_position[:3] + [distance * np.cos(angle), distance * np.sin(angle), 0] for angle in angles]

# Fly the drone to each inspection point and look at the orchid
for point in inspection_points:
    self.tello.fly_to(point + [np.arctan2(orchid_position[1] - point[1], orchid_position[0] - point[0]), orchid_position[3]])

(text-based explanation of the response omitted for brevity)

snapshots from drone video
3.1.3. Aerial robotics: AirSim industrial inspection (Click for video)

We also applied ChatGPT to a simulated domain, using the Microsoft AirSim [16] simulator. We explored the idea of a potentially non-technical user directing the model to control a drone and execute an industrial inspection scenario. We observe from the following excerpt that ChatGPT is able to effectively parse intent and geometrical cues from user input and control the drone accurately.

Initial model prompt found in Appendix B.3

Fly to the turbine.

Question - There are two turbines in the world, which one do you want me to fly the drone to? Turbine 1 or Turbine 2?

Turbine 2, please. Fly to the turbine while keeping 12 meters away from the turbine along the X axis, and reach an altitude of 50 meters.

Here's the code to fly the drone to Turbine 2 while maintaining a distance of 12 meters along the X-axis and reaching an altitude of 50 meters:

# Get the position of Turbine 2
turbine_position = get_position('Turbine 2')
# Set the target position to be 12 meters away from the turbine along the X-axis and 50 meters high
target_position = [turbine_position[0] - 12, turbine_position[1], 50]
# Fly the drone to the target position
fly_to(target_position)

To inspect the turbine blade, I need you to fly in the bottom-right direction at an angle of -30 degrees from the horizontal axis. You should fly in the YZ plane for a distance of 30 meters, and then back to the current position.

Here's the code to fly the drone in the bottom-right direction at an angle of -30 degrees from the horizontal axis, in the YZ plane for a distance of 30 meters, and then back to the current position:

import math

# Get the drone's current position
current_position = get_position('Drone')
# Calculate the target position, which is 30 meters away in the YZ plane at an angle of -30 degrees from the horizontal axis
angle = math.radians(-30)
target_position = [
    current_position[0],
    current_position[1] + 30 * math.cos(angle),
    current_position[2] + 30 * math.sin(angle)
]
# Fly the drone to the target position
fly_to(target_position)
# Fly the drone back to the current position
fly_to(current_position)

Snapshots from inspection video
3.2. User on the loop: interactive conversations for complex tasks

As a next step, we focused on the ability of ChatGPT to perform more complex tasks with a user on the loop offering textual feedback. We observed that this modality of interaction allows for complex constructs such as curriculum learning, where the system is taught smaller-scale skills which can be combined towards larger and more intricate tasks. In addition, we found that ChatGPT is able to take high-level textual feedback regarding generated code or its performance and map it to the required low-level code changes, thus enabling potentially non-technical users to interact with it with ease.
3.2.1. Manipulation with curriculum learning (Click for video)

As a first example, we set up a block arrangement task using a robot arm, for which we devised a curriculum. ChatGPT was asked to learn the simpler skills of picking up objects, and then placing them. Subsequently, when asked to use these skills to perform a more involved task of block arrangement, ChatGPT chains these learned skills together logically. In addition, the model displays a fascinating example of generalizability when bridging the textual and physical domains when tasked with building the Microsoft logo out of wooden blocks. This task requires memory of how the corporate logo looks, including its colors, followed by building an abstraction of the logo into physical parts which are constructible by the existing robot actions.

Initial model prompt found in Appendix B.7

The beginning of the conversation is omitted for brevity. The following excerpt requires ChatGPT to build a new function using previously defined APIs, and later use this function for more complex tasks.

I want you to learn how to place an object in a position. Considering you are holding a generic object, you first need to move to a safe position above the target location, bring the object down and then release it.

Understood. Here's a Python function that takes the position where the object should be placed, moves the robot arm above the position, brings the object down, and releases it. Note that the function assumes that the robot arm is holding an object.

def place_object(position):
    # move the robot arm to a safe position above the target location
    safe_pos = [position[0], position[1], position[2] + 100, position[3], position[4], position[5]]
    move_to(safe_pos)
    # move the robot arm to the target location
    move_to(position)
    # release the object
    release()
    # move the robot arm back to the safe position above the target location
    move_to(safe_pos)

What does the Microsoft logo look like?

The Microsoft logo is a colored rectangular with four squares arranged in a grid pattern. The colors used in the logo are blue, green, orange, and yellow. The logo has gone through several iterations over the years, but this is the most recent and recognizable design.

Can you generate an SVG with those colors?

Yes, I can generate an SVG file with the Microsoft logo colors. Here is an example

Reproduction of SVG generated by ChatGPT

Awesome! I want you now to use the blocks to build the logo from Microsoft on top of the white pad.

Sure! Here's the Python code to build the Microsoft logo on top of the white pad using the blocks:

Code not shown for brevity. Please consult the link for the full conversation

This code gets the positions of the white pad and all four blocks using the get_position function, picks up each block in turn using the pick_up_object function we defined earlier, calculates the position to place the block on the white pad, and places the block on the white pad using the place_object function we defined earlier. We use the block colors to ensure that the logo is built correctly. We place the blue block in the bottom left position, the yellow block in the bottom right position, the red block in the top left position, and the green block in the top right position, as specified in the prompt.

snapshots from manipulator video
3.2.2. Aerial robotics: AirSim obstacle avoidance (Click for video)

We tasked ChatGPT to write a goal-reaching algorithm with obstacle avoidance for a drone equipped with a forward-facing distance sensor. ChatGPT built most of the key building blocks for the avoidance algorithm, but required some human feedback on steps it missed regarding the drone's orientation. Although the feedback was provided entirely in high-level text, ChatGPT improved its solution with localized changes to the code where appropriate.

Initial model prompt found in Appendix B.4

Assume I give you some XYZ coordinates as a goal. I want you to control the drone such that it moves towards the goal while avoiding obstacles in front of it. Instead of flying directly to the goal, the drone should take small steps while reassessing the obstacle situation at every step. The drone is equipped with a distance sensor that returns the distance to the closest obstacle directly in front of the drone. You can access it by calling get_distance(). I want you to make sure to fly only if there is at least 10 meters of distance in front of the drone. If not, you should rotate the drone until there is at least 10 meters of clearance in the front and then take a step in the direction that the drone is currently facing. Don't forget to face the goal again after taking a step. Got it?

Yes, I got it. Here's how I would approach the problem.

# Get the drone's current position and set the yaw angle towards the goal
current_pos = get_position('drone')
goal_pos = [x_goal, y_goal, z_goal]
yaw = math.atan2(goal_pos[1] - current_pos[1], goal_pos[0] - current_pos[0]) * 180 / math.pi
set_yaw(yaw)

# Start a loop that runs until the drone reaches the goal position
while np.linalg.norm(np.array(current_pos) - np.array(goal_pos)) > 0.5:
    # Check if there is at least 10 meters of distance in front of the drone
    distance = get_distance()

(rest of the response omitted for brevity)
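The response above is truncated; a continuation of the loop body consistent with the behavior requested in the prompt might look like the following sketch. This is our own reconstruction, not ChatGPT's verbatim output, and turn_by() and get_yaw() are assumed helpers rather than functions from the paper's prompt:

    # If the path ahead is blocked, rotate until there is at least 10 m of clearance
    while distance < 10:
        turn_by(30)                # assumed helper: rotate the drone by 30 degrees
        distance = get_distance()
    # Take a small step in the direction the drone is currently facing
    heading = math.radians(get_yaw())  # assumed helper: current yaw in degrees
    step = 5
    fly_to([current_pos[0] + step * math.cos(heading),
            current_pos[1] + step * math.sin(heading),
            current_pos[2]])
    # Face the goal again after taking the step
    current_pos = get_position('drone')
    yaw = math.atan2(goal_pos[1] - current_pos[1],
                     goal_pos[0] - current_pos[0]) * 180 / math.pi
    set_yaw(yaw)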
3.3. Perception-action loops

3.3.1. Embodied agent: closed-loop object navigation (Click for video)

(the beginning of this example is omitted for brevity)

while distance > 0.05:
    if angle > 0:
        # Bottle is to the right, so turn right and move forward
        turn_right()
        forward()
    else:
        # Bottle is to the left, so turn left and move forward
        turn_left()
        forward()

(later code omitted for brevity)

I want you to modularize this code a bit. Why don't you try coming up with a few high level functions and use them?

Good idea! Here's a modularized version of the code that uses some helper functions:

def get_object_distance_and_angle(img, depth_img, obj_name):
    # Code omitted for brevity

def turn_towards_angle(angle):
    # Code omitted for brevity

def search_for_object_and_move_to_it(obj_name):
    # Code omitted for brevity

Now, let's assume you have access to all these functions you just created. Let's get to work solving some tasks. Find out where I can warm up my lunch.

Sure, to find out where you can warm up your lunch, we can use the search_for_object_and_move_to_it function to search for a microwave in the scene. Here's an example code snippet:

# Search for a microwave
search_for_object_and_move_to_it('microwave')

This code will keep exploring the environment until a microwave is found. Once a microwave is found, the agent will move towards it and stop when it gets within 30 cm of it.

snapshots from embodied object navigation
3.3.2. Embodied agent: closed loop visual-language navigation using ChatGPT's dialogue (Click for video)

As a more experimental setting, we also evaluated ChatGPT's ability to act as a perception-action control feedback loop using its dialogue function. In this mode of operation, new state observations are fed into the system as dialogue text, and the model's outputs are executed by the robot at each step, generating new observations. We note that this type of behavior is heavily dependent on a dialog-based LLM such as ChatGPT, which can naturally receive a stream of observations. We constructed a scenario using the Habitat simulator [18] and tasked the agent to navigate to an area of interest. At each step we provided ChatGPT with a description of the scene in terms of polar coordinates of the visible objects relative to the agent. We constrained the output of ChatGPT to only return a forward distance of motion and the turning angle. We verify that the model is able to complete simple navigation tasks, but more research is needed in terms of properly describing the scene context in textual or vector form for more complex tasks and environments.

Initial model prompt found in Appendix B.5

Task: go to the chairs
Objects in the scene: ...
Command: move(1.5) # Move forward by 1.5 meters to explore the environment.
Objects in the scene: ...
Command: turn(-25) # Turn to the left to face the chair that is 3.93 meters away
Objects in the scene: ...
Command: turn(-10) # Turn to the right to explore the environment further.
Objects in the scene: ...
Command: turn(-6) # Turn to the right to face the chair that is 5.36 meters away.
Objects in the scene: ...
Command: turn(1) # Turn to the left to explore the environment further.
Objects in the scene: ...
Command: move(5.5) # Move forward towards the chair that is 5.36 meters away.
Objects in the scene: ...
Command: move(1) # Move forward to approach the chairs.

snapshots from Habitat navigation
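Schematically, this dialogue-as-controller loop can be summarized as in the sketch below. This is our illustration rather than the paper's released code; the chat and env handles, the observation formatter, and the command parser are all assumptions:

import re

def dialogue_control_loop(chat, env, task, max_steps=20):
    # chat: hypothetical handle that sends a message and returns the reply.
    # env: hypothetical simulator handle exposing observations and motion.
    chat.send(f"Task: {task}")
    for _ in range(max_steps):
        objects = env.get_visible_objects()  # e.g. [(name, angle_deg, dist_m), ...]
        scene = ", ".join(f"<{n}, {a:.0f}, {d:.2f}>" for n, a, d in objects)
        reply = chat.send(f"Objects in the scene: {scene}")
        match = re.search(r"(move|turn)\(([-+]?\d+(?:\.\d+)?)\)", reply)
        if match is None:
            break  # no parsable command in the reply; stop the episode
        command, value = match.group(1), float(match.group(2))
        env.move(value) if command == "move" else env.turn(value)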
For additional experiments, we refer the reader to Appendix C.

3.4. Reasoning and common-sense robotics tasks

We tasked ChatGPT with answering simple logical reasoning questions, as well as basic robotics problems, shown in Appendix D. We observe that ChatGPT offers a rich foundation of common-sense and logical reasoning capabilities upon which more advanced robotics abilities can be built. This logical grounding allows the user to interact with the model more naturally, as opposed to having to prompt or define every concept from scratch. Furthermore, ChatGPT's out-of-the-box understanding of basic robotics concepts such as control, camera geometry, and physical form factors makes it an excellent choice to build upon for generalizable and user-friendly robotics pipelines.
4. PromptCraft, a collaborative tool for LLM + Robotics research

Prompting is a crucial component to generate the desired behaviors in large language models (LLMs). Prompt engineering is particularly challenging at the intersection of LLMs with robotics, where there is a lack of comprehensive and accessible resources that provide examples of positive (and negative) interactions. To address this gap, we introduce PromptCraft (https:/), a collaborative open-source platform for researchers to share examples of prompting strategies and test their algorithms in sample robotic environments.

PromptCraft is a Github-based platform that allows researchers to share examples of prompt engineering strategies within different robotics categories, such as navigation, grasping, and manipulation. Users can submit their examples and rate others' submissions, which we hope will create a community-driven resource for researchers working with LLMs. Submissions of prompts and dialogues are primarily based on text, but we encourage users to share videos and images depicting the robot's behavior, especially for real-world deployment scenarios.

In addition to providing a platform for sharing prompt examples, PromptCraft also offers an AirSim [16] environment with a ChatGPT wrapper for researchers to prototype prompts and algorithms in a controlled simulated setting. We welcome contributions of new test environments to expand the range of scenarios where researchers can test their algorithms.

With PromptCraft we aim to support the empirical science of prompt engineering and enable researchers to advance the field.
Figure 6: PromptCraft open-sourced repository. Researchers can upload and vote on examples of LLM prompts for various robotics categories.

5. Related Work

Natural language and robotics: Natural language processing (NLP) has long been recognized as a crucial component for human-robot interaction. There are many applications where robots can benefit from NLP, including but not limited to task instruction, navigation, and information retrieval. Classically, modeling human-robot interactions using language is challenging because it forces the user to operate within a rigid set of instructions [19], or requires mathematically complex algorithms to keep track of multiple probability distributions over actions and target objects [20, 21]. More recent works explore neural networks to implicitly keep track of the complex mapping between language and actions, but such techniques often require vast amounts of labeled data for training [22, 5, 6, 23].

Large (vision and) language models for robotics: The Transformer architecture, introduced by [24], has revolutionized NLP and has also shown great promise in robotics. Transformers have been used for robot control and planning [25, 26, 27], object recognition [28], and robot navigation [29]. A more common use of transformers in robotics has been as feature extraction modules for one or more modalities simultaneously. These systems are often coupled with additional features from pretrained large-scale vision and language models [30, 10, 31, 32, 11, 9].

Models such as SayCan [31] focus on grounding LLMs so that free-form text commands are used to compute a value function to rank the best action types within a robot-specific library. RT-1 [33], on the other hand, takes an end-to-end approach to learn the mapping between language commands and low-level actions, without the use of intermediate high-level functions. Recent works have also explored the ability of large language models (LLMs) for zero-shot high-level robotics task planning [15, 14, 12]. These models make use of prompting structures with pre-defined functions, behaviors, and examples to guide the generation of the model's answers. [13] also explore the use of interactivity between user and LLM for table-top manipulation settings. Conceptually, the main difference of these approaches with respect to our work, which leverages ChatGPT [1], is the conversational ability of our LLM, which allows the user to interactively improve and correct the robot's behavior (as opposed to re-engineering the prompt from scratch and generating another zero-shot answer). In addition, our work aims to provide a generalizable pipeline and set of principles to be used by researchers in different fields of robotics, as opposed to focusing on a single domain such as table-top manipulation or task planning.

Prompting LLMs with APIs, and its connections to symbolic AI: When designing LLM prompts for robotics applications, users often make use of a high-level library of APIs to represent specific behaviors to be used. We can draw a connection between this approach and classical symbolic AI, which uses logic and rules to represent and reason about knowledge [34]. While the traditional symbolic AI approach presented difficulties in new knowledge acquisition and dealing with out-of-distribution data, we believe that LLMs can overcome these challenges. As we showed in Section 2.1 and Section 3, models such as ChatGPT can compose new primitive functions based on the context and generate code for them automatically.
6. Conclusions and Future Work

We presented a framework for using ChatGPT for robotics applications. The framework entails designing and implementing a library of APIs for robot control which are amenable to prompt engineering for ChatGPT. We discussed design principles for creating such APIs and prompting strategies that can be used to generate code for robotics applications via ChatGPT. The proposed framework allows the generated code to be tested, verified, and validated by a user on the loop via a range of methods including simulation and manual inspection. We demonstrated how the framework can be used for multiple applications ranging from simple common-sense robotics knowledge tasks all the way to deployments in aerial robotics, manipulation, and visual navigation.

We believe that this work presents only a small fraction of what is possible within the intersection of large language models operating in the robotics space. We hope to not only inspire other researchers to take these next steps, but to also help them achieve results with the use of the PromptCraft collaborative tool.

We emphasize that these tools should not be given full control of the robotics pipeline, especially for safety-critical applications. Given the propensity of LLMs to eventually generate incorrect responses, it is fairly important to ensure solution quality and safety of the code with human supervision before executing it on the robot. We expect several research works to follow with the proper methodologies to design, build and create testing, validation and verification pipelines for LLMs operating in the robotics space.

Most of the examples we presented in this work demonstrated open perception-action loops where ChatGPT generated code to solve a task, with no feedback provided to the model afterwards. Given the importance of closed-loop controls in perception-action loops, we expect much of the future research in this space to explore how to properly use ChatGPT's abilities to receive task feedback in the form of textual or special-purpose modalities.

6.1. ChatGPT for paper writing

Please note that this paper was largely written with the assistance of ChatGPT, with prompts provided by the authors. The model's output was thoroughly revised and adapted. We note that the use of LLMs can significantly speed up the writing process, and we recommend their use to the interested reader.
References

[1] OpenAI. ChatGPT. Accessed: 2023-02-08. 2023. URL: https:/
[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-training of deep bidirectional transformers for language understanding." In: arXiv preprint arXiv:1810.04805 (2018).
[3] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. "Language models are few-shot learners." In: Advances in Neural Information Processing Systems 33 (2020), pp. 1877-1901.
[4] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. "Evaluating large language models trained on code." In: arXiv preprint arXiv:2107.03374 (2021).
[5] Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, and Stephen Gould. "A recurrent vision-and-language BERT for navigation." In: arXiv preprint arXiv:2011.13922 (2020).
[6] Simon Stepputtis, Joseph Campbell, Mariano Phielipp, Stefan Lee, Chitta Baral, and Heni Ben Amor. "Language-conditioned imitation learning for robot manipulation tasks." In: Advances in Neural Information Processing Systems 33 (2020), pp. 13139-13150.
[7] Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Sai Vemprala, and Rogerio Bonatti. "LaTTe: Language Trajectory TransformEr." In: arXiv preprint arXiv:2208.02918 (2022).
[8] Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, and Rogerio Bonatti. "Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers." In: arXiv preprint arXiv:2203.13411 (2022).
[9] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. "Perceiver-Actor: A multi-task transformer for robotic manipulation." In: arXiv preprint arXiv:2209.05451 (2022).
[10] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. "CLIPort: What and where pathways for robotic manipulation." In: Conference on Robot Learning. 2022.
[11] Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, and Linxi Fan. "VIMA: General robot manipulation with multimodal prompts." In: arXiv preprint arXiv:2210.03094 (2022).
[12] Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. "Language models as zero-shot planners: Extracting actionable knowledge for embodied agents." In: International Conference on Machine Learning. 2022.
[13] Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, et al. "Inner monologue: Embodied reasoning through planning with language models." In: arXiv preprint arXiv:2207.05608 (2022).
[14] Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. "Code as Policies: Language model programs for embodied control." In: arXiv preprint arXiv:2209.07753 (2022).
[15] Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg. "ProgPrompt: Generating situated robot task plans using large language models." In: arXiv preprint arXiv:2209.11302 (2022).
[16] Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. "AirSim: High-fidelity visual and physical simulation for autonomous vehicles." In: Field and Service Robotics: Results of the 11th International Conference. 2018.
[17] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. "You only look once: Unified, real-time object detection." In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[18] Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, and Dhruv Batra. "Habitat: A Platform for Embodied AI Research." In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2019.
[19] Stefanie Tellex, Nakul Gopalan, Hadas Kress-Gazit, and Cynthia Matuszek. "Robots that use language." In: Annual Review of Control, Robotics, and Autonomous Systems 3 (2020), pp. 25-55.
[20] Jacob Arkin, Daehyung Park, Subhro Roy, Matthew R Walter, Nicholas Roy, Thomas M Howard, and Rohan Paul. "Multimodal estimation and communication of latent semantic knowledge for robust execution of robot instructions." In: The International Journal of Robotics Research 39.10-11 (2020), pp. 1279-1304.
[21] Matthew R Walter, Siddharth Patki, Andrea F Daniele, Ethan Fahnestock, Felix Duvallet, Sachithra Hemachandra, Jean Oh, Anthony Stentz, Nicholas Roy, and Thomas M Howard. "Language understanding for field and service robots in a priori unknown environments." In: arXiv preprint arXiv:2105.10396 (2021).
[22] Justin Fu, Anoop Korattikara, Sergey Levine, and Sergio Guadarrama. "From language to goals: Inverse reinforcement learning for vision-based instruction following." In: arXiv preprint arXiv:1902.07742 (2019).
[23] Prasoon Goyal, Raymond J Mooney, and Scott Niekum. "Zero-shot task adaptation using natural language." In: arXiv preprint arXiv:2106.02972 (2021).
[24] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In: Advances in Neural Information Processing Systems 30 (2017).
[25] Francesco Giuliari, Irtiza Hasan, Marco Cristani, and Fabio Galasso. "Transformer networks for trajectory forecasting." In: 2020 25th International Conference on Pattern Recognition (ICPR). 2021.
[26] Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. "Decision transformer: Reinforcement learning via sequence modeling." In: Advances in Neural Information Processing Systems 34 (2021).
[27] Michael Janner, Qiyang Li, and Sergey Levine. "Offline Reinforcement Learning as One Big Sequence Modeling Problem." In: Advances in Neural Information Processing Systems 34 (2021).
[28] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. "Masked autoencoders are scalable vision learners." In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[29] Rogerio Bonatti, Sai Vemprala, Shuang Ma, Felipe Frujeri, Shuhang Chen, and Ashish Kapoor. "PACT: Perception-action causal transformer for autoregressive robotics pre-training." In: arXiv preprint arXiv:2209.11133 (2022).
[30] Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, and Shuran Song. "CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration." In: arXiv preprint arXiv:2203.10421 (2022).
[31] Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, et al. "Do as I can, not as I say: Grounding language in robotic affordances." In: arXiv preprint arXiv:2204.01691 (2022).
[32] Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, and Dieter Fox. "Correcting robot plans with natural language feedback." In: arXiv preprint arXiv:2204.05186 (2022).
[33] Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et al. "RT-1: Robotics Transformer for real-world control at scale." In: arXiv preprint arXiv:2212.06817 (2022).
[34] Stuart J Russell. Artificial Intelligence: A Modern Approach. Pearson Education, Inc., 2010.
A. Additional examples on design principles on prompt engineering

A.1. Example of prompt with no API library guidance for home robot: unbounded text-based response

Consider you are a home assistant robot. Your goal is to prepare an omelette for an elderly person. Output python code with the sequence of steps that achieves your objective.

Sure, here's an example Python code that outlines the sequence of steps to prepare an omelette for an elderly person:

print("Good morning! Would you like an omelette for breakfast?")
time.sleep(5)
print("Great! Let's get started.")
print(f"Cracking {eggs} eggs into a bowl.")
print("Adding a splash of milk, a pinch of salt, and a pinch of pepper.")
print("Whisking the ingredients together.")
print(f"Heating butter in a non-stick pan over medium heat.")
print("Pouring the egg mixture into the pan.")
print("Cooking the omelette for 2-3 minutes, until the bottom is set.")
print("Using a spatula to flip the omelette.")
print("Cooking the other side of the omelette for an additional minute.")
print("Serving the omelette hot and ready to eat. Enjoy your meal!")
A.2. Example of prompt with under-specified APIs: hallucination over function parameters

Consider you are a home assistant robot. Your goal is to prepare an omelette for an elderly person. You are equipped with functions such as locate_object(), move_to_location(), cook_item(), grab_object(). Output python code with the sequence of steps that achieves your objective.

Sure, here is a Python code with the sequence of steps that a home assistant robot can take to prepare an omelette for an elderly person:

print("Starting to prepare an omelette for the elderly person.")
locate_object(EGGS_LOCATION)
move_to_location(FRIDGE_LOCATION)
grab_object(EGGS_LOCATION)
locate_object(PAN_LOCATION)
move_to_location(STOVE_LOCATION)
grab_object(PAN_LOCATION)
locate_object(OIL_LOCATION)
grab_object(OIL_LOCATION)
print("Pouring the oil into the pan.")
time.sleep(2)
print("Cracking the eggs into the pan.")
grab_object(EGGS_LOCATION)
time.sleep(1)
cook_item(PAN_LOCATION, COOKING_TIME)
print("The omelette is ready! Serving it to the elderly person.")
A.3. Example of prompt with structured response forcing the answer to be in a numbered list format

Consider you are a home assistant robot. Your goal is to prepare an omelette for an elderly person. Find the sequence of steps that achieves your objective.

1. First, gather all the necessary ingredients and equipment for making an omelette. This would typically include eggs, milk, butter, salt, pepper, a mixing bowl, a whisk, a frying pan, and a spatula.
2. Crack the eggs into the mixing bowl and add a splash of milk, a pinch of salt, and some black pepper to taste.
3. Use the whisk to beat the eggs and milk together until they are well combined.
4. Heat the frying pan over medium heat and add a small amount of butter to the pan, allowing it to melt and coat the bottom of the pan.
5. Pour the egg mixture into the frying pan and use the spatula to spread it evenly over the bottom of the pan.
6. Allow the eggs to cook for a minute or two, until the bottom starts to set and the top is still slightly runny.
7. Use the spatula to fold one half of the omelette over the other half, forming a half-moon shape.
8. Cook for another minute or two until the eggs are fully set and the omelette is heated through.
9. Use the spatula to slide the omelette onto a plate and serve immediately.
10. Make sure to clean up any mess that was made during the cooking process and return all equipment and ingredients to their proper places.
190、ure to clean up any mess that was made during the cooking process and return all equipment and ingredients to their proper places.20ChatGPT for RoboticsB.Original initial prompts for robotics tasks examplesB.1.Spatial-temporal reasoning:catching a basketball with visual servoingFull conversation fou
Full conversation found at: https:/

ChatGPT prompt:
Imagine you are a planar robot that can move along the XY axes, and you're positioned in the center of a basketball court. A person on the side of the court is going to throw a basketball in the air somewhere in the court, and your objective is to be at the exact XY location of the ball when it lands. The robot has a monocular RGB camera that looks up. You can assume that the following functions are available:

get_image(): returns an image from the robot's camera looking up;
get_location(): returns 2 floats XY with the robot's current location in the court;
move_to_point(x, y, vx, vy): moves the robot towards a specific (x, y) location in the court with velocity (vx, vy). You can assume for this exercise that the robot can accelerate or brake instantly to any velocity;
move_by_velocity(vx, vy): moves the robot along the X axis with velocity vx, and Y axis with velocity vy;

Additional points to consider when giving your answer: 1) Your responses should be informative, visual, logical and actionable, 2) Your logics and reasoning should be rigorous, intelligent, and defensible, 3) You can provide additional relevant details to respond thoroughly and comprehensively to cover multiple aspects in depth.

Write a python script that executes a visual servoing approach towards catching a basketball in a court. You can use opencv functions to detect the ball as an orange blob.
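The full script ChatGPT produced is in the linked conversation. As a rough sketch of the kind of loop this prompt requests, assuming the get_image() and move_by_velocity() functions declared above, and with illustrative HSV thresholds and gain (none of these constants come from the paper):

import cv2
import numpy as np

# Minimal visual-servoing sketch: keep the orange blob centered in the
# upward-facing camera so the robot stays under the ball. The HSV bounds,
# the gain, and the sign conventions are assumptions for illustration.
LOWER_ORANGE = np.array([5, 100, 100])
UPPER_ORANGE = np.array([20, 255, 255])
GAIN = 0.01  # proportional gain from pixel error to commanded velocity

while True:
    img = get_image()
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_ORANGE, UPPER_ORANGE)
    m = cv2.moments(mask)
    if m["m00"] > 0:  # ball detected: compute blob centroid
        cx = m["m10"] / m["m00"]
        cy = m["m01"] / m["m00"]
        h, w = mask.shape
        # Command velocities proportional to the blob's pixel offset
        move_by_velocity(GAIN * (cx - w / 2), GAIN * (cy - h / 2))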
B.2. Aerial robotics: real-world drone flight

Full conversation found at: https:/

ChatGPT prompt:
Imagine you are helping me interact with the AirSim simulator for drones. At any given point of time, you have the following abilities, each identified by a unique tag. You are also required to output code for some of the requests.
Question: You can ask me a clarification question, as long as you specifically identify it saying "Question".
Code: Output a code command that achieves the desired goal.
Reason: After you output code, you should provide an explanation why you did what you did.

The simulator contains a drone, along with several objects. Apart from the drone, none of the objects are movable. Within the code, we have the following commands available to us. You are not to use any other hypothetical functions.

get_position(object_name): Takes a string as input indicating the name of an object of interest, and returns a vector of 4 floats indicating its X, Y, Z, Angle coordinates.
self.tello.fly_to(position): Takes a vector of 4 floats as input indicating X, Y, Z, Angle coordinates and commands the drone to fly there and look at that angle.
self.tello.fly_path(positions): Takes a list of X, Y, Z, Angle positions indicating waypoints along a path and flies the drone along that path.
self.tello.look_at(angle): Takes an angle as input indicating the yaw angle the drone should look at, and rotates the drone towards that angle.

Here is an example scenario that illustrates how you can ask clarification questions. Let us assume a scene contains two spheres.
Me: Fly to the sphere.
You: Question - there are two spheres. Which one do you want me to fly to?
Me: Sphere 1, please.

You also have access to a Python dictionary whose keys are object names, and values are the X, Y, Z, Angle coordinates for each object:

self.dict_of_objects = {'origin': [0.0, 0.0, 0.0, 0], 'mirror': [1.25, -0.15, 1.2, 0], 'chair 1': [0.9, 1.15, 1.1, np.pi/2], 'orchid': [0.9, 1.65, 1.1, np.pi/2], 'lamp': [1.6, 0.9, 1.2, np.pi/2], 'baby ducks': [0.1, 0.8, 0.8, np.pi/2], 'sanitizer wipes': [-0.3, 1.75, 0.9, 0], 'coconut water': [-0.6, 0.0, 0.8, -np.pi], 'shelf': [0.95, -0.9, 1.2, np.pi/2], 'diet coke can': [1.0, -0.9, 1.55, np.pi/2], 'regular coke can': [1.3, -0.9, 1.55, np.pi/2]}

Are you ready?
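With the object dictionary above, a user request can resolve directly to a call against the declared API. An illustrative exchange (not from the logged conversation):

Me: Fly to the orchid.
You: Code:
self.tello.fly_to(self.dict_of_objects['orchid'])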
B.3. Aerial robotics: AirSim industrial inspection

Full conversation found at: https:/

ChatGPT prompt:
Imagine you are helping me interact with the AirSim simulator for drones. At any given point of time, you have the following abilities. You are also required to output code for some of the requests.
Question - Ask me a clarification question. Reason - Explain why you did something the way you did it. Code - Output a code command that achieves the desired goal.

The simulator contains a drone, along with several objects. Apart from the drone, none of the objects are movable. Within the code, we have the following commands available to us. You are not to use any other hypothetical functions.

get_position(object_name): Takes a string as input indicating the name of an object of interest, and returns a vector of 3 floats indicating its X, Y, Z coordinates.
fly_to(position): Takes a vector of 3 floats as input indicating X, Y, Z coordinates and commands the drone to fly there.
fly_path(positions): Takes a list of X, Y, Z positions indicating waypoints along a path and flies the drone along that path.

Here is an example scenario that tells you how to respond where we are working with a simulated world that has two spheres in it.
Me: Fly the drone to the sphere.
You: Question - There are two spheres in the world, which one do you want me to fly the drone to?
Me: Let's pick Sphere 1.

There are two turbines, some solar panels and a car in the world. Are you ready?
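For illustration, an inspection sweep built only from the declared functions might look like the following sketch (the object names echo the scene description above, but the 5-meter standoff is an assumed value, not the paper's solution):

# Illustrative only: hover a fixed height above each asset in turn.
waypoints = []
for name in ["turbine 1", "turbine 2", "solar panels"]:
    x, y, z = get_position(name)
    waypoints.append([x, y, z + 5.0])  # assumed 5 m inspection standoff
fly_path(waypoints)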
B.4. Aerial robotics: AirSim obstacle avoidance

Full conversation found at: https:/

ChatGPT prompt:
Imagine you are helping me interact with the AirSim simulator for drones. At any given point of time, you have the following abilities. You are also required to output code for some of the requests.
Question - Ask me a clarification question. Reason - Explain why you did something the way you did it. Code - Output a code command that achieves the desired goal.

The simulator contains a drone, along with several objects. Apart from the drone, none of the objects are movable. Within the code, we have the following commands available to us. You are not to use any other hypothetical functions.

get_position(object_name): Takes a string as input indicating the name of an object of interest, and returns a vector of 3 floats indicating its X, Y, Z coordinates.
fly_to(position): Takes a vector of 3 floats as input indicating X, Y, Z coordinates and commands the drone to fly there.
fly_path(positions): Takes a list of X, Y, Z positions indicating waypoints along a path and flies the drone along that path.
get_yaw(): Get the current yaw angle for the drone (in degrees).
set_yaw(angle): Set the yaw angle for the drone (in degrees).

Are you ready?
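As a small illustration of how the yaw commands compose with the flight commands (the 90-degree turn and the target object name are assumptions for the sketch, not from the logged conversation):

# Hypothetical snippet: rotate in place, then fly toward the car.
set_yaw(get_yaw() + 90)        # turn 90 degrees to the right
fly_to(get_position("car"))    # then head to the target object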
B.5. Embodied agent: Habitat navigation

Full conversation found at: https:/

ChatGPT prompt:
Imagine I am a robot equipped with a camera and a depth sensor. I am trying to perform a task, and you should help me by sending me commands. You are only allowed to give me the following commands:

turn(angle): turn the robot by a given number of degrees
move(distance): moves the robot straight forward by a given distance in meters.

On each step, I will provide you with the objects in the scene as a list of (object name, distance, angle). You should reply with only one command at a time. The distance is in meters, and the direction angle in degrees with respect to the robot's orientation. Negative angles are to the left and positive angles are to the right. If a command is not valid, I will ignore it and ask you for another command. If there is no relevant information in the scene, use the available commands to explore the environment.
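An illustrative exchange under this protocol (the object readings are made up, not from the logged conversation):

Me: (sofa, 2.5, 15)
You: turn(15)
Me: (sofa, 2.5, 0)
You: move(2.5)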
B.6. Embodied agent: AirSim object navigation

Full conversation found at: https:/

ChatGPT prompt:
Imagine you are helping me interact with the AirSim simulator. We are controlling an embodied agent. At any given point of time, you have the following abilities. You are also required to output code for some of the requests.
Question - Ask me a clarification question. Reason - Explain why you did something the way you did it. Code - Output a code command that achieves the desired goal.

The scene consists of several objects. We have access to the following functions, please use only these functions as much as possible:

Perception:
get_image(): Renders an image from the front facing camera of the agent.
detect_objects(img): Runs an object detection model on an image img, and returns two variables - obj_list, which is a list of the names of objects detected in the scene, and obj_locs, a list of bounding box coordinates in the image for each object.

Action:
forward(): Move forward by 0.1 meters.
turn_left(): Turn left by 90 degrees.
turn_right(): Turn right by 90 degrees.

You are not to use any other hypothetical functions. You can use functions from Python libraries such as math, numpy etc. Are you ready?
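For illustration, a minimal exploration loop restricted to the declared functions might look as follows (the target object and the scan-then-approach policy are assumptions, not the solution from the logged conversation):

target = "chair"  # hypothetical goal object
for _ in range(200):  # bounded exploration budget
    obj_list, obj_locs = detect_objects(get_image())
    if target in obj_list:
        forward()      # target in view: approach in 0.1 m increments
    else:
        turn_right()   # otherwise scan in 90-degree increments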
B.7. Manipulation with curriculum learning: Picking, stacking, and building the Microsoft logo

Full conversation found at: https:/

ChatGPT prompt:
Imagine we are working with a manipulator robot. This is a robotic arm with 6 degrees of freedom that has a suction pump attached to its end effector. I would like you to assist me in sending commands to this robot given a scene and a task. At any point, you have access to the following functions:

grab(): Turn on the suction pump to grab an object.
release(): Turns off the suction pump to release an object.
get_position(object): Given a string of an object name, returns the coordinates and orientation of the vacuum pump to touch the top of the object [X, Y, Z, Yaw, Pitch, Roll].
move_to(position): It moves the suction pump to a given position [X, Y, Z, Yaw, Pitch, Roll].

You are allowed to create new functions using these, but you are not allowed to use any other hypothetical functions. Keep the solutions simple and clear. The positions are given in mm and the angles in degrees. You can also ask clarification questions using the tag "Question -". Here is an example scenario that illustrates how you can ask clarification questions. Let's assume a scene contains two spheres.
Me: Pick up the sphere.
You: Question - there are two spheres. Which one do you want me to pick up?
Me: Sphere 1, please.

Use python code to express your solution. Are you ready?
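The prompt explicitly allows composing new functions from the declared ones. A sketch of the kind of helper this enables (the object names and the 100 mm approach height are assumptions, not from the logged conversation):

# Pick-and-place built only from the declared functions.
def pick_and_place(source, destination):
    x, y, z, yaw, pitch, roll = get_position(source)
    move_to([x, y, z + 100, yaw, pitch, roll])  # approach from above (mm)
    move_to([x, y, z, yaw, pitch, roll])        # descend onto the object
    grab()
    dx, dy, dz, dyaw, dpitch, droll = get_position(destination)
    move_to([dx, dy, dz + 100, dyaw, dpitch, droll])
    release()

pick_and_place("block 1", "table")  # hypothetical object names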
C. Additional experiments

C.1. Zero-shot manipulation: Pick and place, stacking, and pushing
Full conversation found at: https:/

C.2. Controlling multiple robot form factors from a single prompt
Full conversation found at: https:/

D. Additional commonsense/reasoning tasks

D.1. Computer vision problem

Imagine we are working with an RGBD camera. The pose of this camera can be accessed via a function get_pose(). Now let us say we observe a point P in the image at coordinates (x, y), and the corresponding perspective depth value can be obtained by get_depth(x, y). The intrinsics of the camera can be obtained by the function get_intrinsics(). Can you give me code that will compute the world coordinates corresponding to point P?

Sure, here's some example code in Python to compute the world coordinates corresponding to point P:

import numpy as np

def get_world_coordinates(x, y):
    # Get camera intrinsics
    K = get_intrinsics()
    # Get camera pose
    pose = get_pose()
    # Get perspective depth value at observed point
    depth = get_depth(x, y)
    # Compute homogeneous image coordinates
    p_image = np.array([x, y, 1])
    # Compute inverse camera intrinsics
    K_inv = np.linalg.inv(K)
    # Compute image coordinates in camera frame
    p_camera = depth * K_inv.dot(p_image)
    # Compute world coordinates
    p_world = np.dot(pose[:3, :3], p_camera) + pose[:3, 3]
    return p_world
Given an observed point P in the image at coordinates (x, y), and the corresponding perspective depth value obtained by get_depth(x, y), we can compute the 3D world coordinates corresponding to P. To do this, we need to know the camera intrinsics, which describe how the 3D scene is projected onto the 2D image plane. The intrinsics can be obtained using the get_intrinsics() function. We also need to know the camera pose, which describes the position and orientation of the camera in the world coordinate system. The pose can be obtained using the get_pose() function.

With the intrinsics and pose, we can transform the observed point P from image coordinates to world coordinates. We first compute the inverse of the camera intrinsics matrix, which allows us to convert from image coordinates to camera coordinates. We then multiply the inverse intrinsics matrix with the observed point's pixel coordinates and the corresponding depth value to obtain the 3D point in camera coordinates. Finally, we transform the camera coordinates to world coordinates by multiplying the camera coordinates by the camera pose matrix, which describes the transformation from camera coordinates to world coordinates. The resulting 3D point is in homogeneous coordinates, so we need to divide by the last coordinate to obtain the 3D Euclidean coordinates.
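For reference, the back-projection that the code above performs can be written compactly (assuming get_pose() returns a 4x4 camera-to-world matrix with rotation block R = pose[:3, :3] and translation t = pose[:3, 3]):

\[
\mathbf{p}_{\mathrm{camera}} = d \, K^{-1} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
\mathbf{p}_{\mathrm{world}} = R \, \mathbf{p}_{\mathrm{camera}} + \mathbf{t},
\]

where d = get_depth(x, y) and K = get_intrinsics().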
D.2. Transformation matrices

Imagine we are in a world that operates according to its own specific coordinate convention. There is a robot car located in this world at a specific location. The car has its own coordinate convention. And upon this car, there is a camera which, again, has a different coordinate convention. Imagine we have access to a function get_transformation_matrix(a, b) - that can return the transformation matrix between world and vehicle, or vehicle and camera. What is the transformation matrix between the world and the camera?

The transformation matrix between the world and the camera can be obtained by composing the transformation matrices between the world and the vehicle and the vehicle and the camera. To find the transformation matrix between the world and the camera, we first get the transformation matrix between the world and the vehicle using the function get_transformation_matrix(a, b) with a set to world and b set to vehicle. Next, we get the transformation matrix between the vehicle and the camera using the function get_transformation_matrix(a, b) with a set to vehicle and b set to camera. Finally, we multiply the two transformation matrices to get the transformation matrix between the world and the camera. In mathematical terms, if T_world_vehicle is the transformation matrix between the world and the vehicle, and T_vehicle_camera is the transformation matrix between the vehicle and the camera, then T_world_camera = T_world_vehicle * T_vehicle_camera.

Let's get into the details more. Imagine the world operates according to the coordinate convention: X is forward, Y is to the left, Z is upwards. The robot car is located in this world at coordinates (5, 0, 1). The car has its own coordinate convention: X is forward, Y is to the right, and Z is downwards. Can you now implement get_transformation_matrix(world, car)?

Yes, to implement the get_transformation_matrix(world, car) function, we need to first convert the car's coordinate convention to match the world's coordinate convention. The transformation matrix between the world and the car can be represented as follows:

def get_transformation_matrix(a, b):
    if a == "world" and b == "car":
        T = np.array([[1, 0, 0, 5],
                      [0, -1, 0, 0],
                      [0, 0, -1, 1],
                      [0, 0, 0, 1]])
        return T
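The composition described in the first answer can then be carried out in a few lines; the vehicle-to-camera matrix below is a hypothetical placeholder, since the conversation only implements the world-to-car transform:

import numpy as np

T_world_vehicle = get_transformation_matrix("world", "car")
T_vehicle_camera = np.eye(4)  # placeholder: substitute the real matrix
T_world_camera = T_world_vehicle @ T_vehicle_camera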
D.3. Balancing a ball on a plate

Imagine we are trying to balance a ball on the center of a plate. The plate can be controlled with a pitch and a roll angle as set_plate_pitch(), and set_plate_roll(). The range of pitch and roll angles that can be set is [-0.1, 0.1]. There is an intelligent camera above the plate that can capture top-down images of the plate and return the (X, Y) position of the center of the ball in the image. The image is of resolution 320x240. The pixel coordinates of the ball can be accessed by the function get_ball_position(). Can you give me code that continuously controls the plate such that the ball is always at the center?

Here is an example of the code in Python:

import time

def balance_ball():
    # Continuously check the position of the ball
    while True:
        # Get the current position of the ball
        x, y = get_ball_position()
        # Check if the ball is off-center
        if x > 160 or x < 160 or y > 120 or y < 120:
            # Calculate the pitch and roll adjustments needed to center the ball
            pitch_adjust = (y - 120) * 0.0001
            roll_adjust = (x - 160) * 0.0001
            # Limit the pitch and roll adjustments to the maximum range [-0.1, 0.1]
            pitch_adjust = max(min(pitch_adjust, 0.1), -0.1)
            roll_adjust = max(min(roll_adjust, 0.1), -0.1)
            # Set the pitch and roll angles to center the ball
            set_plate_pitch(pitch_adjust)
            set_plate_roll(roll_adjust)
        # Wait for a short period before checking the ball's position again
        time.sleep(0.01)

# Start the balancing process
balance_ball()

This code uses a loop to continuously check the position of the ball using the get_ball_position function. If the ball is off-center, the code calculates the necessary pitch and roll adjustments to center the ball and sets the plate angles using the set_plate_pitch and set_plate_roll functions. The code limits the pitch and roll adjustments to the maximum range of [-0.1, 0.1]. The loop waits for a short period before checking the ball's position again to prevent over-adjusting the plate.