WORKING PAPER

GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models

Tyna Eloundou (1), Sam Manning (1,2), Pamela Mishkin (1), and Daniel Rock (3)
(1) OpenAI  (2) OpenResearch  (3) University of Pennsylvania

March 27, 2023

Abstract

We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.
1 Introduction

As shown in Figure 1, recent years, months, and weeks have seen remarkable progress in the field of generative AI and large language models (LLMs). While the public often associates LLMs with various iterations of the Generative Pre-trained Transformer (GPT), LLMs can be trained using a range of architectures and are not limited to transformer-based models (Devlin et al., 2019). LLMs can process and produce various forms of sequential data, including assembly language, protein sequences, and chess games, extending beyond natural language applications alone. In this paper, we use LLMs and GPTs somewhat interchangeably, and specify in our rubric that these should be considered similar to the GPT-family of models available via ChatGPT or the OpenAI Playground (which at the time of labeling included models in the GPT-3.5 family but not in the GPT-4 family). We examine LLMs with text- and code-generating abilities, use the term generative AI to additionally include modalities such as images or audio, and use LLM-powered software to cover tools built on top of LLMs or that combine LLMs with other generative AI models.

Corresponding author. Authors contributed equally and are listed alphabetically.

Figure 1: To get a sense of how quickly model capabilities are progressing, consider the jump in exam performance between GPT-3.5 and GPT-4 (OpenAI, 2023b).

Our study is motivated less by the progress of these models alone, though, and more by the breadth, scale, and capabilities we've seen in the complementary technologies developed around them. The role of complementary technologies remains to be seen, but maximizing the impact of LLMs appears contingent on integrating them with larger systems (Bresnahan, 2019; Agrawal et al., 2021). While the focus of our discussion is primarily on the generative capabilities of LLMs, it is important to note that these models can also be utilized for various tasks beyond text generation. For example, embeddings from LLMs can be used for custom search applications, and LLMs can perform tasks such as summarization and classification where the context may be largely contained in the prompt.
To complement predictions of technology's impacts on work and provide a framework for understanding the evolving landscape of language models and their associated technologies, we propose a new rubric for assessing LLM capabilities and their potential effects on jobs. This rubric (A.1) measures the overall exposure of tasks to LLMs, following the spirit of prior work on quantifying exposure to machine learning (Brynjolfsson et al., 2018; Felten et al., 2018; Webb, 2020). We define exposure as a proxy for potential economic impact without distinguishing between labor-augmenting or labor-displacing effects. We employ human annotators and GPT-4 itself as a classifier to apply this rubric to occupational data in the U.S. economy, primarily sourced from the O*NET database.[1][2]

To construct our primary exposure dataset, we collected both human annotations and GPT-4 classifications, using a prompt tuned for agreement with a sample of labels from the authors. We observe similar agreement levels in GPT-4 responses and between human and machine evaluations, when aggregated to the task level.

[1] This is distinct from recent social science research that makes use of LLMs to simulate human behavior (Horton, 2023; Sorensen et al., 2022).
[2] While our exposure rubric does not necessarily tie the concept of language models to any particular model, we were strongly motivated by our observed capabilities of GPT-4 and the suite of capabilities we saw in development with OpenAI's launch partners (OpenAI, 2023b).
This exposure measure reflects an estimate of the technical capacity to make human labor more efficient; however, social, economic, regulatory, and other determinants imply that technical feasibility does not guarantee labor productivity or automation outcomes. Our analysis indicates that approximately 19% of jobs have at least 50% of their tasks exposed when considering both current model capabilities and anticipated tools built upon them. Human assessments suggest that only 3% of U.S. workers have over half of their tasks exposed to LLMs when considering existing language and code capabilities without additional software or modalities. Accounting for other generative models and complementary technologies, our human estimates indicate that up to 49% of workers could have half or more of their tasks exposed to LLMs.

Our findings consistently show across both human and GPT-4 annotations that most occupations exhibit some degree of exposure to LLMs, with varying exposure levels across different types of work. Occupations with higher wages generally present with higher exposure, a result contrary to similar evaluations of overall exposure to machine learning (Brynjolfsson et al., 2023). When regressing exposure measures on skillsets using O*NET's skill rubric, we discover that roles heavily reliant on science and critical thinking skills show a negative correlation with exposure, while programming and writing skills are positively associated with LLM exposure. Following Autor et al. (2022a), we examine barriers to entry by Job Zones and find that occupational exposure to LLMs weakly increases with the difficulty of job preparation. In other words, workers facing higher (lower) barriers to entry in their jobs tend to experience more (less) exposure to LLMs.

We further compare our measurements to previous efforts documenting the distribution of automation exposure in the economy and find broadly consistent results. Most other technology exposure measures we examine are statistically significantly correlated with our preferred exposure measure, while measures of manual routineness and robotics exposure show negative correlations. The variance explained by these earlier efforts (Acemoglu and Autor, 2011a; Frey and Osborne, 2017; Brynjolfsson et al., 2018; Felten et al., 2018; Webb, 2020; Brynjolfsson et al., 2023), along with wage controls, ranges from 60 to 72%, indicating that 28 to 40% of the variation in our AI exposure measure remains unaccounted for by previous technology exposure measurements.
We analyze exposure by industry and discover that information processing industries (4-digit NAICS) exhibit high exposure, while manufacturing, agriculture, and mining demonstrate lower exposure. The connection between productivity growth in the past decade and overall LLM exposure appears weak, suggesting a potential optimistic case that future productivity gains from LLMs may not exacerbate possible cost disease effects (Baumol, 2012; Aghion et al., 2018).[3]

Our analysis indicates that the impacts of LLMs like GPT-4 are likely to be pervasive. While LLMs have consistently improved in capabilities over time, their growing economic effect is expected to persist and increase even if we halt the development of new capabilities today. We also find that the potential impact of LLMs expands significantly when we take into account the development of complementary technologies. Collectively, these characteristics imply that Generative Pre-trained Transformers (GPTs) are general-purpose technologies (GPTs)[4] (Bresnahan and Trajtenberg, 1995; Lipsey et al., 2005).

(Goldfarb et al., 2023) argue that machine learning as a broad category is likely a general-purpose technology. Our evidence supports a wider impact, as even subsets of machine learning software meet the criteria for general-purpose technology status independently. This paper's primary contributions are to provide a set of measurements of LLM impact potential and to demonstrate the use case of applying LLMs to develop such measurements efficiently and at scale. Additionally, we showcase the general-purpose potential of LLMs. If GPTs are GPTs, the eventual trajectory of LLM development and application may be challenging for policymakers to predict and regulate. As with other general-purpose technologies, much of these algorithms' potential will emerge across a broad range of economically valuable use cases, including the creation of new types of work (Acemoglu and Restrepo, 2018; Autor et al., 2022a). Our research serves to measure what is technically feasible now, but necessarily will miss the evolving impact potential of the LLMs over time.

[3] Baumol's cost disease is a theory that explains why the cost of labor-intensive services, such as healthcare and education, increases over time. This happens because wages for skilled workers in other industries increase, but there is no corresponding increase in productivity or efficiency in these service industries. Therefore, the cost of labor in these industries becomes relatively more expensive compared to other goods and services in the economy.
[4] For the remainder of the paper we spell out "general-purpose technologies" when the term is used outside of stating "GPTs are GPTs."
The paper is structured as follows: Section 2 reviews relevant prior work, Section 3 discusses methods and data collection, Section 4 presents summary statistics and results, Section 5 relates our measurements to earlier efforts, Section 6 discusses the results, and Section 7 offers concluding remarks.

2 Literature Review

2.1 The Advancement of Large Language Models

In recent years, generative AI models have gained significant attention from both the artificial intelligence (AI) research community and the general public, due to their ability to tackle a wide array of complex language-based tasks. The progress in these models' abilities has been fueled by multiple factors, including increased model parameter count, greater training data volume, and enhanced training configurations (Brown et al., 2020; Radford et al., 2019; Hernandez et al., 2021; Kaplan et al., 2020). Broad, state-of-the-art LLMs, such as LaMDA (Thoppilan et al., 2022) and GPT-4 (OpenAI, 2023b), excel in diverse applications like translation, classification, creative writing, and code generation, all capabilities that previously demanded specialized, task-specific models developed by expert engineers using domain-specific data.

Concurrently, researchers have improved the steerability, reliability, and utility of these models using methods like fine-tuning and reinforcement learning with human feedback (Ouyang et al., 2022; Bai et al., 2022). These advancements enhance the models' ability to discern user intent, rendering them more user-friendly and practical. Moreover, recent studies reveal the potential of LLMs to program and control other digital tools, such as APIs, search engines, and even other generative AI systems (Schick et al., 2023; Mialon et al., 2023; Chase, 2022). This enables seamless integration of individual components for better utility, performance, and generalization. At their limit, these trends suggest a world where LLMs may be capable of executing any task typically performed at a computer.
Generative AI models have mostly been deployed as modular specialists, performing specific tasks such as generating images from captions or transcribing text from speech. However, we argue that it is essential to view LLMs as versatile building blocks for creating additional tools. Developing these tools and integrating them into systems will require time and possibly significant reconfiguration of existing processes across various industries. Nevertheless, we are already witnessing emerging adoption trends. Despite their limitations, LLMs are increasingly being integrated into specialized applications in fields like writing assistance, coding, and legal research. These specialized applications then allow businesses and individuals to adopt LLMs into their workflows.

We emphasize the significance of these complementary technologies, partly because out-of-the-box general-purpose LLMs may continue to be unreliable for various tasks due to issues such as factual inaccuracies, inherent biases, privacy concerns, and disinformation risks (Abid et al., 2021; Schramowski et al., 2022; Goldstein et al., 2023; OpenAI, 2023a). However, specialized workflows, including tooling, software, or human-in-the-loop systems, can help address these shortcomings by incorporating domain-specific expertise. For example, Casetext offers LLM-based legal research tools that provide lawyers with quicker and more accurate legal research results, utilizing embeddings and summarization to counter the risk that GPT-4 could provide inaccurate details about a legal case or set of documents. GitHub Copilot is a coding assistant that employs LLMs to generate code snippets and auto-complete code, which users can then accept or reject based on their expertise. In other words, while it's true that on its own GPT-4 does not know what time it is, it's easy enough to give it a watch.

Furthermore, a positive feedback loop may emerge as LLMs surpass a specific performance threshold, allowing them to assist in building the very tooling that enhances their usefulness and usability across various contexts. This could lower the cost and engineering expertise required to create such tools, potentially accelerating LLM adoption and integration even further (Chen et al., 2021; Peng et al., 2023). LLMs can also become valuable assets in machine learning model development, serving as coding assistants for researchers, data labeling services, or synthetic data generators. There is potential for such models to contribute to economic decision-making at the task level, for instance, by refining methods for task and sub-task allocation between humans and machines (Singla et al., 2015; Shahaf and Horvitz, 2010). As LLMs advance over time and better align with user preferences, we can anticipate continuous improvement in performance. However, it is essential to recognize that these trends also bring a variety of serious risks (Khlaaf et al., 2022; Weidinger et al., 2022; Solaiman et al., 2019).
2.2 The Economic Impacts of Automation Technologies

A large and growing body of literature addresses the labor market impacts of AI and automation technologies. The concept of skill-biased technological change and the task model of automation, often considered the standard framework for understanding technology's influence on labor, originated from research demonstrating that technological progress raises the demand for skilled workers over unskilled workers (Katz and Murphy, 1992). Numerous studies have built upon this concept, exploring the effects of technological change and automation on workers within a task-based framework (Autor et al., 2003; Acemoglu and Autor, 2011b; Acemoglu and Restrepo, 2018). This strand of research has shown that workers involved in routine and repetitive tasks are at a higher risk of technology-driven displacement, a phenomenon known as routine-biased technological change. More recent studies have distinguished between technology's task-displacement and task-reinstatement effects (where new technology increases the need for a wider array of labor-intensive tasks) (Acemoglu and Restrepo, 2018, 2019). Several studies have shown that automation technologies have resulted in wage inequality in the US, driven by relative wage declines for workers specializing in routine tasks (Autor et al., 2006; Van Reenen, 2011; Acemoglu and Restrepo, 2022b).

Prior research has employed various approaches to estimate the overlap between AI capabilities and the tasks and activities workers undertake in different occupations. These methods include mapping patent descriptions to worker task descriptions (Webb, 2020; Meindl et al., 2021), linking AI capabilities to occupational abilities documented in the O*NET database (Felten et al., 2018, 2023), aligning AI task benchmark evaluations with worker tasks via cognitive abilities (Tolan et al., 2021), labeling automation potential for a subset of US occupations and using machine learning classifiers to estimate this potential for all other US occupations (Frey and Osborne, 2017), modeling task-level automation and aggregating the results to occupation-level insights (Arntz et al., 2017), collecting expert forecasts (Grace et al., 2018), and, most relevantly to this paper, devising a new rubric to assess worker activities for their suitability for machine learning (Brynjolfsson et al., 2018, 2023). Some of these approaches have found that exposure to AI technologies at the task level tends to be diversified within occupation. Considering each job as a bundle of tasks, it would be rare to find any occupation for which AI tools could do nearly all of the work. (Autor et al., 2022a) finds as well that automation and augmentation exposures tend to be positively correlated. There is also a growing set of studies examining specific economic impacts and opportunities for LLMs (Bommasani et al., 2021; Felten et al., 2023; Korinek, 2023; Mollick and Mollick, 2022; Noy and Zhang, 2023; Peng et al., 2023). Alongside this work, our measurements help characterize the broader potential relevance of language models to the labor market.
General-purpose technologies (e.g. printing, the steam engine) are characterized by widespread proliferation, continuous improvement, and the generation of complementary innovations (Bresnahan and Trajtenberg, 1995; Lipsey et al., 2005). Their far-reaching consequences, which unfold over decades, are difficult to anticipate, particularly in relation to labor demand (Bessen, 2018; Korinek and Stiglitz, 2018; Acemoglu et al., 2020; Benzell et al., 2021). The realization of general-purpose technologies' full potential requires extensive co-invention (Bresnahan and Trajtenberg, 1995; Bresnahan et al., 1996, 2002; Lipsey et al., 2005; Dixon et al., 2021), a costly and time-consuming process involving the discovery of new business procedures (David, 1990; Bresnahan, 1999; Frey, 2019; Brynjolfsson et al., 2021; Feigenbaum and Gross, 2021). Consequently, many studies of machine learning technologies focus on systems-level adoption, arguing that organizational systems may require redesign to effectively take advantage of novel machine learning advancements (Bresnahan, 2019; Agrawal et al., 2021; Goldfarb et al., 2023). Appropriately designed systems can yield considerable business value and improve firm performance (Rock, 2019; Babina et al., 2021; Zolas et al., 2021), with AI tools facilitating the discovery process (Cockburn et al., 2018; Cheng et al., 2022). By employing task-level information to assess whether LLMs fulfill the criteria of a general-purpose technology, we seek to merge the two perspectives for understanding the technology-labor relationship.

We attempt to build on these diverse literature streams in several ways. Echoing (Felten et al., 2023), we focus our analysis on the impact of LLMs, rather than addressing machine learning or automation technologies more broadly. Additionally, we propose a novel method that employs LLMs, specifically GPT-4, to assess tasks for exposure and automation potential, thereby bolstering human scoring efforts. Subsequently, we aggregate our findings to occupations and industries, capturing the overall potential exposure in the contemporary U.S. labor market.
3 Methods and Data Collection

3.1 Data on Activities and Tasks Performed by Occupation in the US

We use the O*NET 27.2 database (O*NET, 2023), which contains information on 1,016 occupations, including their respective Detailed Work Activities (DWAs) and tasks. A DWA is a comprehensive action that is part of completing a task, such as "Study scripts to determine project requirements." A task, on the other hand, is an occupation-specific unit of work that may be associated with zero, one, or multiple DWAs. We offer a sample of tasks and DWAs in Table 1. The two datasets we use consist of:

- 19,265 tasks, each consisting of a task description and a corresponding occupation, and
- 2,087 DWAs, where most DWAs are connected to one or more tasks, and tasks may be associated with one or more DWAs, though some tasks lack any associated DWAs.

3.2 Data on Wages, Employment, and Demographics

We obtain employment and wage data from the 2020 and 2021 Occupational Employment series provided by the Bureau of Labor Statistics. This dataset encompasses occupational titles, the number of workers in each occupation, occupation-level employment projections for 2031, typical education required for entry in an occupation, and on-the-job training required to attain competency in an occupation (BLS, 2022). We use the BLS-recommended crosswalk to O*NET (BLS, 2023b) to link the O*NET task and DWA dataset and the BLS Labor Force Demographics (BLS, 2023a), which is derived from the Current Population Survey (CPS).
Both of these data sources are collected by the U.S. government and primarily capture workers who are not self-employed, are documented, and are working in the so-called formal economy.

3.3 Exposure

We present our results based on an exposure rubric, in which we define exposure as a measure of whether access to an LLM or LLM-powered system would reduce the time required for a human to perform a specific DWA or complete a task by at least 50 percent. Though GPT-4 has vision capabilities (OpenAI, 2023b) and LLM is often used to refer to a much wider range of modalities, vision and image capabilities were only included in our definition of LLM-powered software.

Task ID | Occupation Title | DWAs | Task Description
14675 | Computer Systems Engineers/Architects | Monitor computer system performance to ensure proper operation. | Monitor system operation to detect potential problems.
18310 | Acute Care Nurses | Operate diagnostic or therapeutic medical instruments or equipment. Prepare medical supplies or equipment for use. | Set up, operate, or monitor invasive equipment and devices, such as colostomy or tracheotomy equipment, mechanical ventilators, catheters, gastrointestinal tubes, and central lines.
4668.0 | Gambling Cage Workers | Execute sales or other financial transactions. | Cash checks and process credit card advances for patrons.
15709 | Online Merchants | Execute sales or other financial transactions. | Deliver e-mail confirmation of completed transactions and shipment.
6529 | Kindergarten Teachers, Except Special Education | | Involve parent volunteers and older students in children's activities to facilitate involvement in focused, complex play.
6568 | Elementary School Teachers, Except Special Education | | Involve parent volunteers and older students in children's activities to facilitate involvement in focused, complex play.

Table 1: Sample of occupations, tasks, and Detailed Work Activities from the O*NET database. We see that aggregating over activities alone is imprecise, as evidenced by the fact that we'd expect Gambling Cage Workers to complete the given DWA in person, using some physicality, while we'd expect Online Merchants to complete the same activity solely with a computer.

We provide a summary of our rubric below, while the complete rubric can be found in A.1. When we have labels for DWAs, we first aggregate them to the task level before aggregating to the occupation level.
No exposure (E0) if:
- using the described LLM results in no or minimal reduction in the time required to complete the activity or task while maintaining equivalent quality[a], or
- using the described LLM results in a decrease in the quality of the activity/task output.

Direct exposure (E1) if:
- using the described LLM via ChatGPT or the OpenAI Playground can decrease the time required to complete the DWA or task by at least half (50%).

LLM+ Exposed (E2) if:
- access to the described LLM alone would not reduce the time required to complete the activity/task by at least half, but
- additional software could be developed on top of the LLM that could reduce the time it takes to complete the specific activity/task with quality by at least half. Among these systems, we count access to image generation systems.[b]

[a] Equivalent quality means that a third party, typically the recipient of the output, would not notice or care about LLM assistance.
[b] In practice, as can be seen in the full rubric in Appendix A.1, we categorize access to image capabilities separately (E3) to facilitate annotation, though we combine E2 and E3 for all analyses.

Summary of exposure rubric

We set the exposure threshold at a potential 50% reduction in time required to complete a specific DWA or task while maintaining consistent quality. We anticipate that adoption will be highest and most immediate for applications that realize a considerable increase in productivity. Although this threshold is somewhat arbitrary, it was selected for ease of interpretation by annotators. Moreover, regardless of the chosen threshold, we guessed that the real-world reduction in task time would likely be slightly or significantly lower than our estimates, leading us to opt for a relatively high threshold. In our own validation labeling, we found that this corresponded closely to whether an LLM or LLM-powered software could perform the core part of a task or nearly the entire task.
Comparison | Weighting | Agreement | Pearson's
GPT-4, Rubric 1; Human | E1 | 80.8% | 0.223
GPT-4, Rubric 1; Human | E1 + .5*E2 | 65.6% | 0.591
GPT-4, Rubric 1; Human | E1 + E2 | 82.1% | 0.654
GPT-4, Rubric 2; Human | E1 | 81.8% | 0.221
GPT-4, Rubric 2; Human | E1 + .5*E2 | 65.6% | 0.538
GPT-4, Rubric 2; Human | E1 + E2 | 79.5% | 0.589
GPT-4, Rubric 1; GPT-4, Rubric 2 | E1 | 91.1% | 0.611
GPT-4, Rubric 1; GPT-4, Rubric 2 | E1 + .5*E2 | 76.0% | 0.705
GPT-4, Rubric 1; GPT-4, Rubric 2 | E1 + E2 | 82.4% | 0.680

Table 2: Model and human comparison of agreement and Pearson's correlation scores. The agreement score is determined by looking at how often the two groups agree on the annotation (e.g. E0, E1 or E2). In the paper we use GPT-4, Rubric 1.
We then collected both human and GPT-4-generated annotations using the exposure rubric, which underlie the bulk of the analyses in this paper.

Human Ratings: We obtained human annotations by applying the rubric to each O*NET Detailed Work Activity (DWA) and a subset of all O*NET tasks and then aggregated those DWA and task scores[5] at the task and occupation levels. The authors personally labeled a large sample of tasks and DWAs and enlisted experienced human annotators who have reviewed GPT-3, GPT-3.5 and GPT-4 outputs as part of OpenAI's alignment work (Ouyang et al., 2022).

GPT-4 Ratings: We administered a similar rubric to an early version of GPT-4 (OpenAI, 2023b), but on all task/occupation pairs rather than DWAs. We made slight modifications to the rubric (which was used as a prompt to the model in this case) to enhance agreement with a set of human labels. Full agreement rates are given in Table 2.
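To illustrate the mechanics of collecting GPT-4 ratings with a rubric prompt, the following is a minimal sketch using a chat-completion API. The prompt wording, model name, and label parsing are illustrative assumptions for exposition; they are not the exact rubric, checkpoint, or post-processing used for the paper's annotations.

    # Sketch: apply an exposure rubric to task/occupation pairs with a chat model.
    # Assumes the OpenAI Python SDK (>= 1.0) and an API key in the environment.
    from openai import OpenAI

    client = OpenAI()

    RUBRIC_PROMPT = (
        "You are rating worker tasks for exposure to large language models.\n"
        "Labels: E0 = no exposure; E1 = direct exposure (the LLM alone can halve completion time);\n"
        "E2 = exposure via LLM-powered software (additional tooling needed to halve time).\n"
        "Answer with a single label: E0, E1, or E2."
    )  # illustrative wording, not the paper's full rubric in Appendix A.1

    def rate_task(occupation: str, task: str, model: str = "gpt-4") -> str:
        """Return an exposure label for one task/occupation pair."""
        response = client.chat.completions.create(
            model=model,
            temperature=0,  # keep labels as deterministic as possible
            messages=[
                {"role": "system", "content": RUBRIC_PROMPT},
                {"role": "user", "content": f"Occupation: {occupation}\nTask: {task}"},
            ],
        )
        text = response.choices[0].message.content.strip().upper()
        # Keep only a recognized label; anything else gets flagged for human review.
        return text if text in {"E0", "E1", "E2"} else "REVIEW"

In practice, iterating on such a prompt against a small validation set of human labels (as described above) is what drives the agreement rates reported in Table 2.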
We construct three primary measures for our dependent variable of interest: (i) α, corresponding to E1 in the exposure rubric above, anticipated to represent the lower bound of the proportion of exposed tasks within an occupation; (ii) β, which is the sum of E1 and 0.5*E2, where the 0.5 weight on E2 is intended to account for exposure when deploying the technology via complementary tools and applications necessitates additional investment; and (iii) ζ, the sum of E1 and E2, an upper bound of exposure that provides an assessment of maximal exposure to an LLM and LLM-powered software. We summarize agreement between annotation groups and measures in Table 2. For the remainder of the analysis, if not specified, the reader may assume that we refer to β exposure, meaning all tasks directly exposed via tools like ChatGPT or the OpenAI Playground are considered twice as exposed as tasks requiring some complementary innovation.

[5] The authors annotated DWAs that clearly required a high degree of physicality or manual dexterity, and the contracted annotators labeled the remaining activities, along with a subset of tasks including those without associated DWAs and those for which there was no clear task-level annotation after aggregating the DWA annotations.
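As a concrete reading of these measure definitions, the sketch below aggregates task-level E0/E1/E2 labels into occupation-level α, β, and ζ scores. The column names and the pandas-based implementation are assumptions for illustration; the optional core/supplemental weighting follows the note (in the Figure 4 caption) that core tasks receive twice the weight of supplemental tasks.

    # Sketch: turn task-level exposure labels into occupation-level alpha/beta/zeta scores.
    # Assumes a DataFrame with columns: occupation, label in {E0, E1, E2},
    # and task_type in {Core, Supplemental}; these column names are illustrative.
    import pandas as pd

    def occupation_exposure(tasks: pd.DataFrame) -> pd.DataFrame:
        df = tasks.copy()
        df["e1"] = (df["label"] == "E1").astype(float)
        df["e2"] = (df["label"] == "E2").astype(float)
        # Core tasks weighted twice as heavily as supplemental tasks.
        df["w"] = df["task_type"].map({"Core": 2.0, "Supplemental": 1.0}).fillna(1.0)

        def agg(group: pd.DataFrame) -> pd.Series:
            w = group["w"]
            e1 = (w * group["e1"]).sum() / w.sum()   # weighted share of tasks labeled E1
            e2 = (w * group["e2"]).sum() / w.sum()   # weighted share of tasks labeled E2
            return pd.Series({
                "alpha": e1,             # E1 only: lower bound
                "beta": e1 + 0.5 * e2,   # E1 + 0.5*E2: preferred measure
                "zeta": e1 + e2,         # E1 + E2: upper bound
            })

        return df.groupby("occupation").apply(agg)

    # Example usage:
    # scores = occupation_exposure(task_labels)
    # scores.mean()  # compare with the occupation-level means reported in Table 3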
Figure 2: Human raters (x-axis) and GPT-4 ratings (y-axis) show a high degree of agreement about LLM exposure by occupation. Near the highest levels of exposure, following the method of aggregating exposure scores to occupations, GPT-4 ratings tend to be lower than human ratings. We present the raw scatter plot and the binscatter. Near the top end of exposure ratings, humans are on average more likely to rate an occupation as exposed.

3.4 Limitations of our methodology

3.4.1 Subjective human judgments

A fundamental limitation of our approach lies in the subjectivity of the labeling. In our study, we employ annotators who are familiar with LLM capabilities. However, this group is not occupationally diverse, potentially leading to biased judgments regarding LLMs' reliability and effectiveness in performing tasks within unfamiliar occupations. We acknowledge that obtaining high-quality labels for each task in an occupation requires workers engaged in those occupations or, at a minimum, possessing in-depth knowledge of the diverse tasks within those occupations. This represents an important area for future work in validating these results.

3.4.2 Measuring LLMs with GPT-4

Recent research indicates that GPT-4 serves as an effective discriminator, capable of applying intricate taxonomies and responding to changes in wording and emphasis (OpenAI, 2023b). The outcomes of GPT-4 task classification are sensitive to alterations in the rubric's wording, the prompt's order and composition, the presence or absence of specific examples in the rubric, the level of detail provided, and the definitions given for key terms. Iterating on the prompt, based on observed outcomes in a small validation set, can enhance the agreement between model outputs and the rubric's intent. Consequently, there are slight differences between the rubric presented to humans and the one used for GPT-4. This decision was made deliberately to guide the model towards reasonable labels without excessively influencing human annotators. As a result, we use multiple annotation sources, but none should be considered the definitive ground truth relative to the others. In this analysis, we present results from human annotators as our primary results. Further improvement and innovation in crafting effective rubrics for LLM classification remains possible. Still, we observe a high degree of agreement between human ratings and GPT-4 ratings at the occupation level concerning overall exposure to LLM systems (see Table 2, Figure 2).
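The agreement and correlation statistics of the kind reported in Table 2 can be computed along the following lines. The sketch assumes two aligned arrays of labels for the same tasks and uses scipy for the Pearson correlation; the variable names are illustrative.

    # Sketch: agreement rate on categorical labels and Pearson's r on numeric weightings.
    import numpy as np
    from scipy.stats import pearsonr

    def agreement_rate(labels_a, labels_b) -> float:
        """Share of tasks on which two annotation sources give the same label (E0/E1/E2)."""
        a, b = np.asarray(labels_a), np.asarray(labels_b)
        return float((a == b).mean())

    def to_score(labels, weight_e2: float = 0.5) -> np.ndarray:
        """Map labels to a numeric exposure score, e.g. E1 + 0.5*E2 for the beta weighting."""
        mapping = {"E0": 0.0, "E1": 1.0, "E2": weight_e2}
        return np.array([mapping[label] for label in labels])

    # human and gpt4 are equal-length lists of task-level labels (illustrative names).
    # print(agreement_rate(human, gpt4))
    # r, p_value = pearsonr(to_score(human), to_score(gpt4))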
3.4.3 Additional Weaknesses

- Validity of task-based framework. It is unclear to what extent occupations can be entirely broken down into tasks, and whether this approach systematically omits certain categories of skills or tasks that are tacitly required for competent performance of a job. Additionally, tasks can be composed of sub-tasks, some of which are more automatable than others. Some tasks may function as precursors to other tasks, such that the completion of downstream tasks is dependent on precursor tasks. If, indeed, the task-based breakdown is not a valid representation of how most work in an occupation is performed, our exposure analysis would largely be invalidated.

- Lack of expertise and task interpretation. Human annotators were mostly unaware of the specific occupations mapped to each DWA during the labeling process. This led to unclear logic for aggregating tasks and occupations, as well as some evident discrepancies in labels, demonstrated in Table 1. We experimented with various aggregation methods and discovered that even with a maximum-matching approach (taking the matching human/model label if one existed), the agreement remained relatively consistent. Ultimately, we collected additional labels for task/occupation pairs where there was significant disagreement.

- Forward-looking and subject to change, with some early evidence. Accurately predicting future LLM applications remains a significant challenge, even for experts (OpenAI, 2023b). The discovery of new emergent capabilities, changes in human perception biases, and shifts in technological development can all affect the accuracy and reliability of predictions regarding the potential impact of LLMs on worker tasks and the development of LLM-powered software. Our projections are inherently forward-looking and based on current trends, evidence, and perceptions of technological possibilities. As a result, they may change as new advancements arise in the field. For example, some tasks that seem unlikely for LLMs or LLM-powered software to impact today might change with the introduction of new model capabilities. Conversely, tasks that appear exposed might face unforeseen challenges limiting language model applications.

- Sources of disagreement. While we did not rigorously examine sources of disagreement, we found a few places where humans and the model tended to get stuck in their assessments:
  - Tasks or activities where, while an LLM could theoretically help or accomplish the task, adopting it to do so would require multiple people to change their habits or expectations (e.g. meetings, negotiations),
  - Tasks or activities where there is currently some regulation or norm that requires or suggests human oversight, judgment or empathy (e.g. making decisions, counseling), and
  - Tasks or activities where there already exists a technology that can reasonably automate the task (e.g. making reservations).
4 Results

General-purpose technologies are relatively rare and characterized by their pervasiveness, improvement over time, and the development of significant co-invention and spillovers (Lipsey et al., 2005). Our assessment of LLMs' potential impact on the labor market is limited since it does not consider total factor productivity or capital input potential. In addition to their influence on labor, LLMs may also influence these dimensions. At this stage, some general-purpose technology criteria are easier to evaluate than others. Our primary focus at this early stage is to test the hypothesis that LLMs have a pervasive influence on the economy, similar to the approach taken by (Goldfarb et al., 2023), who analyzed machine learning diffusion through job postings to assess its status as a general-purpose technology. Instead of using job postings or studying machine learning in general, we employ the task evaluation approach with both human and GPT-4 annotations. This analysis may reveal whether the impacts are limited to a specific set of similar tasks or occupations or if they will be more widespread.

Our findings suggest that, based on their task-level capabilities, LLMs have the potential to significantly affect a diverse range of occupations within the U.S. economy, demonstrating a key attribute of general-purpose technologies. In the following sections, we discuss results across various roles and wage structures. Additional results on the relative exposure of industries within the U.S. economy can be found in Appendix D.
4.1 Summary Statistics

Summary statistics for these measures can be found in Table 3. Both human and GPT-4 annotations indicate that average occupation-level α values fall between 0.14 and 0.15, suggesting that, on average, approximately 15% of tasks within an occupation are directly exposed to LLMs. This figure increases to over 30% for β and surpasses 50% for ζ. Coincidentally, human and GPT-4 annotations also tag between 15% and 14% of total tasks in the dataset as being exposed to LLMs. Based on the β values, we estimate that 80% of workers belong to an occupation with at least 10% of its tasks exposed to LLMs, while 19% of workers are in an occupation where over half of its tasks are labeled as exposed.

We ran one set of analyses using O*NET's Importance scores but did not find significant changes to our findings, though we do acknowledge that not weighting the relative importance of a task to a given occupation yields some curious results (e.g. ranking Barbers as having reasonably high exposure).

Although the potential for tasks to be affected is vast, LLMs and LLM-powered software must be incorporated into broader systems to fully realize this potential. As is common with general-purpose technologies, co-invention barriers may initially impede the rapid diffusion of GPTs into economic applications. Furthermore, predicting the need for human oversight is challenging, especially for tasks where model capabilities equal or surpass human levels. While the requirement for human supervision may initially slow down the speed at which these systems diffuse through the economy, users of LLMs and LLM-powered systems are likely to become increasingly acquainted with the technology over time, particularly in terms of understanding when and how to trust its outputs.

Occupation Level Exposure | Human mean | Human std | GPT-4 mean | GPT-4 std
α | 0.14 | 0.14 | 0.14 | 0.16
β | 0.30 | 0.21 | 0.34 | 0.22
ζ | 0.46 | 0.30 | 0.55 | 0.34

Task Level Exposure | Human mean | Human std | GPT-4 mean | GPT-4 std
α | 0.15 | 0.36 | 0.14 | 0.35
β | 0.31 | 0.37 | 0.35 | 0.35
ζ | 0.47 | 0.50 | 0.56 | 0.50

Table 3: Summary statistics of our human and model exposure data.
Figure 3: Exposure intensity across the economy, displayed on the left in terms of percent of affected occupations and on the right as percent of affected workers. The distribution of exposure is similar across occupations and across workers, suggesting that worker concentration in occupations is not highly correlated with occupational exposure to LLMs or LLM-powered software. We do however expect that it could be more highly correlated with investment in developing LLM-powered software for particular domains.

4.2 Wages and Employment

In Figure 3, we present the exposure intensity across the economy. The first plot displays exposure in terms of occupations, while the second plot shows exposure in terms of total workers. Each point on the graph represents the estimated percentage of workers (and occupations) on the y-axis with an exposure level (α, β, and ζ) indicated on the x-axis. For example, human annotators determined that 2.4% of workers are α50-exposed, 18.6% are β50-exposed, and 49.6% are ζ50-exposed, where the threshold of 50% comes from the x-axis and the percentage of workers comes from the y-axis in the right plot of Figure 3. At any given point on the x-axis, the vertical distance between the α and the ζ represents the exposure potential attributable to tools and applications beyond direct exposure to LLMs. The distribution of exposure is similar for both workers and occupations, suggesting that worker concentration in occupations does not have a strong correlation with occupational exposure to LLMs or LLM-powered software.
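The worker-share statistics quoted above (for example, the share of workers in occupations that are at least 50% exposed) can be reproduced from occupation-level scores and BLS employment counts with a computation like the one below. The column names and the DataFrame layout are illustrative assumptions.

    # Sketch: employment-weighted share of workers whose occupation exceeds an exposure threshold.
    # Assumes one row per occupation with columns: alpha, beta, zeta (occupation-level exposure)
    # and employment (worker count from the BLS-OES data); names are illustrative.
    import pandas as pd

    def share_of_workers_exposed(occs: pd.DataFrame, measure: str, threshold: float = 0.5) -> float:
        exposed = occs[occs[measure] >= threshold]
        return exposed["employment"].sum() / occs["employment"].sum()

    # Example: sweep thresholds to trace out curves like the right panel of Figure 3.
    # for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    #     print(t, share_of_workers_exposed(occupations, "beta", t))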
Aggregated at the occupation level, human and GPT-4 annotations exhibit qualitative similarities and tend to correlate, as demonstrated in Figure 4. Human annotations estimate marginally lower exposure for high-wage occupations compared to GPT-4 annotations. While there are numerous low-wage occupations with high exposure and high-wage occupations with low exposure, the overall trend in the binscatter plot reveals that higher wages are associated with increased exposure to LLMs.

The potential exposure to LLMs seems to have little correlation with current employment levels. In Figure 4, both human and GPT-4 ratings of overall exposure are aggregated to the occupation level (y-axis) and compared with the log of total employment (x-axis). Neither plot reveals significant differences in LLM exposure across varying employment levels.

4.3 Skill Importance

In this section, we explore the relationship between the importance of a skill for an occupation (as annotated in the O*NET dataset) and our exposure measures. First, we use the Basic Skills provided by O*NET (skill definitions can be found in Appendix B) and normalize the measure of skill importance for each occupation to improve the comprehensibility of the results. Next, we conduct a regression analysis on our exposure measures (α, β, ζ) to examine the strength of associations between skill importance and exposure.
Figure 4: The binscatter plots depict the exposure to language models (LLMs) in various occupations, as assessed by both human evaluators and GPT-4. These plots compare the exposure to LLMs and partial LLM-powered software (β) at the occupation level against the log of total employment within an occupation and the log of the median annual wage for occupations. While some discrepancies exist, both human and GPT-4 assessments indicate that higher wage occupations tend to be more exposed to LLMs. Additionally, numerous lower wage occupations demonstrate high exposure based on our rubric. Core tasks receive twice the weight of supplemental tasks within occupations when calculating average exposure scores. Employment and wage data are sourced from the BLS-OES survey conducted in May 2021.

Figure 5: Exposure ratings of occupations in the five Job Zones, which are groups of similar occupations that are classified according to the level of education, experience, and on-the-job training needed to perform them.

Our findings indicate that the importance of science and critical thinking skills is strongly negatively associated with exposure, suggesting that occupations requiring these skills are less likely to be impacted by current LLMs. Conversely, programming and writing skills show a strong positive association with exposure, implying that occupations involving these skills are more susceptible to being influenced by LLMs (see Table 5 for detailed results).
4.4 Barriers to Entry

Next, we examine barriers to entry to better understand if there is differentiation in exposure due to types of jobs. One such proxy is an O*NET occupation-level descriptor called the Job Zone. A Job Zone groups occupations that are similar in (a) the level of education needed to get a job in the occupation, (b) the amount of related experience required to do the work, and (c) the extent of on-the-job training needed to do the work. In the O*NET database, there are 5 Job Zones, with Job Zone 1 requiring the least amount of preparation (3 months) and Job Zone 5 requiring the most extensive amount of preparation, 4 or more years. We observe that median income increases monotonically across Job Zones as the level of preparation needed also increases, with the median worker in Job Zone 1 earning $30,230 and the median worker in Job Zone 5 earning $80,980.

All of our measures (α, β, and ζ) show an identical pattern; that is, exposure increases from Job Zone 1 to Job Zone 4, and either remains similar or decreases at Job Zone 5. Similar to Figure 3, in Figure 5 we plot the percentage of workers at every threshold of exposure. We find that, on average, the percentage of workers in occupations with greater than 50% β exposure in Job Zones 1 through 5 is 0.00% (Job Zone 1), 6.11% (Job Zone 2), 10.57% (Job Zone 3), 34.5% (Job Zone 4), and 26.45% (Job Zone 5), respectively.
145、paration required,we seek data to disentangle these variables.We use two variablesfromtheBureauofLaborStatisticsOccupationaldata:TypicalEducationNeededforEntryandOn-the-jobWORKING PAPERTraining Required to Attain Competency in an occupation.By examining these factors,we aim to uncovertrends with pot
146、ential implications for the workforce.There are 3,504,000 workers for whom we lack data oneducation and on-the-job training requirements,and they are therefore excluded from the summary tables.Our analysis suggests that individuals holding Bachelors,Masters,and professional degrees are moreexposed t
147、o LLMs and LLM-powered software than those without formal educational credentials(see Table 7).Interestingly,we also find that individuals with some college education but no degree exhibit a high level ofexposure to LLMs and LLM-powered software.Upon examining the table displaying barriers to entry,
148、weobserve that the jobs with the least exposure require the most training,potentially offering a lower payoff(interms of median income)once competency is achieved.Conversely,jobs with no on-the-job training requiredor only internship/residency required appear to yield higher income but are more expo
Group | Occupations with highest exposure | % Exposure
Human α | Interpreters and Translators | 76.5
 | Survey Researchers | 75.0
 | Poets, Lyricists and Creative Writers | 68.8
 | Animal Scientists | 66.7
 | Public Relations Specialists | 66.7
Human β | Survey Researchers | 84.4
 | Writers and Authors | 82.5
 | Interpreters and Translators | 82.4
 | Public Relations Specialists | 80.6
 | Animal Scientists | 77.8
Human ζ | Mathematicians | 100.0
 | Tax Preparers | 100.0
 | Financial Quantitative Analysts | 100.0
 | Writers and Authors | 100.0
 | Web and Digital Interface Designers | 100.0
 | (Humans labeled 15 occupations as fully exposed.) |
Model α | Mathematicians | 100.0
 | Correspondence Clerks | 95.2
 | Blockchain Engineers | 94.1
 | Court Reporters and Simultaneous Captioners | 92.9
 | Proofreaders and Copy Markers | 90.9
Model β | Mathematicians | 100.0
 | Blockchain Engineers | 97.1
 | Court Reporters and Simultaneous Captioners | 96.4
 | Proofreaders and Copy Markers | 95.5
 | Correspondence Clerks | 95.2
Model ζ | Accountants and Auditors | 100.0
 | News Analysts, Reporters, and Journalists | 100.0
 | Legal Secretaries and Administrative Assistants | 100.0
 | Clinical Data Managers | 100.0
 | Climate Change Policy Analysts | 100.0
 | (The model labeled 86 occupations as fully exposed.) |
Highest variance | Search Marketing Strategists | 14.5
 | Graphic Designers | 13.4
 | Investment Fund Managers | 13.0
 | Financial Managers | 13.0
 | Insurance Appraisers, Auto Damage | 12.6

Table 4: Occupations with the highest exposure according to each measurement. The final row lists the occupations with the highest variance (σ²), indicating that they had the most variability in exposure scores. Exposure percentages indicate the share of an occupation's tasks that are exposed to GPTs (α) or GPT-powered software (β and ζ), where exposure is defined as driving a reduction in the time it takes to complete the task by at least 50% (see exposure rubric A.1). As such, occupations listed in this table are those where we estimate that GPTs and GPT-powered software are able to save workers a significant amount of time completing a large share of their tasks, but it does not necessarily suggest that their tasks can be fully automated by these technologies.
Basic Skill | α (std err) | β (std err) | ζ (std err)
Constant | 0.082* (0.011) | -0.112* (0.011) | 0.300* (0.057)
Active Listening | 0.128* (0.047) | 0.214* (0.043) | 0.449* (0.027)
Mathematics | -0.127* (0.026) | 0.161* (0.021) | 0.787* (0.049)
Reading Comprehension | 0.153* (0.041) | 0.470* (0.037) | -0.346* (0.017)
Science | -0.114* (0.014) | -0.230* (0.012) | -0.346* (0.017)
Speaking | -0.028 (0.039) | 0.133* (0.033) | 0.294* (0.042)
Writing | 0.368* (0.042) | 0.467* (0.037) | 0.566* (0.047)
Active Learning | -0.157* (0.027) | -0.065* (0.024) | 0.028 (0.032)
Critical Thinking | -0.264* (0.036) | -0.196* (0.033) | -0.129* (0.042)
Learning Strategies | -0.072* (0.028) | -0.209* (0.025) | -0.346* (0.034)
Monitoring | -0.067* (0.023) | -0.149* (0.020) | -0.232* (0.026)
Programming | 0.637* (0.030) | 0.623* (0.022) | 0.609* (0.024)

All skill importance scores are normalized to be between 0 and 1.

Table 5: Regression of occupation-level, human-annotated exposure to GPTs on skill importance for each skill in the O*NET Basic Skills category, plus the Programming skill. Descriptions of the skills may be found in Appendix B.
Job Zone | Preparation Required | Education Required | Example Occupations | Median Income | Tot Emp (000s) | α (H) | α (M) | β (H) | β (M) | ζ (H) | ζ (M)
1 | None or little (0-3 months) | High school diploma or GED (optional) | Food preparation workers, dishwashers, floor sanders | $30,230 | 13,100 | 0.03 | 0.04 | 0.06 | 0.06 | 0.09 | 0.08
2 | Some (3-12 months) | High school diploma | Orderlies, customer service representatives, tellers | $38,215 | 73,962 | 0.07 | 0.12 | 0.16 | 0.20 | 0.24 | 0.27
3 | Medium (1-2 years) | Vocational school, on-the-job training, or associate's degree | Electricians, barbers, medical assistants | $54,815 | 37,881 | 0.11 | 0.14 | 0.26 | 0.32 | 0.41 | 0.51
4 | Considerable (2-4 years) | Bachelor's degree | Database administrators, graphic designers, cost estimators | $77,345 | 56,833 | 0.23 | 0.18 | 0.47 | 0.51 | 0.71 | 0.85
5 | Extensive (4+ years) | Master's degree or higher | Pharmacists, lawyers, astronomers | $81,980 | 21,221 | 0.23 | 0.13 | 0.43 | 0.45 | 0.63 | 0.76

Table 6: Mean exposure to GPTs by job zone. For each job zone, we also present the median of median annual income for each constituting occupation in USD, and the total number of workers in all occupations for that job zone, in the thousands.
On The Job Training Required | Median Income | Tot Emp (000s) | α (H) | α (M) | β (H) | β (M) | ζ (H) | ζ (M)
None | $77,440 | 90,776 | 0.20 | 0.16 | 0.42 | 0.46 | 0.63 | 0.76
Apprenticeship | $55,995 | 3,066 | 0.01 | 0.02 | 0.04 | 0.06 | 0.07 | 0.10
Internship/residency | $77,110 | 3,063 | 0.16 | 0.06 | 0.36 | 0.38 | 0.55 | 0.71
Short-term on-the-job training | $33,370 | 66,234 | 0.11 | 0.15 | 0.21 | 0.25 | 0.32 | 0.34
Moderate-term on-the-job training | $46,880 | 31,285 | 0.09 | 0.12 | 0.21 | 0.25 | 0.32 | 0.38
Long-term on-the-job training | $48,925 | 5,070 | 0.08 | 0.10 | 0.18 | 0.22 | 0.28 | 0.33

Table 7: Mean exposure scores for occupations, grouped by level of on-the-job training required to attain competency in the job. Alongside exposure scores, we display the median of median annual income for each occupation, as well as the total number of workers in each group, in thousands.
5 Validation of Measures

5.1 Comparison to Earlier Efforts

This paper aims to build on a number of previous empirical studies examining the occupational exposure to advances in AI and/or automation. Previous studies have used a variety of methods, including:

- Using occupational taxonomies like O*NET to characterize which occupations have routine vs. non-routine and manual vs. cognitive task content (Autor et al., 2003; Acemoglu and Autor, 2011a).
- Mapping text descriptions of tasks to descriptions of technological advances in patents (Kogan et al., 2021; Webb, 2020).
- Linking capabilities of AI systems to occupational abilities and aggregating exposure estimates to the occupations where those abilities are required (Felten et al., 2018, 2023).
- Mapping the results of AI task benchmark evaluations (ImageNet, Robocup, etc.) to 59 worker tasks through a set of 14 cognitive abilities drawn from the cognitive science literature (Tolan et al., 2021).
- Expert labeling of automation potential for a set of O*NET occupations where experts had high confidence, combined with a probabilistic classifier to estimate automation potential for the remainder of O*NET occupations (Frey and Osborne, 2017).
- Developing a rubric for evaluating the suitability for machine learning (SML) of activities that workers are completing in the economy (Brynjolfsson and Mitchell, 2017; Brynjolfsson et al., 2018, 2023).

We provide a set of summary statistics on many of these prior efforts in Table 8.
This paper's methodology primarily builds upon the SML approach by developing a rubric to evaluate the overlap between LLM capabilities and worker tasks as reported in the O*NET database. Table 9 presents the results of OLS regressions of our new LLM exposure measurements on occupation-level exposure measures from (Felten et al., 2018) (AI Occupational Exposure Score in the table), (Frey and Osborne, 2017) (Frey & Osborne Automation), scores from all three technologies in (Webb, 2020), normalized routine manual and cognitive scores from (Acemoglu and Autor, 2011a), and (Brynjolfsson et al., 2018, 2023) (SML). We also use annualized occupational salaries from the most recent BLS Occupational Employment Survey as a control. There are four separate output variables representing new scores in this paper that are predicted by earlier efforts.

GPT-4 Exposure Rating 1 corresponds to our overall exposure rubric as evaluated by GPT-4, where full exposure potential is coded as 1, no exposure potential is coded as 0, and partial exposure (E2 in our labeling scheme) is coded as 0.5. GPT-4 Exposure Rating 2 is scored similarly for overall exposure, but with a slightly different prompt. The results are very similar across the two prompts. Human Exposure Rating represents the same rubric as in GPT-4 Exposure Rating 1 but is scored by humans, as discussed in an earlier section of the paper. These results correspond to the set of statistics presented above.

The results across each type of measurement are consistent. We find generally positive and statistically significant correlations between our LLM exposure measures and previous measurements targeting software and AI. Encouragingly, the SML exposure scores by occupation show significant and positive associations with the exposure scores we develop in this paper, demonstrating a level of cohesion between the two studies with similar approaches. The Webb software and AI patent-based measures, SML, and normalized (demeaned and divided by standard deviation) routine cognitive scores all exhibit positive associations with some of our measures.
Measure | Min | 25th Perc. | Median | 75th Perc. | Max | Mean | Std. Dev. | Count
GPT-4 Exposure Rating 1 | 0.00 | 0.13 | 0.34 | 0.50 | 1.00 | 0.33 | 0.22 | 750
GPT-4 Exposure Rating 2 | 0.00 | 0.09 | 0.24 | 0.40 | 0.98 | 0.26 | 0.20 | 750
Human Exposure Rating | 0.00 | 0.09 | 0.29 | 0.47 | 0.84 | 0.29 | 0.21 | 750
Software (Webb) | 1.00 | 25.00 | 50.00 | 75.00 | 100.00 | 50.69 | 30.05 | 750
Robot (Webb) | 1.00 | 22.00 | 52.00 | 69.00 | 100.00 | 48.61 | 28.61 | 750
AI (Webb) | 1.00 | 28.00 | 55.00 | 82.00 | 100.00 | 54.53 | 29.65 | 750
Suitability for Machine Learning | 2.60 | 2.84 | 2.95 | 3.12 | 3.55 | 2.99 | 0.18 | 750
Normalized Routine Cognitive | -3.05 | -0.46 | 0.10 | 0.63 | 3.42 | 0.07 | 0.86 | 750
Normalized Routine Manual | -1.81 | -0.81 | -0.11 | 0.73 | 2.96 | 0.05 | 1.01 | 750
AI Occupational Exposure Score | 1.42 | 3.09 | 3.56 | 4.04 | 6.54 | 3.56 | 0.70 | 750
Frey & Osborne Automation | 0.00 | 0.07 | 0.59 | 0.88 | 0.99 | 0.50 | 0.38 | 681
Log Avg. Salary | 10.13 | 10.67 | 11.00 | 11.34 | 12.65 | 11.02 | 0.45 | 749

Table 8: Summary statistics for a suite of prior efforts to measure occupational exposure to AI and automation. We have also included summary statistics for measurements newly presented in this work. We include all measures from (Webb, 2020), normalized routine cognitive and manual scores from (Acemoglu and Autor, 2011a) (means may deviate slightly from 0 due to imperfect matching of occupational groups), Suitability for Machine Learning from (Brynjolfsson and Mitchell, 2017; Brynjolfsson et al., 2018, 2023), AI Occupational Exposure from (Felten et al., 2018), and Automation exposure from (Frey and Osborne, 2017). We include as many occupations as we can match, but since O*NET taxonomies have changed as these measures have been developed, some of the roles may be missing from the most recent version of O*NET 6-digit occupations.
181、oped,some of the roles may be missing from the most recent version of O*NET 6-digit occupations.Software,SML,and routine cognitive scores all show positive and statistically significant associationswith LLM exposure scores at a 1%level.Coefficients on AI scores from(Webb,2020)are also positive andst
182、atistically significant at a 5%level,but our secondary prompt on overall exposure to LLMs in columns 3and 4 does not exhibit a statistically significant relationship.For the most part,the AI Occupational ExposureScore is not correlated with our exposure measures.Webbs Robot exposure scores,routine m
183、anual taskcontent,and the overall Automation metric from(Frey and Osborne,2017)are all negatively correlated withour primary GPT-4 and human-assessed overall exposure ratings,conditional on the other measurements.This negative correlation reflects the limited exposure of physical tasks to LLMs.Manua
184、l work is not exposedto LLMs or even LLMs with additional systems integration for the time being.Low correlations with(Felten et al.,2018)and(Frey and Osborne,2017)could potentially be explainedby differences in approaches.Linking AI capabilities to worker abilities or scoring exposure directly base
185、d onthe occupations characteristics,rather than aggregating up to the occupation from DWA or task-level scoring(as in the SML paper and our own),offer a slightly different perspective on the content of occupations.In all regressions,the2ranges between 60.7%(column 3)and 72.8%(column 5).This suggests
186、 thatour measure,which explicitly focuses on LLM capabilities,has between 28 and 40%unexplained variancecompared to other measurements.Particularly in the case of AI-related exposure scores,we anticipate that acombination of other measurements would have a strong correlation with our scores.However,
187、earlier effortshad limited information about the future progress of LLMs or LLM-powered software.We expect that ourunderstanding of future machine learning technologies is similarly imperfectly captured by our rubric today.6Discussion6.1GPTs as a General-Purpose TechnologyEarlier in this paper we di
6 Discussion

6.1 GPTs as a General-Purpose Technology

Earlier in this paper we discuss the possibility that LLMs could be classified as a general-purpose technology. This classification requires LLMs to meet three core criteria: improvement over time, pervasiveness throughout the economy, and the ability to spawn complementary innovations (Lipsey et al., 2005). Evidence from the AI and machine learning literature thoroughly demonstrates that LLMs meet the first criterion: they are improving in capabilities over time, with the ability to complete, or be helpful for, an increasingly complex set of tasks and use cases (see 2.1). This paper presents evidence to support the latter two criteria, finding that LLMs on their own can have pervasive impacts across the economy, and that complementary innovations enabled by LLMs, particularly via software and digital tools, can have widespread application to economic activity.

Figure 3 offers one illustration of the potential economic impact of complementary software built on top of LLMs. Taking the difference in the y-axis (the share of all occupations) between the software-inclusive exposure measure and the LLM-only measure at a given point along the x-axis (the share of tasks within an occupation that are exposed) gives the aggregate within-occupation exposure potential attributable to tools and software over and above direct exposure from LLMs on their own. The difference in means across all tasks between the two measures, 0.42 using the GPT-4 annotations and 0.32 using the human annotations (see Figure 3), suggests that the average impact of LLM-powered software on task exposure may be more than twice as large as the mean exposure from LLMs on their own (a mean of 0.14 based on both human annotations and GPT-4 annotations). While our findings suggest that, out of the box, these models are relevant to a meaningful share of workers and tasks, they also suggest that the software innovations they spawn could drive a much broader impact.
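As an illustration of how task-level annotations can be aggregated into this comparison, here is a minimal sketch (not the authors' pipeline) that computes mean direct exposure and the additional exposure attributable to LLM-powered software from a hypothetical task-level file of rubric labels; treating E1 as direct exposure and E1-or-E2 as exposure once LLM-powered software is included is one plausible reading of the rubric, not the paper's exact construction.

    # Sketch of the Figure 3 comparison; file, column names, and label mapping are assumptions.
    import pandas as pd

    tasks = pd.read_csv("task_labels.csv")  # columns: occupation, task, label (E0/E1/E2/E3)

    tasks["direct"] = (tasks["label"] == "E1").astype(float)
    tasks["with_software"] = tasks["label"].isin(["E1", "E2"]).astype(float)

    by_occupation = tasks.groupby("occupation")[["direct", "with_software"]].mean()
    gap = by_occupation["with_software"] - by_occupation["direct"]

    print("mean direct exposure:", by_occupation["direct"].mean().round(2))
    print("mean additional exposure from LLM-powered software:", gap.mean().round(2))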
One component of the pervasiveness of a technology is its level of adoption by businesses and users. This paper does not systematically analyze adoption of these models; however, there is early qualitative evidence that adoption and use of LLMs is becoming increasingly widespread. The power of relatively simple UI improvements on top of LLMs was evident in the rollout of ChatGPT, wherein versions of the underlying language model had been previously available via API, but usage skyrocketed after the release of the ChatGPT interface (Chow, 2023; OpenAI, 2022). Following this release, a number of commercial surveys indicate that firm and worker adoption of LLMs has increased over the past several months (Constantz, 2023; ResumeB, 2023).

Widespread adoption of these models requires addressing existing bottlenecks. A key determinant of their utility is the level of confidence humans place in them and how humans adapt their habits. For instance, in the legal profession, the models' usefulness depends on whether legal professionals can trust model outputs without verifying original documents or conducting independent research. The cost and flexibility of the technology, worker and firm preferences, and incentives also significantly influence the adoption of tools built on top of LLMs. In this way, adoption may be driven by progress on some of the ethical and safety risks associated with LLMs: bias, fabrication of facts, and misalignment, to name a few (OpenAI, 2023a). Moreover, the adoption of LLMs will vary across different economic sectors due to factors such as data availability, regulatory environment, and the distribution of power and interests. Consequently, a comprehensive understanding of the adoption and use of LLMs by workers and firms requires a more in-depth exploration of these intricacies.

One possibility is that time savings and seamless application will hold greater importance than quality improvement for the majority of tasks. Another is that the initial focus will be on augmentation, followed by automation (Huang and Rust, 2018). One way this might take shape is through an augmentation phase in which jobs first become more precarious (e.g., writers becoming freelancers) before transitioning to full automation.

6.2 Implications for US Public Policy

The introduction of automation technologies, including LLMs, has previously been linked to heightened economic disparity and labor disruption, which may give rise to adverse downstream effects (Acemoglu and Restrepo, 2022a; Acemoglu, 2002; Moll et al., 2021; Klinova and Korinek, 2021; Weidinger et al., 2021, 2022). Our results examining worker exposure in the United States underscore the need for societal and policy preparedness for the potential economic disruption posed by LLMs and the complementary technologies that they spawn. While it is outside the scope of this paper to recommend specific policy prescriptions to smooth the transition to an economy with increasingly widespread LLM adoption, prior work such as (Autor et al., 2022b) has articulated several important directions for US policy related to education, worker training, reforms to safety net programs, and more.

6.3 Limitations and Future Work

In addition to those discussed above, we highlight some particular limitations of this work that warrant further investigation.
Primarily, our focus on the United States restricts the generalizability of our findings to other nations, where the adoption and impact of generative models may differ due to factors such as industrial organization, technological infrastructure, regulatory frameworks, linguistic diversity, and cultural contexts. We hope to address this limitation by extending the study's scope and by sharing our methods so other researchers can build on them.

Subsequent research efforts should consider two additional studies: one exploring LLM adoption patterns across various sectors and occupations, and another scrutinizing the actual capabilities and limitations of state-of-the-art models in relation to worker activities beyond the scope of our exposure scores. For example, despite recent advances in multimodal capabilities with GPT-4, we did not consider vision capabilities in the ratings of direct LLM exposure (OpenAI, 2023b). Future work should consider the impact of such capability advances as they unfold. Furthermore, we acknowledge that there may be discrepancies between theoretical and practical performance, particularly in complex, open-ended, and domain-specific tasks.

7 Conclusion

In conclusion, this study offers an examination of the potential impact of LLMs on various occupations and industries within the U.S. economy. By applying a new rubric for understanding LLM capabilities and their potential effects on jobs, we have observed that most occupations exhibit some degree of exposure to LLMs, with higher-wage occupations generally presenting more tasks with high exposure. Our analysis indicates that approximately 19% of jobs have at least 50% of their tasks exposed to LLMs when considering both current model capabilities and anticipated LLM-powered software.

Our research aims to highlight the general-purpose potential of LLMs and their possible implications for US workers. Previous literature demonstrates the impressive improvements of LLMs to date (see 2.1). Our findings confirm the hypothesis that these technologies can have pervasive impacts across a wide swath of occupations in the US, and that additional advancements supported by LLMs, mainly through software and digital tools, can have significant effects on a range of economic activities. However, while the technical capacity for LLMs to make human labor more efficient appears evident, it is important to recognize that social, economic, regulatory, and other factors will influence actual labor productivity outcomes. As capabilities continue to evolve, the impact of LLMs on the economy will likely persist and increase, posing challenges for policymakers in predicting and regulating their trajectory.

Further research is necessary to explore the broader implications of LLM advancements, including their potential to augment or displace human labor, their impact on job quality, impacts on inequality, skill development, and numerous other outcomes. By seeking to understand the capabilities and potential effects of LLMs on the workforce, policymakers and stakeholders can make more informed decisions to navigate the complex landscape of AI and its role in shaping the future of work.

7.1 LLM Conclusion (GPT-4's Version)

Generative Pre-trained Transformers (GPTs) generate profound transformations, garnering potential technological growth, permeating tasks, greatly impacting professions. This study probes GPTs' potential trajectories, presenting a groundbreaking rubric to gauge tasks' GPT exposure, particularly in the U.S. labor market.

7.2 LLM Conclusion (Author-Augmented Version)

Generative Pre-trained Transformers (GPTs) generate profound transformations, garnering potential technological growth, permeating tasks, gutting professional management. Gauging possible trajectories? Generate pioneering taxonomies, gather policymakers together, generalize past today.

Acknowledgments

Thank you to the group of annotators who helped us annotate task exposure, including Muhammad Ahmed Saeed, Bongane Zitha, Merve zen enen, J.J., and Peter Hoeschele.
We also thank Lauryn Fuld, Ashley Glat, Michael Lampe, and Julia Susser for excellent research assistance. We thank Miles Brundage for significant feedback on this paper. We thank Todor Markov and Vik Goel for setting up the infrastructure we use to run our rubrics with GPT-4. We thank Lama Ahmad, Donald Bakong, Seth Benzell, Erik Brynjolfsson, Parfait Eloundou-Enyegue, Carl Frey, Sarah Giroux, Gillian Hadfield, Johannes Heidecke, Alan Hickey, Eric Horvitz, Shengli Hu, Ashyana Kachra, Christina Kim, Katya Klinova, Daniel Kokotajlo, Gretchen Krueger, Michael Lampe, Aalok Mehta, Larissa Schiavo, Daniel Selsam, Sarah Shoker, Prasanna Tambe, and Jeff Wu for feedback and edits at various stages of the project.

LLM assistance statement

GPT-4 and ChatGPT were used for writing, coding, and formatting assistance in this project.

A Rubric

A.1 Exposure

#E Exposure Rubric

Consider the most powerful OpenAI large language model (LLM). This model can complete many tasks that can be formulated as having text input and text output where the context for the input can be captured in 2000 words. The model also cannot draw on up-to-date facts (those from less than 1 year ago) unless they are captured in the input. Assume you are a worker with an average level of expertise in your role trying to complete the given task. You have access to the LLM as well as any other existing software or computer hardware tools mentioned in the task. You also have access to any commonly available technical tools accessible via a laptop (e.g. a microphone, speakers, etc.). You do not have access to any other physical tools or materials.

Please label the given task according to the rubric below. Equivalent quality means someone reviewing the work would not be able to tell whether a human completed it on their own or with assistance from the LLM. If you aren't sure how to judge the amount of time a task takes, consider whether the tools described exposed the majority of subtasks associated with the task.

#E1 Direct exposure

Label tasks E1 if direct access to the LLM through an interface like ChatGPT or the OpenAI playground alone can reduce the time it takes to complete the task with equivalent quality by at least half. This includes tasks that can be reduced to:
- Writing and transforming text and code according to complex instructions,
- Providing edits to existing text or code following specifications,
- Writing code that can help perform a task that used to be done by hand,
- Translating text between languages,
- Summarizing medium-length documents,
- Providing feedback on documents,
- Answering questions about a document,
- Generating questions a user might want to ask about a document,
- Writing questions for an interview or assessment,
- Writing and responding to emails, including ones that involve refuting information or engaging in a negotiation (but only if the negotiation is via written correspondence),
- Maintain records of written data,
- Prepare training materials based on general knowledge, or
- Inform anyone of any information via any written or spoken medium.

#E2 Exposure by LLM-powered applications

Label tasks E2 if having access to the LLM alone may not reduce the time it takes to complete the task by at least half, but it is easy to imagine additional software that could be developed on top of the LLM that would reduce the time it takes to complete the task by half. This software may include capabilities such as:
- Summarizing documents longer than 2000 words and answering questions about those documents,
- Retrieving up-to-date facts from the Internet and using those facts in combination with the LLM capabilities,
- Searching over an organization's existing knowledge, data, or documents and retrieving information,
- Retrieving highly specialized domain knowledge,
- Make recommendations given data or written input,
- Analyze written information to inform decisions,
- Prepare training materials based on highly specialized knowledge,
- Provide counsel on issues, and
- Maintain complex databases.

#E3 Exposure given image capabilities

Suppose you had access to both the LLM and a system that could view, caption, and create images, as well as any systems powered by the LLM (those in E2 above). This system cannot take video as an input and it cannot produce video as an output. This system cannot accurately retrieve very detailed information from image inputs, such as measurements of dimensions within an image. Label tasks as E3 if there is a significant reduction in the time it takes to complete the task given access to an LLM and these image capabilities:
- Reading text from PDFs,
- Scanning images, or
- Creating or editing digital images according to instructions.
The images can be realistic but they should not be detailed. The model can identify objects in the image but not relationships between those objects.

#E0 No exposure

Label tasks E0 if none of the above clearly decrease the time it takes for an experienced worker to complete the task with high quality by at least half. Some examples:
- If a task requires a high degree of human interaction (for example, in-person demonstrations) then it should be classified as E0.
- If a task requires precise measurements then it should be classified as E0.
- If a task requires reviewing visuals in detail then it should be classified as E0.
- If a task requires any use of a hand or walking then it should be classified as E0.
- Tools built on top of the LLM cannot make any decisions that might impact human livelihood (e.g. hiring, grading, etc.). If any part of the task involves collecting inputs to make a final decision (as opposed to analyzing data to inform a decision or make a recommendation) then it should be classified as E0. The LLM can make recommendations.
- Even if tools built on top of the LLM can do a task, if using those tools would not save an experienced worker significant time completing the task, then it should be classified as E0.
- The LLM and systems built on top of it cannot do anything that legally requires a human to perform the task.
- If there is existing technology not powered by an LLM that is commonly used and can complete the task, then you should mark the task E0 if using an LLM or LLM-powered tool will not further reduce the time to complete the task.
When in doubt, you should default to E0.

# Annotation examples:

Occupation: Inspectors, Testers, Sorters, Samplers, and Weighers
Task: Adjust, clean, or repair products or processing equipment to correct defects found during inspections.
Label (E0/E1/E2/E3): E0
Explanation: The model does not have access to any kind of physicality, and more than half of the task described (adjusting, cleaning, and repairing equipment) requires hands or other embodiment.

Occupation: Computer and Information Research Scientists
Task: Apply theoretical expertise and innovation to create or apply new technology, such as adapting principles for applying computers to new uses.
Label (E0/E1/E2/E3): E1
Explanation: The model can learn theoretical expertise during training as part of its general knowledge base, and the principles to adapt can be captured in the text input to the model.

Activity: Schedule dining reservations.
Label (E0/E1/E2/E3): E2
Explanation: Automation technology already exists for this (e.g. Resy) and it's unclear what an LLM offers on top of using that technology (no-diff). That said, you could build something that allows you to ask the LLM to make a reservation on Resy for you.
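As an illustration of how a rubric like this can be applied at scale, the following is a minimal sketch (not the authors' labeling pipeline) that asks a GPT-4-class model to label a single task via the OpenAI chat API; the model name, prompt framing, and helper function are illustrative assumptions.

    # Illustrative sketch of rubric-based labeling; not the pipeline used in the paper.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    RUBRIC = "..."  # the full exposure rubric text from Appendix A.1 goes here

    def label_task(occupation: str, task: str) -> str:
        prompt = (
            f"{RUBRIC}\n\n"
            f"Occupation: {occupation}\n"
            f"Task: {task}\n"
            "Label (E0/E1/E2/E3) and a one-sentence explanation:"
        )
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return response.choices[0].message.content

    print(label_task(
        "Inspectors, Testers, Sorters, Samplers, and Weighers",
        "Adjust, clean, or repair products or processing equipment to correct defects.",
    ))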
B O*NET Basic Skills Definitions

Basic Skills: Developed capacities that facilitate learning or the more rapid acquisition of knowledge.

Content: Background structures needed to work with and acquire more specific skills in a variety of different domains.
- Reading Comprehension: Understanding written sentences and paragraphs in work-related documents.
- Active Listening: Giving full attention to what other people are saying, taking time to understand the points being made, asking questions as appropriate, and not interrupting at inappropriate times.
- Writing: Communicating effectively in writing as appropriate for the needs of the audience.
- Speaking: Talking to others to convey information effectively.
- Mathematics: Using mathematics to solve problems.
- Science: Using scientific rules and methods to solve problems.

Process: Procedures that contribute to the more rapid acquisition of knowledge and skill across a variety of domains.
- Critical Thinking: Using logic and reasoning to identify the strengths and weaknesses of alternative solutions, conclusions or approaches to problems.
- Active Learning: Understanding the implications of new information for both current and future problem-solving and decision-making.
- Learning Strategies: Selecting and using training/instructional methods and procedures appropriate for the situation when learning or teaching new things.
- Monitoring: Monitoring/assessing performance of yourself, other individuals, or organizations to make improvements or take corrective action.

Cross-Functional Skills
Note: We selected only Programming from the list of cross-functional skills because of our prior knowledge about the model's ability to code.
- Programming: Writing computer programs for various purposes.

C Education

                                    Median Income   Emp. (000s)    H     M     H     M     H     M
No formal educational credential       $31,900        36,187     0.05  0.06  0.10  0.10  0.15  0.15
High school diploma or equivalent      $45,470        67,033     0.09  0.13  0.20  0.25  0.31  0.37
Postsecondary nondegree award          $48,315         9,636     0.07  0.15  0.19  0.28  0.31  0.41
Some college, no degree                $40,970         2,898     0.23  0.34  0.39  0.53  0.55  0.72
Associate's degree                     $60,360         3,537     0.12  0.14  0.31  0.36  0.49  0.59
Bachelor's degree                      $78,375        71,698     0.23  0.17  0.47  0.51  0.70  0.84
Master's degree                        $79,605         3,216     0.26  0.14  0.46  0.44  0.66  0.74
Doctoral or professional degree        $82,420         5,290     0.21  0.13  0.41  0.43  0.60  0.74

Table 10: Mean exposure scores for occupations, grouped by typical education needed for entry into the occupation. Columns labeled H and M report, respectively, the human-annotated and GPT-4-annotated scores for three successively broader exposure measures. Alongside exposure scores, we display the median of median annual income for each occupation, as well as the total number of workers in each group, in thousands.
D Industrial and Productivity Exposure

Figures 6 and 7 show the overall employment-weighted relative exposure of 3-digit NAICS industries according to human raters and GPT-4, respectively (based on our exposure rubric). The impact potential is present across nearly all industries, with wide heterogeneity. Both methods agree generally on relative exposures: data processing, information processing, and hospitals all have high exposure. Recent productivity growth (both total factor and labor) appears uncorrelated with exposure as well. Additional figures in this appendix show little relationship between productivity growth since 2012 and current exposure to LLMs as rated by the model. A high correlation between already fast-growing productive industries and exposure might mean an exacerbation of Baumol's cost disease. In other words, if LLMs are likely to increase productivity differentially across industries, one concern is that the most productive would become even more productive. With inelastic demand for the production of those industries, the most productive sectors would shrink as a proportion of inputs in the economy. We see little to suggest this will be the case. Productivity growth since 2012 and exposure to LLM technologies appear unrelated.
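As a sketch of how the employment-weighted industry exposure in Figures 6 and 7 could be computed, the following assumes a hypothetical occupation-level file that already carries a 3-digit NAICS code, BLS employment counts, and an exposure score per occupation; it is an illustrative aggregation, not the authors' code.

    # Illustrative aggregation to industry level; file and column names are hypothetical.
    import pandas as pd

    occ = pd.read_csv("occupation_industry_exposure.csv")  # occupation, naics3, employment, exposure

    def employment_weighted_exposure(group: pd.DataFrame) -> float:
        return (group["exposure"] * group["employment"]).sum() / group["employment"].sum()

    industry_exposure = (
        occ.groupby("naics3")
           .apply(employment_weighted_exposure)
           .sort_values(ascending=False)
    )
    print(industry_exposure.head(10))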
Figure 6

Figure 7

E Occupations Without Any Exposed Tasks

Occupations with no labeled exposed tasks:
Agricultural Equipment Operators
Athletes and Sports Competitors
Automotive Glass Installers and Repairers
Bus and Truck Mechanics and Diesel Engine Specialists
Cement Masons and Concrete Finishers
Cooks, Short Order
Cutters and Trimmers, Hand
Derrick Operators, Oil and Gas
Dining Room and Cafeteria Attendants and Bartender Helpers
Dishwashers
Dredge Operators
Electrical Power-Line Installers and Repairers
Excavating and Loading Machine and Dragline Operators, Surface Mining
Floor Layers, Except Carpet, Wood, and Hard Tiles
Foundry Mold and Coremakers
Helpers--Brickmasons, Blockmasons, Stonemasons, and Tile and Marble Setters
Helpers--Carpenters
Helpers--Painters, Paperhangers, Plasterers, and Stucco Masons
Helpers--Pipelayers, Plumbers, Pipefitters, and Steamfitters
Helpers--Roofers
Meat, Poultry, and Fish Cutters and Trimmers
Motorcycle Mechanics
Paving, Surfacing, and Tamping Equipment Operators
Pile Driver Operators
Pourers and Casters, Metal
Rail-Track Laying and Maintenance Equipment Operators
Refractory Materials Repairers, Except Brickmasons
Roof Bolters, Mining
Roustabouts, Oil and Gas
Slaughterers and Meat Packers
Stonemasons
Tapers
Tire Repairers and Changers
Wellhead Pumpers

Table 11: All 34 occupations for which none of our measures labeled any tasks as exposed.
References

Abid, A., Farooqi, M., and Zou, J. (2021). Persistent anti-Muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, AIES '21, pages 298-306, New York, NY, USA. Association for Computing Machinery.
Acemoglu, D. (2002). Technical change, inequality, and the labor market. Journal of Economic Literature, 40.
Acemoglu, D. and Autor, D. (2011a). Skills, tasks and technologies: Implications for employment and earnings. In Handbook of Labor Economics, volume 4, pages 1043-1171. Elsevier.
Acemoglu, D. and Autor, D. (2011b). Skills, Tasks and Technologies: Implications for Employment and Earnings. In Ashenfelter, O. and Card, D., editors, Handbook of Labor Economics, volume 4, chapter 12, pages 1043-1171. Elsevier.
Acemoglu, D., Autor, D., Hazell, J., and Restrepo, P. (2020). AI and jobs: Evidence from online vacancies. Technical report, National Bureau of Economic Research.
Acemoglu, D. and Restrepo, P. (2018). The race between man and machine: Implications of technology for growth, factor shares, and employment. American Economic Review, 108(6):1488-1542.
Acemoglu, D. and Restrepo, P. (2019). Automation and new tasks: How technology displaces and reinstates labor. Journal of Economic Perspectives, 33(2):3-30.
Acemoglu, D. and Restrepo, P. (2022a). Demographics and automation. The Review of Economic Studies, 89(1):1-44.
Acemoglu, D. and Restrepo, P. (2022b). Tasks, automation, and the rise in US wage inequality. Econometrica, 90(5):1973-2016.
Aghion, P., Jones, B. F., and Jones, C. I. (2018). Artificial intelligence and economic growth. In The Economics of Artificial Intelligence: An Agenda, pages 237-282. University of Chicago Press.
Agrawal, A. K., Gans, J. S., and Goldfarb, A. (2021). AI adoption and system-wide change. Technical report, National Bureau of Economic Research.
Arntz, M., Gregory, T., and Zierahn, U. (2017). Revisiting the risk of automation. Economics Letters, 159:157-160.
Autor, D., Chin, C., Salomons, A. M., and Seegmiller, B. (2022a). New frontiers: The origins and content of new work, 1940-2018. Technical report, National Bureau of Economic Research.
Autor, D., Mindell, D. A., and Reynolds, E. B. (2022b). The Work of the Future: Building Better Jobs in an Age of Intelligent Machines. The MIT Press.
Autor, D. H., Katz, L. F., and Kearney, M. S. (2006). The polarization of the US labor market. American Economic Review, 96(2):189-194.
Autor, D. H., Levy, F., and Murnane, R. J. (2003). The skill content of recent technological change: An empirical exploration. The Quarterly Journal of Economics, 118(4):1279-1333.
Babina, T., Fedyk, A., He, A., and Hodson, J. (2021). Artificial intelligence, firm growth, and product innovation. Firm Growth, and Product Innovation (November 9, 2021).
Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., Joseph, N., Kadavath, S., Kernion, J., Conerly, T., El-Showk, S., Elhage, N., Hatfield-Dodds, Z., Hernandez, D., Hume, T., Johnston, S., Kravec, S., Lovitt, L., Nanda, N., Olsson, C., Amodei, D., Brown, T., Clark, J., McCandlish, S., Olah, C., Mann, B., and Kaplan, J. (2022). Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv:2204.05862 [cs].
Baumol, W. J. (2012). The Cost Disease: Why Computers Get Cheaper and Health Care Doesn't. Yale University Press.
Benzell, S. G., Kotlikoff, L. J., LaGarda, G., and Ye, V. Y. (2021). Simulating endogenous global automation. Working Paper 29220, National Bureau of Economic Research.
Bessen, J. (2018). Artificial intelligence and jobs: The role of demand. In The Economics of Artificial Intelligence: An Agenda, pages 291-307. University of Chicago Press.
BLS (2022). Employment by detailed occupation.
BLS (2023a). Demographic characteristics (CPS).
BLS (2023b). Occupational outlook handbook A-Z index.
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. (2021). On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
Bresnahan, T. (2019). Artificial intelligence technologies and aggregate growth prospects.
Bresnahan, T., Greenstein, S., Brownstone, D., and Flamm, K. (1996). Technical progress and co-invention in computing and in the uses of computers. Brookings Papers on Economic Activity. Microeconomics, 1996:1-83.
Bresnahan, T. F. (1999). Computerisation and wage dispersion: An analytical reinterpretation. The Economic Journal, 109(456):390-415.
Bresnahan, T. F., Brynjolfsson, E., and Hitt, L. M. (2002). Information technology, workplace organization, and the demand for skilled labor: Firm-level evidence. The Quarterly Journal of Economics, 117(1):339-376.
Bresnahan, T. F. and Trajtenberg, M. (1995). General purpose technologies engines of growth? Journal of Econometrics, 65(1):83-108.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877-1901.
Brynjolfsson, E., Frank, M. R., Mitchell, T., Rahwan, I., and Rock, D. (2023). Quantifying the Distribution of Machine Learning's Impact on Work. Forthcoming.
Brynjolfsson, E. and Mitchell, T. (2017). What can machine learning do? Workforce implications. Science, 358(6370):1530-1534.
Brynjolfsson, E., Mitchell, T., and Rock, D. (2018). What can machines learn, and what does it mean for occupations and the economy? AEA Papers and Proceedings, 108:43-47.
Brynjolfsson, E., Rock, D., and Syverson, C. (2021). The productivity J-curve: How intangibles complement general purpose technologies. American Economic Journal: Macroeconomics, 13(1):333-72.
Chase, H. (2022). LangChain.
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. d. O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., et al. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
Cheng, Z., Lee, D., and Tambe, P. (2022). Innovae: Generative AI for understanding patents and innovation. Available at SSRN.
Chow, A. R. (2023). Why ChatGPT Is the Fastest Growing Web Platform Ever. Time.
Cockburn, I. M., Henderson, R., and Stern, S. (2018). The impact of artificial intelligence on innovation: An exploratory analysis. In The Economics of Artificial Intelligence: An Agenda, pages 115-146. University of Chicago Press.
Constantz, J. (2023). Nearly a third of white collar workers have tried ChatGPT or other AI programs, according to a new survey.
David, P. A. (1990). The dynamo and the computer: An historical perspective on the modern productivity paradox. The American Economic Review, 80(2):355-361.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv, abs/1810.04805.
Dixon, J., Hong, B., and Wu, L. (2021). The robot revolution: Managerial and employment consequences for firms. Management Science, 67(9):5586-5605.
Feigenbaum, J. J. and Gross, D. P. (2021). Organizational frictions and increasing returns to automation: Lessons from AT&T in the twentieth century. Technical report, National Bureau of Economic Research.
Felten, E., Raj, M., and Seamans, R. (2023). How will language modelers like ChatGPT affect occupations and industries? arXiv preprint arXiv:2303.01157.
Felten, E. W., Raj, M., and Seamans, R. (2018). A method to link advances in artificial intelligence to occupational abilities. AEA Papers and Proceedings, 108:54-57.
Frey, C. B. (2019). The technology trap. In The Technology Trap. Princeton University Press.
Frey, C. B. and Osborne, M. A. (2017). The future of employment: How susceptible are jobs to computerisation? Technological Forecasting and Social Change, 114(C):254-280.
Goldfarb, A., Taska, B., and Teodoridis, F. (2023). Could machine learning be a general purpose technology? A comparison of emerging technologies using data from online job postings. Research Policy, 52(1):104653.
Goldstein, J. A., Sastry, G., Musser, M., DiResta, R., Gentzel, M., and Sedova, K. (2023). Generative language models and automated influence operations: Emerging threats and potential mitigations.
Grace, K., Salvatier, J., Dafoe, A., Zhang, B., and Evans, O. (2018). When will AI exceed human performance? Evidence from AI experts. Journal of Artificial Intelligence Research, 62:729-754.
Hernandez, D., Kaplan, J., Henighan, T., and McCandlish, S. (2021). Scaling laws for transfer. arXiv preprint arXiv:2102.01293.
Horton, J. J. (2023). Large language models as simulated economic agents: What can we learn from homo silicus? arXiv preprint arXiv:2301.07543.
Huang, M.-H. and Rust, R. T. (2018). Artificial intelligence in service. Journal of Service Research, 21(2):155-172.
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. (2020). Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.
Katz, L. F. and Murphy, K. M. (1992). Changes in relative wages, 1963-1987: Supply and demand factors. The Quarterly Journal of Economics, 107(1):35-78.
Khlaaf, H., Mishkin, P., Achiam, J., Krueger, G., and Brundage, M. (2022). A hazard analysis framework for code synthesis large language models.
Klinova, K. and Korinek, A. (2021). AI and shared prosperity. In AIES 2021 - Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society.
Kogan, L., Papanikolaou, D., Schmidt, L. D. W., and Seegmiller, B. (2021). Technology, vintage-specific human capital, and labor displacement: Evidence from linking patents with occupations. Working Paper 29552, National Bureau of Economic Research.
Korinek, A. (2023). Language models and cognitive automation for economic research. Technical report, National Bureau of Economic Research.
Korinek, A. and Stiglitz, J. E. (2018). Artificial intelligence and its implications for income distribution and unemployment. In The Economics of Artificial Intelligence: An Agenda, pages 349-390. University of Chicago Press.
Lipsey, R. G., Carlaw, K. I., and Bekar, C. T. (2005). Economic Transformations: General Purpose Technologies and Long-Term Economic Growth. OUP Oxford.
Meindl, B., Frank, M. R., and Mendonça, J. (2021). Exposure of occupations to technologies of the fourth industrial revolution. arXiv preprint arXiv:2110.13317.
Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R., Rozière, B., Schick, T., Dwivedi-Yu, J., Celikyilmaz, A., et al. (2023). Augmented language models: A survey. arXiv preprint arXiv:2302.07842.
Moll, B., Rachel, L., and Restrepo, P. (2021). Uneven growth: Automation's impact on income and wealth inequality. SSRN Electronic Journal.
Mollick, E. R. and Mollick, L. (2022). New modes of learning enabled by AI chatbots: Three methods and assignments. Available at SSRN.
Noy, S. and Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Available at SSRN 4375283.
O*NET (2023). O*NET 27.2 database.
OpenAI (2022). Introducing ChatGPT.
OpenAI (2023a). GPT-4 system card. Technical report, OpenAI.
OpenAI (2023b). GPT-4 technical report. Technical report, OpenAI.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155.
Peng, S., Kalliamvakou, E., Cihon, P., and Demirer, M. (2023). The impact of AI on developer productivity: Evidence from GitHub Copilot. arXiv preprint arXiv:2302.06590.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
ResumeB (2023). 1 in 4 companies have already replaced workers with ChatGPT.
Rock, D. (2019). Engineering value: The returns to technological talent and investments in artificial intelligence. Available at SSRN 3427412.
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., and Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761.
Schramowski, P., Turan, C., Andersen, N., Rothkopf, C. A., and Kersting, K. (2022). Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence, 4(3):258-268.
Shahaf, D. and Horvitz, E. (2010). Generalized task markets for human and machine computation. Proceedings of the AAAI Conference on Artificial Intelligence.
Singla, A. K., Horvitz, E., Kohli, P., and Krause, A. (2015). Learning to hire teams. In AAAI Conference on Human Computation & Crowdsourcing.
Solaiman, I., Brundage, M., Clark, J., Askell, A., Herbert-Voss, A., Wu, J., Radford, A., Krueger, G., Kim, J. W., Kreps, S., McCain, M., Newhouse, A., Blazakis, J., McGuffie, K., and Wang, J. (2019). Release strategies and the social impacts of language models.
Sorensen, T., Robinson, J., Rytting, C., Shaw, A., Rogers, K., Delorey, A., Khalil, M., Fulda, N., and Wingate, D. (2022). An information-theoretic approach to prompt engineering without ground truth labels. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics.
Thoppilan, R., De Freitas, D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H.-T., Jin, A., Bos, T., Baker, L., Du, Y., et al. (2022). LaMDA: Language models for dialog applications. arXiv preprint arXiv:2201.08239.
Tolan, S., Pesole, A., Martínez-Plumed, F., Fernández-Macías, E., Hernández-Orallo, J., and Gómez, E. (2021). Measuring the occupational impact of AI: Tasks, cognitive abilities and AI benchmarks. Journal of Artificial Intelligence Research, 71:191-236.
Van Reenen, J. (2011). Wage inequality, technology and trade: 21st century evidence. Labour Economics, 18(6):730-741.
Webb, M. (2020). The impact of artificial intelligence on the labor market. Working paper, Stanford University.
Weidinger, L. et al. (2021). Ethical and social risks of harm from language models. arXiv:2112.04359 [cs].
Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P.-S., Mellor, J., Glaese, A., Cheng, M., Balle, B., Kasirzadeh, A., Biles, C., Brown, S., Kenton, Z., Hawkins, W., Stepleton, T., Birhane, A., Hendricks, L. A., Rimell, L., Isaac, W., Haas, J., Legassick, S., Irving, G., and Gabriel, I. (2022). Taxonomy of risks posed by language models. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, pages 214-229, New York, NY, USA. Association for Computing Machinery.
Zolas, N., Kroff, Z., Brynjolfsson, E., McElheran, K., Beede, D. N., Buffington, C., Goldschlag, N., Foster, L., and Dinlersoz, E. (2021). Advanced technologies adoption and use by US firms: Evidence from the Annual Business Survey. Technical report, National Bureau of Economic Research.