This paper explores the potential impacts of large language models (LLMs) on the Chinese labor market. Following the methodology of Eloundou et al. (2023), we analyze occupational exposure to LLM capabilities by combining human expertise with LLM classifications. We then aggregate occupational exposure to the industry level to obtain industry exposure scores. The results indicate a positive correlation between occupational exposure and both wage levels and experience premiums, suggesting that higher-paying and experience-intensive jobs may face greater displacement risks from LLM-powered software. The industry exposure scores align with expert assessments and economic intuition. We also develop an economic growth model that incorporates industry exposure to quantify the productivity-employment trade-off from AI adoption. Overall, this study provides an analytical basis for understanding the labor market impacts of increasingly capable AI systems in China. Key innovations include the occupation-level exposure analysis, the industry aggregation approach, and economic modeling that incorporates AI adoption and labor market effects. The findings can inform policymakers and businesses on strategies for maximizing the benefits of AI while mitigating the risks of adverse disruption.
Introduction
The recent remarkable progress in generative AI and large language models (LLMs) (Bubeck et al. 2023; Zhao et al. 2023) has provoked many pressing questions about the effects of these powerful technologies on the economy. One of the most significant questions is how these technologies will affect the dynamics of the labor market through their influence on labor inputs. A branch of research on the disruptive labor market impacts of LLMs is emerging quickly; however, it focuses predominantly on the labor markets of developed economies, particularly the United States (Eloundou et al. 2023; Peng et al. 2023; Noy and Zhang 2023; Brynjolfsson, Li, and Raymond 2023; Felten, Raj, and Seamans 2023). Countries nevertheless differ in their labor market structures, such as occupation and industry composition. Even for the same occupation, the detailed task composition or work content may differ greatly across countries. Therefore, this paper analyzes the potential impacts of LLMs on China's labor market. To construct our primary exposure index, we use a recently developed methodology to systematically assess which occupations in China are most exposed to advances in LLMs. Specifically, we employ three large language models, GPT-4 (OpenAI 2023), InternLM (InternLM-Team 2023), and GLM (Zeng et al. 2022; Du et al. 2022), as classifiers to determine occupational exposure based on the detailed description of each occupation contained in the general code of occupational classification of the People's Republic of China. We also employ expert annotators to assess the impacts of LLMs, to make comparisons, and to shed more light on this issue.
We then characterize the profile of occupational exposure to LLMs based on the characteristics of occupations in China. Our analysis indicates that the impacts of LLMs on China's labor market are likely to be pervasive and diverse. The results show substantial heterogeneity in exposure across occupations and suggest that more educated, relatively high-paid, white-collar occupations may be most exposed to LLMs. In addition to the positive correlation of occupational exposure with wages and education, we also find a positive correlation between the experience premium and exposure to LLMs, implying potentially diminishing returns to learning by doing in the future. We analyze exposure by industry and find that the education and healthcare industries exhibit high exposure, while manufacturing, agriculture, mining, and construction exhibit lower exposure. Unlike in developed countries, young and old cohorts in China are distributed highly unevenly across industries, which produces large variation in exposure to LLMs across demographic groups. In relative terms, LLMs inevitably exert a larger impact on younger workers.
Our analysis also indicates that LLMs will have a greater impact on labor demand. We use a dataset of online job postings in China from January 2017 to December 2022 to construct an occupational vacancies index. We find a positive correlation between the share of vacancies and the exposure score at the occupation level. This implies that the structure of labor demand is likely to exacerbate the disruptive impacts on the labor market in China. Additionally, a positive correlation between the growth rate of vacancy shares and exposure scores indicates a potential reversal of the occupational labor demand trend. Contrary to popular expectations, the economic and labor market structure in China exacerbates the disruptive impacts of LLMs rather than alleviating them.
Given the potential relevance of LLMs, it becomes imperative to address the issue of regulating them. To structure our discussion, we present a simple theoretical model. Transformative technologies like LLMs exert a heterogeneous impact on different occupations and industries. The regulation of LLMs needs to strike a balance between the positive productivity effects and the negative disruption costs in labor markets. Our analysis provides initial steps toward a comprehensive quantitative evaluation.
The paper is structured as follows: Section 2 reviews the related literature, Section 3 discusses methods and data collection, Section 4 presents the main results, Section 5 introduces a theoretical model to further discuss the impacts of LLMs, and Section 6 offers concluding remarks.
Literature Review
Artificial Intelligence, like previous technologies, is poised to impact the economy in various ways, potentially fostering economic growth and reshaping the labor market structure (Furman and Seamans 2019; Goldfarb, Taska, and Teodoridis 2023). A substantial and expanding body of literature delves into the labor market consequences of artificial intelligence and automation technologies broadly defined. The skill-biased technological change framework (Katz and Murphy 1992; Acemoglu 2002), along with the task-based framework of automation (Autor, Levy, and Murnane 2003; Acemoglu and Autor 2011; Acemoglu and Restrepo 2018) are often regarded as the foundational frameworks for comprehending technology’s impact on the labor market. This line of research has introduced the concept of routine-biased technological change, indicating that workers engaged in routine tasks face a heightened risk of displacement due to technological advancements. Numerous studies have demonstrated that automation technologies have contributed to both income inequality and job polarization, driven by declines in relative wages and employment for workers specializing in routine tasks (Autor, Katz, and Kearney 2006; Van Reenen 2011; Acemoglu and Restrepo 2022). The influence of AI on work is anticipated to be multi-faceted. Recent studies have made distinctions between task-displacement and task-reinstatement effects of technology, whereby new technology introduces novel occupations that bolster labor demand (Acemoglu and Restrepo 2018, 2019). 
Prior research has predominantly adopted the task-oriented approach to analyze the labor market impacts of artificial intelligence. Various methods have been employed to evaluate the similarity between AI capabilities and the tasks performed by workers across different occupations. These methodologies include aligning AI capabilities with the diverse skills and abilities demanded by distinct occupations (Felten, Raj, and Seamans 2018; Tolan et al. 2021), mapping AI patent descriptions to worker task descriptions (Webb 2019; Meindl, Frank, and Mendonça 2021), employing machine learning classifiers to estimate the potential for automation across all occupations (Frey and Osborne 2017), devising a rubric to assess the suitability of worker activities for machine learning (Brynjolfsson, Mitchell, and Rock 2018), and leveraging expert forecasts (Grace et al. 2018).
Nevertheless, this line of research is becoming increasingly challenging as the capabilities of AI evolve and advance. Following recent breakthroughs in generative AI and LLMs, a growing body of studies has investigated the specific economic impacts and opportunities presented by LLMs. For instance, Peng et al. (2023) conducted a study in which software engineers were recruited for a specific coding task, revealing that those with access to GitHub Copilot completed the task twice as quickly. Similarly, Noy and Zhang (2023) conducted an online experiment to explore the displacement effects of generative AI on professional writing tasks. Additionally, Brynjolfsson, Li, and Raymond (2023) examined the effects of generative AI on customer support agents. Pertinent to this paper, Felten, Raj, and Seamans (2023) explored the heterogeneity in occupational exposure, while Eloundou et al. (2023) introduced a novel rubric to assess the impacts of LLMs on the labor force. In parallel with this line of inquiry, we aim to characterize the potential relevance of LLMs to China's labor market in particular.
Our paper is closely linked to the emerging literature on the regulation of transformative technologies like LLMs. Preliminary contributions to the AI and regulation discourse include works such as Galasso and Luo (2018) and Agrawal, Gans, and Goldfarb (2019), which examine the implications of privacy and trade policies for AI adoption. Acemoglu and Lensman (2023) develop a multi-sector technology adoption model to explore the optimal regulation of transformative technologies in settings where society can progressively learn about the associated risks. In line with this discourse, we present a static multi-sector model aimed at addressing the regulatory tradeoff concerning LLMs. The model is designed to incorporate the potential disruptions stemming from the labor replacement effect, thereby contributing to a fuller understanding of the regulatory dynamics surrounding LLMs.
Methodology of Exposure Scoring and Data Collection
Data on Occupations in China
To assess occupational exposure in China and facilitate standardized comparisons, a consistent occupational classification system is essential. The Occupational Classification Dictionary of the People's Republic of China (OCD), version 2022, published by the National Bureau of Statistics of China (NBS) and the Ministry of Human Resources and Social Security of the People's Republic of China (MOHRSS), provides such a classification system and serves as a standardized analytical tool for occupations. In particular, we use the general code of Occupational Classification of the People's Republic of China 2022, which contains comprehensive information across 8 large categories, 79 medium categories, 449 small categories, and 1636 fine categories. This information includes the definition of each occupation, the content and format of work activities, and a specific description of the scope of work activities. We leverage the detailed descriptions of the various occupations to classify online job vacancies into distinct occupational categories. To provide further clarity, a sample of occupations and their exposure to LLMs, grouped by medium category, is presented in Appendix A.
Data on Wages and Vacancies
To acquire both vacancy and wage information, we leverage two datasets. Our primary data source is an online job posting dataset collected by the City Data Group. The dataset compiles online job postings from January 2017 to December 2022, originating from major online job market platforms in China including zhaoping.com, 51 job, 58.com, Ganji.com, Lagou.com, and Kanzhun. This compilation covers over 800 million job openings across nearly 400 cities and 5.2 million employers. For each job vacancy entry in this database, we have access to a range of information, including the posting date, position type, occupation title, the number of workers to be recruited, the wage range (if specified), education requirements (if applicable), work experience prerequisites (if indicated), the name of the employing firm, the work location for the position, and the textual content of the job description. After classifying each job posting into a distinct occupation, we derive the corresponding statistics, including the number of vacancies, the typical educational qualifications required for entry, and the wage structure within each occupation.
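The aggregation from classified postings to occupation-level vacancy statistics can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `vacancy_shares`, the tuple layout of a posting, and the occupation codes are all hypothetical, and it assumes each posting has already been assigned to an occupation and reports the number of workers to be recruited.

```python
from collections import defaultdict


def vacancy_shares(postings):
    """Return each occupation's share of total openings.

    `postings` is an iterable of (occupation_code, n_openings) pairs.
    Both the layout and the codes are assumptions made for this sketch.
    """
    counts = defaultdict(int)
    for occupation_code, n_openings in postings:
        counts[occupation_code] += n_openings

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no openings to aggregate")

    # Share of vacancies = openings in the occupation / all openings.
    return {code: n / total for code, n in counts.items()}


# Made-up example: two postings for occupation "A", one for "B".
example_shares = vacancy_shares([("A", 3), ("B", 1), ("A", 4)])
```

Weighting by the number of openings, rather than counting postings, is a design choice of this sketch; the text does not state which of the two the vacancy index uses.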
Our second data source is the China Labor-force Dynamic Survey (CLDS) 2016, conducted by the Social Science Survey Center of Sun Yat-sen University. The CLDS surveys the working-age population to explore aspects such as education, employment, labor rights, occupational mobility, health, and well-being. The survey encompasses comprehensive industry and occupation information for each employment entry. Consequently, we can harness this dataset to construct an occupational intensity index for each industry. This index facilitates the acquisition of an exposure score at the industry level derived from occupational exposure scores. Further details regarding the occupational intensity index across 15 industry categories are presented in Appendix C.
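One plausible reading of this aggregation step is a weighted average: an industry's exposure is the mean of occupational exposure scores, weighted by each occupation's share of employment in that industry. The sketch below assumes that reading. The function name, the occupation labels, and the numbers are hypothetical, and the exact definition of the occupational intensity index is the one given in Appendix C, which may differ.

```python
def industry_exposure(occupation_weights, occupation_exposure):
    """Weighted mean of occupational exposure scores for one industry.

    occupation_weights:  occupation -> weight (e.g. employment share) in
                         the industry; assumed form of the intensity index.
    occupation_exposure: occupation -> exposure score.
    """
    total_weight = sum(occupation_weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    weighted_sum = sum(
        weight * occupation_exposure[occupation]
        for occupation, weight in occupation_weights.items()
    )
    # Normalizing by the total lets the weights be counts or shares.
    return weighted_sum / total_weight


# Made-up example: an industry with two occupations.
example_score = industry_exposure(
    {"occupation_a": 0.75, "occupation_b": 0.25},
    {"occupation_a": 0.8, "occupation_b": 0.0},
)
```

Because the function normalizes by the total weight, it accepts raw employment counts as well as shares that already sum to one.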
Methodology of Exposure Scoring
To evaluate the likelihood that an occupation in China undergoes a disruptive shock due to the widespread availability and use of LLMs, we continue to adopt a task-based approach. We gauge the exposure of each occupation to LLMs based on the comprehensive occupation descriptions in Chinese, as documented in the Occupational Classification Dictionary of the People's Republic of China 2022. Building upon the exposure scoring methods proposed by Eloundou et al. (2023), which conceive of an occupation as a collection of tasks, we assess whether a given occupation can be executed more efficiently using ChatGPT or analogous LLMs. Our methodology employs three prominent large language models to determine the exposure of various occupations. Specifically, we use OpenAI's GPT-4 model (OpenAI 2023), the InternLM model developed by Shanghai AI Laboratory and SenseTime (InternLM-Team 2023), and GLM (Zeng et al. 2022; Du et al. 2022) to categorize occupations based on their complete occupational descriptions. Each of these models operates on a comprehensive rubric for scoring LLM exposure. We then submit an occupation's description, together with its title, to each model. In response, each model provides an exposure score. These scores capture whether the time required to complete a task could be halved while maintaining consistent quality, assuming a worker has access to ChatGPT-like LLMs. The scores are divided into four categories: E0, E1, E2, and E3. The details of each category are presented in Appendix A.
Although our methodology is similar to that of Eloundou et al. (2023), several caveats should be noted. First, because all occupational descriptions and prompts are in Chinese, our exercise relies on the Chinese-language capabilities of the large language models. To achieve better Chinese-language performance, we use two leading Chinese large language models, GLM and InternLM, in addition to GPT-4. Second, the Occupational Classification Dictionary of the People's Republic of China contains only the detailed work content and description for each occupation. It is therefore infeasible to calculate exposure scores at the task level as in Eloundou et al. (2023). Instead, we calculate exposure scores directly at the occupation level.
OpenAI has pointed out several weaknesses of the method, such as the validity of the task-based framework, the use of relative versus absolute measures, and the forward-looking and changing nature of the scores. Another limitation we would like to discuss here is the randomness of LLM scoring. The same prompt can still yield different results from a large language model, even with a low temperature setting. To compensate for this issue, we had each LLM label each occupation 8 times and calculated the scores. We then took the average as the final score of each LLM for each occupation.
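The repeated-labeling step can be sketched as follows. The sketch assumes each run returns one of the four category labels, which is then converted to a number and averaged over the 8 runs. The numeric values in `PLACEHOLDER_SCORES` are placeholders chosen only for illustration, since the actual scoring of E0 to E3 is defined in Appendix A; the function name is likewise hypothetical.

```python
from statistics import mean

N_RUNS = 8  # repeated labelings per model and occupation, as stated in the text

# Placeholder values for illustration only; NOT the paper's mapping.
PLACEHOLDER_SCORES = {"E0": 0.0, "E1": 1.0, "E2": 2.0, "E3": 3.0}


def average_exposure(labels, label_to_score):
    """Average the numeric scores of repeated labels for one
    (model, occupation) pair."""
    if len(labels) != N_RUNS:
        raise ValueError(f"expected {N_RUNS} labels, got {len(labels)}")
    return mean(label_to_score[label] for label in labels)


# Made-up example: eight labels returned by one model for one occupation.
example_labels = ["E0", "E1", "E1", "E2", "E1", "E1", "E0", "E1"]
example_average = average_exposure(example_labels, PLACEHOLDER_SCORES)
```

Passing the mapping as an argument keeps the averaging logic independent of how the categories are scored.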
To compare the consistency of LLM scoring with human scoring, we invited 21 experts in economics and artificial intelligence to serve as judges. We provided them with descriptions of the medium-category occupations in China and asked them to score each occupation according to the rating criteria. After collecting the scores from all of the experts, we calculated the average score for each occupation as the final human score. The rating criteria represent the proportion of labor input that large language models could save in each occupation. More details on the expert evaluation are presented in Appendix A.
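A minimal sketch of such a comparison is given below. It averages the expert ratings for each occupation and then computes the Pearson correlation between the human scores and a model's scores. The text does not specify which consistency statistic is used, so the choice of Pearson correlation is an assumption, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def mean_expert_score(ratings):
    """Final human score for one occupation: the mean of expert ratings,
    each a proportion of labor input saved, between 0 and 1."""
    return mean(ratings)


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Made-up ratings from three experts for three occupations.
human_scores = [
    mean_expert_score(ratings)
    for ratings in ([0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9])
]
llm_scores = [0.25, 0.5, 0.75]  # made-up model scores for the same occupations
agreement = pearson(human_scores, llm_scores)
```

A rank-based measure such as Spearman correlation would be a reasonable alternative if only the ordering of occupations matters.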
Results and Assessment of Impacts of LLMs on China's Labor Market
Summary Statistics
We employed the aforementioned methodology to gather results from GPT-4, GLM, and InternLM, subsequently as-