Measuring the Impact of ChatGPT on Fostering Concept Generation in Innovative Product Design

Filippi, Stefano

doi:10.3390/electronics12163535


by Stefano Filippi

DPIA Department, University of Udine, 33100 Udine, Italy

Electronics 2023, 12(16), 3535; https://doi.org/10.3390/electronics12163535

Submission received: 19 June 2023 / Revised: 8 August 2023 / Accepted: 10 August 2023 / Published: 21 August 2023


Abstract

The growing demand for innovative and user-centric product design has increased the need for effective idea generation methods. In recent years, natural language processing (NLP) tools such as ChatGPT have emerged as a promising solution for supporting idea generation in various domains. This paper investigates a framework for studying the role of ChatGPT in facilitating the ideation process in product design. The investigation measures the impact of ChatGPT on the generation of innovative concepts compared to the use of “classic” design methods. An overview of the state-of-the-art idea generation methods in product design opens the paper. Then, the paper states some hypotheses about the impact of ChatGPT on innovative product design, aiming for product augmentation by adding features. The paper then describes the design experience in which ChatGPT is used as a tool for concept generation. Finally, the paper analyzes the dataset, using precise metrics to characterize the participants’ performance and compare them. This analysis makes it possible to confirm or reject the hypotheses. The paper concludes with a discussion of the implications of the findings and some suggestions for future research. Along with the paper, the Microsoft Excel workbook used to perform the data analysis is available to readers so that they can perform their own data collection and analysis. The workbook UX has been carefully studied and developed to make it usable by anyone. At the same time, it should be flexible enough to manage several situations characterized by different numbers of participants, product functions to implement, and generated concepts.

Keywords: idea generation; conceptual design; product design; ChatGPT; innovation metrics

1. Introduction

Concept generation is a critical phase in product design [1,2]. If innovation is the goal, the product development process must separate the analysis of “what” the product will be called to do from the study and development of “how” that “what” will be accomplished. In other words, there are two stages in the product development process: a first stage, when pure product functions are considered, and a second stage, when those functions are embodied in physical products. This two-stage process helps to keep design degrees of freedom open as long as possible [3]. Concept generation is in between the two stages, and this, together with other peculiarities, makes its role fundamental for the success of the design activities.

There are many different methods that designers use to generate ideas, each with its own strengths and limitations [4]. These methods range from simple brainstorming to more structured methods like TRIZ and emerging AI-based methods like ChatGPT or Google Bard [5,6].

Within this scenario, this research aims to develop a framework for measuring the impact of ChatGPT, the only AI-based method available in Italy at the time of this research, on the generation of ideas (concepts) in product design. The framework will compare the performance of designers who use classic methods to the performance of designers who use ChatGPT in designing products aiming for product augmentation by adding features. The comparison will be conducted both quantitatively and qualitatively, using well-defined metrics [7]. Along with the paper, the Microsoft Excel workbook used to perform the data analysis is available to readers so that they can perform their own data collection and analysis. The workbook UX has been carefully studied and developed to make it usable by anyone. At the same time, it should be flexible enough to manage several situations characterized by different numbers of participants, product functions to implement, and generated concepts.

The paper is structured as follows. After Section 2, which depicts several idea generation methods, Section 3 describes a design experience conducted with 18 participants. The participants were tasked with generating concepts that implemented a given set of product functions. Section 4 presents the outcome of the design experience. Section 5 analyzes the data and draws some considerations. Section 6 and the references conclude the paper.

2. Background

Idea Generation Methods

Traditional methods of idea generation include brainstorming, mind mapping, and sketching. These methods have been described in the literature [8,9,10,11]. Brainstorming is a group ideation technique that encourages participants to generate as many ideas as possible without criticism. This allows for a free flow of ideas, which can lead to more creative solutions. Mind mapping is a visual tool that can be used to represent the relationships between different concepts. This can help designers to see the big picture and to identify potential connections among seemingly unrelated ideas. Sketching is a common method used by designers to quickly generate and communicate ideas. This allows designers to capture their ideas in a tangible way, which can help to refine and develop them later.

Among more recent methods, the extremes-inverses method is a technique for generating ideas that challenge existing assumptions and can lead to innovative solutions. This method involves identifying the extremes of a particular product attribute and then generating ideas that are the opposite of these extremes. For example, if a product is designed to be lightweight, the extremes-inverses method would involve generating ideas for a product that is as heavy as possible. This approach can help designers to think outside the box and to generate creative solutions [12].

As another more recent method, the analogies-metaphors method is a technique for generating ideas by drawing inspiration from unrelated fields or domains. This approach allows designers to look at problems from new perspectives and to generate creative solutions. For example, if a designer is trying to solve a problem with a product, they might look at how the problem is solved in a different field, such as biology or physics. This can help the designer to come up with new ideas for solving the problem [13].

TRIZ is a problem-solving methodology that aims to develop systematic approaches to solve technical problems. It is one of the best-known logical problem-solving methods [14]. TRIZ involves identifying common patterns in successful innovations and applying them to new problems. This allows designers to draw on the knowledge of past successes to solve new problems. The best-known and most widely used design tools offered by TRIZ are the 40 principles, the trends of evolution, the contradiction matrix, and trimming. These tools have been used successfully in a wide range of industries, including aerospace, automotive, and manufacturing [15,16,17,18]. The 40 principles are based on the analysis of a large number of patents and ideas. They summarize different hints to stimulate creative thinking and guide problem-solving processes by providing a systematic approach to generating innovative solutions. For example, one of the 40 principles is “segmentation”, which suggests that a problem can be solved by dividing it into smaller, more manageable parts. The trends of evolution describe the patterns observed in the development of technological systems. These trends can help designers to understand the direction of technological progress and to guide their own inventive thinking. For example, one of the trends of evolution is “miniaturization”, which suggests that technological systems tend to become smaller over time. When designers try to improve something in a product, quite often something else risks getting worse; thus, a contradiction occurs. For each contradiction between an improving and a worsening product feature, the TRIZ contradiction matrix points to the principles, among the 40, that can help to resolve it. Finally, TRIZ trimming is a tool that contributes to increasing product ideality. An ideal product has all the features and functions needed without any costs or drawbacks. Trimming helps to approach product ideality by deleting unnecessary product components and moving their useful functions to the remaining components.

More recently, digital tools and techniques have been developed to support idea generation in product design. These include generative design tools, data-driven approaches, and artificial intelligence (AI) tools such as ChatGPT and Google Bard [19]. Generative design tools use algorithms to generate a large number of design options, which are based on predefined rules and constraints [20]. Data-driven approaches use user data and feedback to guide ideation and design decisions [21]. AI tools use natural language processing and machine learning algorithms to support ideation by interacting with users in natural language [22].

While these digital tools and techniques have the potential to enhance idea generation in product design, there are still challenges that need to be addressed, such as the potential for a lack of creativity and the difficulty in translating generated ideas into tangible designs.

Among the methods summarized above, this research exploited brainstorming, analogies-metaphors, extremes-inverses, TRIZ (limited to the 40 principles), and ChatGPT.

3. Activities

In order to set up a framework to measure the impact of ChatGPT on fostering concept generation in innovative product design, the author carried out a design experience in the field involving university students and colleagues. The research path was as follows. First, the two product innovation experts who supervised the design experience identified the metrics for establishing the quality of the concepts and comparing the performance of the participants. The supervisors then formulated hypotheses about the impact of ChatGPT on product innovation, which the design experience was designed to confirm or reject. The next activity was to characterize the design experience with reference to the double diamond approach and to implement it, which involved preparing the materials, selecting the participants, executing the design activities, and collecting the data. Only about half of the participants had access to ChatGPT. Once the experience was finished, the supervisors discussed the results with the participants and then analyzed the dataset to verify the hypotheses.

3.1. Highlighting the Metrics

Four metrics have been selected from the literature to use in the framework. They are quantity, usefulness, novelty, and variety, as described in the research reported in [23,24]. The selection of these metrics makes it easier to explore possible relationships between personality traits and ChatGPT use, since previous research has established a relationship between the big five personality traits [25,26] and design activities.

The four metrics are defined and used as follows.

Quantity. The quantity metric simply measures the number of concepts generated.
Usefulness. The usefulness metric measures the applicability of each concept in the specific context; the supervisors assign a value in the [0, 1] interval (from useless to fully useful).
Novelty. A concept is novel to the extent that it has not been exploited in the specific context before. Again, novelty is assigned to each concept using the [0, 1] interval (from well known to novel).
Variety. The variety metric measures how original a concept is within the dataset. Again, the interval is [0, 1]: if everybody highlighted the concept, the variety score is 0; if the concept was highlighted just once, the score is 1.
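
The quantity and variety definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the computation implemented in the author's Excel workbook: the text fixes just the two endpoints of the variety score (0 when everybody highlights a concept, 1 when it appears once), so the linear interpolation between them is an assumption, as are the function names and the sample data.

```python
from collections import Counter


def quantity(concepts: list) -> int:
    """Quantity: number of distinct concepts in one morphology."""
    return len(set(concepts))


def variety(times_highlighted: int, n_teams: int) -> float:
    """Variety of one concept within the dataset.

    ASSUMPTION: linear interpolation between the two endpoints given in
    the text (highlighted once -> 1, highlighted by everybody -> 0).
    """
    if n_teams < 2 or not 1 <= times_highlighted <= n_teams:
        raise ValueError("need n_teams >= 2 and 1 <= times_highlighted <= n_teams")
    return (n_teams - times_highlighted) / (n_teams - 1)


def variety_scores(morphologies: dict) -> dict:
    """Map each concept to its variety score across all morphologies."""
    n_teams = len(morphologies)
    counts = Counter(
        concept
        for concepts in morphologies.values()
        for concept in set(concepts)  # count each team at most once
    )
    return {concept: variety(k, n_teams) for concept, k in counts.items()}


# Hypothetical data: three teams and made-up concept names.
morphologies = {
    "team_a": ["blade", "laser", "sandpaper"],
    "team_b": ["blade", "laser"],
    "team_c": ["blade"],
}
scores = variety_scores(morphologies)
for concept, score in sorted(scores.items()):
    print(f"{concept}: variety = {score:.2f}")
```

Usefulness and novelty are not computed here because, in the text, they are marks assigned by the supervisors rather than values derived from the dataset.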

3.2. Stating the Hypotheses to Verify

The considerations in the background section, along with the experience of the supervisors, allow for highlighting some hypotheses that can be verified through the design experience. These hypotheses, quite naturally mapped with the metrics described before, are as follows.

H1. Given its wide knowledge base, ChatGPT is expected to foster much larger numbers of concepts than those highlighted by the participants not allowed to use it.

H2. This hypothesis refers to usefulness and is the most interesting one. ChatGPT could be seen as a big extension of TRIZ, as it has a much wider knowledge base than TRIZ (which is based on patents and inventions only). At the same time, just like TRIZ, ChatGPT can easily overcome the not-invented-here (NIH) syndrome and psychological inertia (PI) [27,28,29]. However, the ability of ChatGPT to filter suggestions about concepts that could solve problems or implement functions is unknown, as is how this ability depends on the interaction between the designers and ChatGPT. From this point of view, TRIZ’s strategy of processing data at a general level and delegating the customization of the proposed solutions to the designers could be a winning strategy for generating useful concepts. All of this makes the expectations about the design experience outcome even more intriguing and prevents the supervisors from making any real prediction. The supervisors could only suppose that there is a balance between ChatGPT’s ability to suggest concepts thanks to its wider knowledge base and TRIZ’s finer strategies to focus and filter possible design suggestions. Finally, usefulness will impact the computation of the other metrics, as it acts as a sort of go/no-go filter that determines which concepts are considered when evaluating novelty and variety.

H3. The novelty of the concepts generated by participants who use ChatGPT is expected to be low, as ChatGPT works on existing pieces of information reporting experiences that have already happened in the past.

H4. The variety of the concepts generated by participants who use ChatGPT is expected to be higher than that of the concepts generated using classic methods, given the breadth of the ChatGPT knowledge base. However, again considering usefulness as a filter that removes out-of-scope concepts, etc., the results from the teams are expected to be comparable.

These four hypotheses will be considered again once the dataset from the design experience is available. Their outcome (confirmed vs. rejected) is described in Section 5, which is structured to match the hypotheses one by one.
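
The go/no-go role that H2 assigns to usefulness can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the text does not state a threshold, so the value 0.5 is hypothetical, as are the record layout, the function names, and the sample marks.

```python
# ASSUMPTION: the paper gives no cut-off value; 0.5 is purely illustrative.
USEFULNESS_THRESHOLD = 0.5


def useful_only(concepts, threshold=USEFULNESS_THRESHOLD):
    """Keep only the concepts that pass the go/no-go usefulness filter."""
    return [c for c in concepts if c["usefulness"] >= threshold]


def mean_novelty(concepts, threshold=USEFULNESS_THRESHOLD):
    """Average novelty computed over the useful concepts only."""
    kept = useful_only(concepts, threshold)
    if not kept:
        return 0.0
    return sum(c["novelty"] for c in kept) / len(kept)


# Hypothetical marks in [0, 1], as a supervisor might assign them.
concepts = [
    {"name": "concept_1", "usefulness": 0.9, "novelty": 0.2},
    {"name": "concept_2", "usefulness": 0.1, "novelty": 1.0},  # filtered out
    {"name": "concept_3", "usefulness": 0.7, "novelty": 0.6},
]

kept_names = [c["name"] for c in useful_only(concepts)]
print("kept:", kept_names)
print("mean novelty of useful concepts:", round(mean_novelty(concepts), 2))
```

In this sketch a very novel but useless concept (concept_2) does not raise the team's novelty, which is the effect the hypotheses anticipate when they describe usefulness as a filter.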

3.3. Characterizing the Design Experience According to the Double Diamond Approach

The double diamond approach is a well-known design process model that helps teams to understand and solve problems creatively [30]. It consists of four phases: discover, define, develop, and deliver. In the discover phase, the team gathers information using different methods and tools. In the define phase, the team understands the problem in detail by identifying the causes and the people affected by it. During the develop phase, the team generates ideas and explores different possibilities. In the deliver phase, the solutions are implemented and tested with the users; moreover, the outcomes are communicated to the stakeholders. The model is called a double diamond because the flow through the phases follows a diverge-converge-diverge-converge sequence. The mapping between the design experience used in this research and the double diamond approach is as follows.

Regarding the discover phase, a very simple, mature, well-known product was considered. This reduced the divergence in analyzing the data, so that the starting point was more or less the same for every participant in the test. The same considerations hold for the define phase; converging on the problem to solve and on the product aspects to focus on in order to innovate it led to very similar conclusions for all the participants. The design methods and tools studied during the university course were adopted in the develop phase. Here, that mix should have encouraged as much divergence as possible. For example, TRIZ fundamentals make clear that this theory of inventive problem solving aims at generating as many ideas as possible. Finally, the deliver phase corresponds to the data analysis and evaluation performed by the innovation experts, who assigned marks to the concepts generated by the participants regarding the usefulness and novelty metrics.

3.4. Implementing the Design Experience

A total of 18 students were involved in the design experience, all of them enrolled in the “Product Interaction and Innovation” university course of the Management Engineering and Mechanical Engineering Master’s Degrees at the University of Udine (Italy). The course focuses on the design process, mainly the mechanical one but not limited to it, from the very beginning (the highlighting of customers’ requirements) to the end-of-life of the product, with particular emphasis on innovative tools and methods (QFD, TRIZ, ChatGPT, etc.) for each step of the process. Along with this, ideation, concept generation, and product evaluation consider product UX as one of the most important aspects; again, tools and methods for UX innovation (design thinking, persona development, journey mapping, etc.) are described and put into practice during laboratory activities. A total of 13 males and five females participated; nine of them were 20 years old and nine were 22. Their background was mainly technical, with a couple of exceptions coming from classic high schools. All of them knew the brainstorming method before attending the university lessons, but none had come into contact with analogies-metaphors, extremes-inverses, or TRIZ. Some had very limited, unstructured experience with ChatGPT, mainly because it had become available in Italy just a few weeks before and they were simply curious about it. With all of this said, it can be assumed that the students had more or less the same knowledge about the fundamentals of the concept generation methods used in the design experience, from brainstorming to TRIZ, with some slight differences regarding ChatGPT. A random selection that took prior knowledge about ChatGPT into account generated nine teams of two students each. Having nine sets of data rather than 18 makes data collection and analysis faster and easier. Clearly, this change of granularity could impact the meaningfulness and robustness of the research results, but the main point here is to set up a framework that will be applied in the future to get more data and to analyze the trends. Four teams (T1 to T4) were assigned to use only the classic generation methods; these teams will be addressed hereafter as the G1 group of teams. The other five teams (T5 to T9) were allowed to access ChatGPT; they are referred to as the G2 group of teams.

Two supervisors with many years of experience in product innovation methods and tools took part in the experience. They decided the logistics of the design experience and prepared the materials needed to conduct it.

The first decision was about the order of the methods that the teams would use. The peculiarities of each method made their sorting easy. For example, the extremes-inverses method needs existing concepts to start from when generating variations. Therefore, this method cannot occur first when no concept has been already highlighted. In the end, the actual order of the methods was brainstorming, analogies-metaphors, extremes-inverses, TRIZ, and ChatGPT, addressed by the letters B, A, E, T, and C, respectively.

The design problem was the development of an innovative sharpener for classic wooden pencils. This selection came from the following requirements. First, the number of functions to manage should be limited, since the whole design experience must take at most one lesson. Second, the product should show a mature design in order to level the problem-solving of the teams by clearing it of contributions about creativity, etc., that are too exotic. Third, simplicity and maturity are coupled with the fact that the product must be well known to everybody; all of this is required in order to lower the bias and focus on the impact of the ChatGPT use as much as possible. Again, to introduce as little bias as possible, the functional scheme of the product was delivered to the students. Figure 1 shows this scheme. The topmost box contains the overall function, that is, the sentence in verb-object format that describes the functional scope of the product. The other boxes contain three main functions (F1, “Sharpen correctly”, and F5) and three subfunctions (F2, F3, and F4). The leaf functions in the graph (F1 to F5) are the functions to find concepts for. Thus, this set is fixed for every team.

Figure 1. The functional scheme of the product, an innovative sharpener for classic wooden pencils.

The expected result for every team is a filled table, called a morphology [3], where five rows correspond to the leaf functions F1–F5, and each row contains a list of concepts implementing the specific function. Each concept is tagged with one of the letters B, A, E, T, and C to identify the method used to highlight it. As an exception, because suggestions about further functions could come, for example, from the analogies-metaphors method and its exploitation of the semantic fields [31] or from ChatGPT, the teams were allowed to add functions with the related concepts.

The teams had an hour and a half to fill the morphology by sketching the concepts or describing them textually on paper sheets to avoid delays. For each concept, there was a box allowing the insertion of one of the letters B, A, E, T, and C, corresponding to the five methods available. Explanations of the sketches by the students would come afterwards, in an offline mode, to get the real meaning and to allow the supervisors to classify and evaluate the concepts as well as possible. Again to limit bias as much as possible, each method was given a precise amount of time. The supervisors set this timing: brainstorming was given 20 min, analogies-metaphors and extremes-inverses 15 min each, and TRIZ and ChatGPT 20 min each. Teams not allowed to use ChatGPT (teams T1 to T4) could use the last 20 min to refine their findings or to highlight more concepts using any method they liked (except for ChatGPT, of course).
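
As a quick consistency check, the per-method timing can be laid out as a sequential schedule and summed; the five slots fill exactly the hour and a half available. The snippet below is only an illustrative sketch: the dictionary and function names are invented here, and the assumption that the slots follow one another without breaks is not stated in the text.

```python
# Minutes per method, in the order B, A, E, T, C used in the design experience.
SCHEDULE_MIN = {"B": 20, "A": 15, "E": 15, "T": 20, "C": 20}


def time_slots(schedule):
    """Return {method: (start_minute, end_minute)} for back-to-back slots.

    ASSUMPTION: slots follow each other with no breaks in between.
    """
    slots, start = {}, 0
    for method, minutes in schedule.items():
        slots[method] = (start, start + minutes)
        start += minutes
    return slots


slots = time_slots(SCHEDULE_MIN)
total_minutes = sum(SCHEDULE_MIN.values())

for method, (start, end) in slots.items():
    print(f"{method}: minute {start} to minute {end}")
print("total minutes:", total_minutes)
```

For the teams not allowed to use ChatGPT, the last slot (labelled C here) is the free refinement time described above.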

A briefing before the start of the test allowed the supervisors to remind the participants of the objectives, the rules to follow, the use of the material, and the behavior to adopt. Furthermore, the precise reason why those design methods and tools were involved, in addition to their having been studied in the university course, was explained using the concept of product augmentation [32]. A product can be improved to meet marketing requirements and stand out from the competition through additional features, services, or benefits. Here, dealing with a simple product like a pencil sharpener, additional features were chosen to innovate the product rather than added services or benefits. Therefore, in addition to brainstorming, the methods proposed (analogies-metaphors, extremes-inverses, and TRIZ) are the most used methods and tools to add features and optimize products. Clearly, thanks to its versatility, ChatGPT would help with all three aspects (features, services, and benefits); hence, it is added easily and smoothly. The scope of the experiment does not include customer expectations; in other words, no prior analysis of real needs was carried out in the field. On the one hand, this does not allow us to quantify the complexity of the tasks relating to product augmentation; on the other hand, it does not limit design freedom, which also aims to discover unexpected solutions, even for the most demanding customers.

4. Results

At the end of the design experience, nine morphologies were collected. Figure 2 shows one of the morphologies (concepts are in Italian).

Figure 2. One of the morphologies generated during the design experience.

The logs of the interactions with ChatGPT were also collected from teams T5 to T9 to get more data for further research, analysis, and comparisons.

Some examples illustrate the concepts highlighted during the design experience. The concept of “solar panel”, referring to function F2, “get the power to sharpen”, appeared in the morphologies of teams T2, T4, T5, T6, and T8, highlighted thanks to the A(nalogies-metaphors), B(rainstorming), B(rainstorming), C(hatGPT), and C(hatGPT) design methods, respectively. The concept of “gravity”, referring again to function F2, appeared only in the morphology of team T5, highlighted thanks to the B(rainstorming) design method. Finally, the concept of “burner”, referring to function F5, “get rid of the shavings”, appeared in the morphologies of teams T1, T2, T4, T7, T8, and T9; every team highlighted it thanks to the B(rainstorming) design method except T9, which highlighted it thanks to the A(nalogies-metaphors) method.

As a first consideration, it is quite surprising that none of the teams added any new functions, given the kind of design problem proposed (a well-known product, performing a daily action for thousands of engineering students).

Having said this, the quantity metric was considered. In all, the nine teams generated 92 different concepts, distributed as follows: 18 concepts referred to function F1, 18 to F2, 22 to F3, 20 to F4, and 14 to F5. Figure 3 shows the number of concepts generated by each team, highlighting the numbers of concepts coming from the different methods adopted. The far right of the bar chart compares the teams not allowed to use ChatGPT with those that could.

Figure 3. The bar chart representing the quantity metric.

Next, the supervisors evaluated each concept from the usefulness point of view. Usefulness came first because concepts judged useless were ignored from that moment on. Figure 4 shows the result of this evaluation, that is, the teams’ performance regarding the usefulness metric. Each bar represents the total number of concepts generated by the team and the portion of them showing any usefulness (weighted sum, based on the usefulness values, and percentage). Again, the far right of the bar chart compares the teams not allowed to use ChatGPT with those that could. It is worth mentioning that although the allowed interval for usefulness was [0, 1], for simplicity, the supervisors used only the values 0 (useless), 0.5 (partially useful), and 1 (fully useful).
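The usefulness summary described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `usefulness_summary` and the example scores are hypothetical, and the percentage is assumed to count concepts with a non-zero usefulness value, as the Discussion section describes. The actual computation lives in the author's Excel workbook and may differ in detail.

```python
from typing import Dict, Sequence

# The three values the supervisors actually used within the allowed interval [0, 1].
ALLOWED_SCORES = (0.0, 0.5, 1.0)


def usefulness_summary(scores: Sequence[float]) -> Dict[str, float]:
    """Summarize one team's usefulness scores (hypothetical helper).

    Returns the total number of concepts, the weighted sum of the scores,
    the number of concepts with non-zero usefulness, and that number as a
    percentage of the total.
    """
    for score in scores:
        if score not in ALLOWED_SCORES:
            raise ValueError(f"unexpected usefulness score: {score}")
    total = len(scores)
    useful_count = sum(1 for score in scores if score > 0)
    return {
        "total": total,
        "weighted_sum": sum(scores),
        "useful_count": useful_count,
        "useful_pct": 100 * useful_count / total if total else 0.0,
    }


# Hypothetical scores for one team; these are NOT taken from the paper's dataset.
example_scores = [1.0, 0.5, 0.0, 0.0, 1.0, 0.5, 0.0, 1.0]
summary = usefulness_summary(example_scores)
print(summary)
```

Concepts scoring 0 are the ones dropped before the novelty and variety evaluations, so `useful_count` is also the size of the pool that moves on to the next step.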

Figure 4. The performance of the teams regarding the usefulness metric.

Then, novelty and variety values were associated with the concepts showing some usefulness. Figure 5 reports the performance of the teams regarding the novelty metric. Each bar represents the total number of concepts generated by the team, with the highlighted portion of them showing some novelty (weighted sum, based on the novelty values of each concept, and percentage). The far right of the bar chart compares the situations between the teams not allowed to use ChatGPT and those that could. Again, it must be noted that although the allowed interval for novelty was [0, 1], for simplicity, the supervisors used only the values 0 (known), 0.5 (partially novel), and 1 (fully novel).

Figure 5. The performance of the teams regarding the novelty metric.

Similarly, Figure 6 reports the performance of the teams regarding the variety metric. The bars represent the variety of each team (weighted mean, based on the variety values of each concept). The far right of the bar chart compares the teams not allowed to use ChatGPT with those that could.

Figure 6. The performance of the teams regarding the variety metric.

The Microsoft Excel workbook used to process the dataset generated during the design experience can be downloaded here: (https://uniudamce-my.sharepoint.com/:x:/g/personal/stefano_filippi_uniud_it/EbphTEZJrZ5Nnke7mANJs3gBeYqOasGBBC_klAP0G91tfA?rtime=rZH55FSZ20g (accessed on 18 June 2023)). Besides working properly, the workbook has been developed to be usable enough for anyone to collect data and do their own analysis. Thus, it can be used in a homogeneous way in different design contexts, from schools to academia/research centers and industries. Figure 7 shows the user interface of the workbook. It replicates the procedure to collect the data (by offering, among other things, the material for doing it), insert them into the data sheet, and perform the analysis. It works with up to nine product functions, two groups (no ChatGPT allowed and ChatGPT allowed) of up to ten teams each, and one hundred generated concepts (different from each other). The author considers these numbers sufficient to make the workbook suitable for almost any research situation.

Figure 7. The user interface of the Microsoft Excel workbook used in this research, which is available to everyone who would like to perform the same study.

5. Discussion

Before dealing with the validation of the hypotheses, some considerations can be drawn from the analysis of the design methods from which the highlighted concepts come. First, it seems that teams in G1 exploited brainstorming much more, on average, than teams in G2. In G1, 49 out of 66 concepts (74%) come from brainstorming. In G2, this percentage dropped to 49% (53 out of 108). The supervisors attribute this to the longer time G1 spent with the design methods of their choice while G2 was using ChatGPT. Brainstorming is undoubtedly the easiest method to adopt among those available; thus, G1 likely used it for 20 min longer than G2. Second, it is worth mentioning the singular distribution of the concepts highlighted thanks to the TRIZ 40 principles. In G1, only one concept appears, from team T1. In G2, eight concepts were generated by three teams out of five. At the moment, the supervisors do not have a precise explanation for this other than a simple, random distribution. It could be that the concepts suggested by ChatGPT reminded G2 teams of one or more TRIZ principles and that, for this reason, teams T5 to T9 tagged those concepts with T instead of C. Clearly, more data are needed to give more precise answers.
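The brainstorming shares quoted above follow directly from the reported counts. The short Python check below reproduces them; the helper name `share_pct` is hypothetical, and only the four counts come from the text.

```python
def share_pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage (hypothetical helper)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


# Counts reported in the text: brainstorming concepts out of all concepts per group.
g1_brainstorming_pct = share_pct(49, 66)
g2_brainstorming_pct = share_pct(53, 108)

print(f"G1: {g1_brainstorming_pct:.1f}%")
print(f"G2: {g2_brainstorming_pct:.1f}%")
```

Rounded to whole percentages, the two shares match the 74% and 49% given in the paragraph.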

Referring to hypothesis H1, considering Figure 3, teams T1 to T4 highlighted an average of 16.5 concepts, while teams T5 to T9 reached a value of 21.5. This suggests that ChatGPT boosts the design activities in terms of the number of concepts generated. This is enough to validate hypothesis H1.

Regarding H2, Figure 4 shows that despite the higher numbers of concepts generated by teams T5–T9, the percentages of concepts considered at least somewhat useful (i.e., with a usefulness value different from zero) are much lower for these teams. The same trend is observed for the average values for the two groups of teams (66.7% for G1 and 59.7% for G2). This starts to give answers to the H2 hypothesis. Surprisingly, ChatGPT was not as helpful in suggesting useful concepts (on average). There is no definitive answer as to why the concepts suggested by ChatGPT do not seem to be as focused on implementing the product functions and, ultimately, on solving the design problem. Is it due to bad interaction with ChatGPT, or to bad use of it? Further investigation is needed. Nevertheless, as planned, the usefulness evaluation lowered the number of concepts to evaluate from then on: from the total of 92 to the 58 that were useful or somewhat useful.

Regarding H3, the percentages shown in Figure 5 make clear that all novelty values for teams in G2 are higher than those for teams in G1. This contradicts the hypothesis that ChatGPT would suggest low-novelty concepts. The fact that ChatGPT suggests design solutions by exploiting pieces of information referring to things that have already happened in the past (and are conveniently reported) does not seem to be a limit on suggesting novel design solutions. However, it is worth noting that almost half of the initial concepts were excluded from the novelty evaluation due to the usefulness filter; therefore, the larger quantities of concepts generated in G2 made a difference. In other words, it could be said that “the more concepts generated, the more likely that some of them will be novel”. As a result, checking the H3 hypothesis is not straightforward. If all the generated concepts (useful or not) are considered, novelty values between G1 and G2 are quite comparable. On the other hand, if useless concepts are excluded, G2’s performance appears much better than that of G1 from the novelty point of view. All of this could be interpreted differently. The adoption of the “usefulness filter” could be seen as similar to what TRIZ does when selecting the best solutions to suggest. The big difference is that TRIZ has the selection strategy embedded, while the “usefulness filter” was applied by the supervisors during the dataset analysis.
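The effect of the “usefulness filter” on a novelty percentage can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `Concept` class, the `novelty_pct` function, and the example data are hypothetical, and the sketch assumes that novelty is only counted for concepts with non-zero usefulness while the denominator is either all concepts or only the useful ones. The paper's workbook may compute these values differently.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Concept:
    """One generated concept with its two scores (hypothetical structure)."""
    usefulness: float  # 0, 0.5, or 1
    novelty: float     # 0, 0.5, or 1; only meaningful when usefulness > 0


def novelty_pct(concepts: Iterable[Concept], exclude_useless: bool) -> float:
    """Percentage of concepts that are both useful and novel.

    With exclude_useless=True the denominator is the useful concepts only;
    otherwise it is every generated concept.
    """
    all_concepts = list(concepts)
    useful = [c for c in all_concepts if c.usefulness > 0]
    pool = useful if exclude_useless else all_concepts
    if not pool:
        return 0.0
    novel = sum(1 for c in useful if c.novelty > 0)
    return 100 * novel / len(pool)


# Hypothetical team output; NOT taken from the paper's dataset.
team = [
    Concept(usefulness=1.0, novelty=1.0),
    Concept(usefulness=0.5, novelty=0.5),
    Concept(usefulness=0.0, novelty=0.0),
    Concept(usefulness=0.0, novelty=0.0),
    Concept(usefulness=1.0, novelty=0.0),
]

filtered_pct = novelty_pct(team, exclude_useless=True)
unfiltered_pct = novelty_pct(team, exclude_useless=False)
print(f"filtered: {filtered_pct:.1f}%, unfiltered: {unfiltered_pct:.1f}%")
```

The same set of novel concepts yields a noticeably different percentage depending on the denominator, which is why the choice of whether to apply the filter matters when comparing G1 and G2.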

Finally, regarding hypothesis H4, Figure 6 shows that the mean variety values of G1 and G2 are quite comparable (0.68 vs. 0.72). Therefore, the hypothesis seems to be confirmed. However, there is something else to note. The variance in G1 (0.00795) is much higher than in G2 (0.001216). This suggests that ChatGPT somehow leveled the teams’ performance from the variety perspective. Teams that performed differently (sometimes significantly differently) in terms of quantity, usefulness, and novelty appear to be comparable in terms of generating varied design solutions. A possible reason for this is that, since ChatGPT uses the same knowledge base, the answers it provides to users are more or less the same; ChatGPT seems to make its pieces of information available regardless of the specific user. This could also indicate that the way users interact with ChatGPT (the questions they ask and how they ask them) is almost irrelevant from the variety perspective. This is an important point to consider.

In any case, the analysis of the conversations between the participants and ChatGPT, collected at the end of the design experience, highlighted interesting considerations about the different approaches adopted. Students went from looking for confirmation of their own concepts (“What do you think about a laser beam to get rid of the shavings when sharpening a wooden pencil?”) to asking direct questions (“Give me some concepts on detecting the need to sharpen a wooden pencil”) to writing the entire problem to be solved and asking for help (“I am developing a wooden pencil sharpener. I need some concepts to implement the required subfunctions”). All of this shows the different ways in which participants conceptualized ChatGPT and its potentialities, which is likely due to the short time that had passed since it became available. It is also worth noting that these different approaches led to an increasing amount of help from ChatGPT. When asked for simple confirmations, ChatGPT limited its intervention to almost yes/no answers. On the other hand, when the entire problem was posed, ChatGPT exploited its freedom and suggested several solutions, organized by topic, and described the inferential process that led to them.

As a final consideration, different degrees of empathy were observed in the dialogues between designers and ChatGPT. These ranged from warm and highly empathic to cold, impersonal, and unfeeling. Figure 8 and Figure 9 contain excerpts of two such dialogues (which were originally in Italian and have been translated into English using ChatGPT). The excerpt in Figure 8 shows a generic formulation of the problem to be solved (with many degrees of freedom open). Moreover, the designer’s interaction was quite independent of the ChatGPT answers; it seems that the designer had a list of questions to ask and simply proceeded through them. Finally, the dialogue is mainly unfeeling, cold, and impersonal.

Figure 8. Excerpt of a designer–ChatGPT dialogue (generic description of the problem to be solved; independence from ChatGPT answers; unfeeling, cold, and impersonal).

Figure 9. Excerpt of another designer–ChatGPT dialogue (detailed description of the problem to be solved; dependence on ChatGPT answers; warm, very empathic).

The excerpt in Figure 9 contains a detailed description of the problem to be solved. Moreover, the dialogue is ChatGPT-driven (the designer’s questions occur based on ChatGPT feedback) and is highly empathic on both sides.
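The variance comparison made for hypothesis H4 earlier in this section can be sketched as follows. The per-team variety values below are hypothetical placeholders, since the paper reports only the group means and variances; they were chosen merely to show the calculation. Whether the workbook uses population or sample variance is not stated, so the use of `statistics.pvariance` here is an assumption. Only the two reported variances (0.00795 and 0.001216) come from the text.

```python
from statistics import mean, pvariance

# Hypothetical per-team variety values; NOT the paper's data.
g1_variety = [0.55, 0.70, 0.78, 0.69]        # four teams without ChatGPT
g2_variety = [0.70, 0.75, 0.68, 0.74, 0.73]  # five teams with ChatGPT

g1_mean, g2_mean = mean(g1_variety), mean(g2_variety)
g1_var, g2_var = pvariance(g1_variety), pvariance(g2_variety)
print(f"G1: mean={g1_mean:.2f}, variance={g1_var:.5f}")
print(f"G2: mean={g2_mean:.2f}, variance={g2_var:.5f}")

# Ratio of the variances reported in the paper (G1 over G2).
reported_ratio = 0.00795 / 0.001216
print(f"Reported G1/G2 variance ratio: {reported_ratio:.1f}")
```

With the reported figures, the spread among G1 teams is roughly six and a half times that among G2 teams, which is the "leveling" effect the discussion refers to.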

6. Conclusions

The research described in this paper suggests a framework for measuring the impact of ChatGPT on the generation of innovative concepts in product design. Some hypotheses were posed regarding the characteristics of the concepts, and the dataset collected during a design experience involving 18 university students allowed for drawing some conclusions about them. On the one hand, the impact of ChatGPT is clear in terms of the number of concepts suggested. On the other hand, there are both positive and negative aspects to its effectiveness from the perspectives of usefulness, novelty, and variety. ChatGPT proved to be not so helpful in suggesting useful concepts; classic design methods appeared more effective. However, from the novelty perspective, ChatGPT performed quite well, contrary to expectations, since its knowledge base contains only pieces of information regarding things that have already happened (and been documented) in the past. Finally, regarding variety, ChatGPT involvement in design does not seem to make a big difference. However, it emerged that there could be a kind of performance leveling among designers who use it, regardless of their individual characteristics. This may be due to the fact that the knowledge base is always the same. Apart from the evaluation of the metrics involved, some interesting considerations emerged during the data analysis. For example, there are analogies between TRIZ and ChatGPT in the way they are used in design. It is also clear that the usefulness metric plays an important role in evaluating ChatGPT performance in design.

Regarding research perspectives, the use of the specific four metrics makes it easier to exploit previous research about the influence of personality on design activities, with the aim of linking personality to ChatGPT use, always in the product design domain. Moreover, the availability of the logs of the conversations with ChatGPT suggests exploring a further relationship among personality traits, design performance, and use of ChatGPT. Some metrics are under study to cover this aspect as well. These include the query language, the number of questions, and the percentage of independent questions. Another hint for future work is that the low number of participants in the design experience does not allow the research findings to be considered definitive. In the future, new design experiences will be conducted to make the dataset richer. Moreover, the Microsoft Excel workbook, already optimized from the usability point of view and made available to anyone wanting to carry out personal reasoning and evaluation, will be further improved by making it able to send, with the consent of the researcher, the results of these personal activities to a common repository in order to make the research results as reliable as possible. Finally, particular attention should be placed on comparing TRIZ and ChatGPT use in different design/redesign situations. This comparison was not possible in this study due to the limitations imposed on TRIZ methods and tools (only the 40 principles) and the amount of data available. Ad hoc research could be conducted on this topic in the near future.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The Microsoft Excel workbook used to process the dataset generated during the design experience can be downloaded here: (https://uniudamce-my.sharepoint.com/:x:/g/personal/stefano_filippi_uniud_it/EbphTEZJrZ5Nnke7mANJs3gBeYqOasGBBC_klAP0G91tfA?rtime=rZH55FSZ20g (accessed on 18 June 2023)). It has been optimized from the usability point of view and is available for everybody wanting to conduct similar design experiences, reasoning, and evaluations.

Acknowledgments

The author would like to thank the students of the “Product Interaction and Innovation” course (A.Y. 2022-23) of the Mechanical Engineering and Management Engineering Degrees at the University of Udine (Italy) for their valuable participation in the design experience.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Cooper, R.G. Perspective: The Stage-Gate® Idea-to-Launch Process—Update, What’s New, and NexGen Systems. J. Prod. Innov. Man. 2008, 25, 213–232.
2. Liu, Y.-C.; Chakrabarti, A.; Bligh, T. Towards an ‘ideal’ approach for concept generation. Des. Stud. 2003, 24, 341–355.
3. Ullman, D.G. The Mechanical Design Process, 4th ed.; McGraw-Hill Series in Mechanical Engineering; McGraw-Hill Higher Education: Boston, MA, USA, 2010.
4. Chulvi, V.; González-Cruz, M.C.; Mulet, E.; Aguilar-Zambrano, J. Influence of the type of idea-generation method on the creativity of solutions. Res. Eng. Des. 2013, 24, 33–41.
5. Füller, J.; Hutter, K.; Wahl, J.; Bilgram, V.; Tekic, Z. How AI revolutionizes innovation management—Perceptions and implementation preferences of AI-based innovators. Technol. Forecast. Soc. Chang. 2022, 178, 121598.
6. Ram, B.; Verma, P. Artificial intelligence AI-based Chatbot study of ChatGPT, Google AI Bard and Baidu AI. World J. Adv. Eng. Technol. Sci. 2023, 8, 258–261.
7. Vargas, S.; Castells, P. Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems. Presented at the RecSys ’11: Fifth ACM Conference on Recommender Systems, ACM, Chicago, IL, USA, 23–27 October 2011; pp. 109–116.
8. Goldenberg, O.; Wiley, J. Individual and Group Brainstorming: Does the Question Matter? Creat. Res. J. 2019, 31, 261–271.
9. Novak, J.D.; Gowin, D.B.; Kahle, J.B. Learning How to Learn, 1st ed.; Cambridge University Press: Cambridge, UK, 1984.
10. Cross, N. Design Thinking: Understanding How Designers Think and Work; Berg: Oxford, UK; New York, NY, USA, 2011.
11. Malycha, C.P.; Maier, G.W. The Random-Map Technique: Enhancing Mind-Mapping with a Conceptual Combination Technique to Foster Creative Potential. Creat. Res. J. 2017, 29, 114–124.
12. Brown, T.; Katz, B. Change by Design: How Design Thinking Transforms Organizations and Inspires Innovation, 1st ed.; Harper Business: New York, NY, USA, 2009.
13. Casakin, H.P. Assessing the Use of Metaphors in the Design Process. Environ. Plan. B: Plan. Des. 2006, 33, 253–268.
14. Al’tšuller, G.S.; Shulyak, L.; Rodman, S. The Innovation Algorithm: TRIZ, Systematic Innovation and Technical Creativity, 2nd ed.; Technical Innovation Center: Worcester, MA, USA, 2007.
15. Liu, Z.; Feng, J.; Wang, J. Resource-Constrained Innovation Method for Sustainability: Application of Morphological Analysis and TRIZ Inventive Principles. Sustainability 2020, 12, 917.
16. Ghane, M.; Ang, M.C.; Cavallucci, D.; Kadir, R.A.; Ng, K.W.; Sorooshian, S. TRIZ trend of engineering system evolution: A review on applications, benefits, challenges and enhancement with computer-aided aspects. Comput. Ind. Eng. 2022, 174, 108833.
17. Lu, S.; Guo, Y.; Huang, W.; Shen, M. Product Form Evolutionary Design Integrated with TRIZ Contradiction Matrix. Math. Probl. Eng. 2022, 2022, 3844324.
18. Edward, C.; Labadin, J.; Kulathuramaiyer, N. Mathematical Modelling and Formalization of TRIZ: Trimming for Product Design. In Systematic Innovation Partnerships with Artificial Intelligence and Information Technology, IFIP Advances in Information and Communication Technology; Nowak, R., Chrząszcz, J., Brad, S., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 3–16.
19. Rico Sesé, J. Nuevos Retos para el Diseño y la Comunicación. La Inteligencia Artificial en Los Procesos Creativos del Diseño Gráfico. Ph.D. Thesis, Universidad Politécnica de València, Valencia, Spain, 2023.
20. De Peuter, S.; Oulasvirta, A.; Kaski, S. Toward AI assistants that let designers design. AI Mag. 2023, 44, 85–96.
21. Cantamessa, M.; Montagna, F.; Altavilla, S.; Casagrande-Seretti, A. Data-driven design: The new challenges of digitalization on product design and development. Des. Sci. 2020, 6, e27.
22. Siemon, D.; Strohmann, T.; Michalke, S. Creative Potential through Artificial Intelligence: Recommendations for Improving Corporate and Entrepreneurial Innovation Activities. CAIS 2022, 50, 241–260.
23. Filippi, S.; Barattin, D. Influence of Personality on Shape-Based Design Activities. Adv. Hum.-Comput. Interact. 2019, 2019, 9651369.
24. Sarkar, P.; Chakrabarti, A. Assessing design creativity. Des. Stud. 2011, 32, 348–383.
25. Goldberg, L.R. An alternative “description of personality”: The Big-Five factor structure. J. Personal. Soc. Psychol. 1990, 59, 1216–1229.
26. Sung, S.Y.; Choi, J.N. Do Big Five Personality Factors Affect Individual Creativity? the Moderating Role of Extrinsic Motivation. Soc. Behav. Pers. 2009, 37, 941–956.
27. Katila, R.; Ahuja, G. Something Old, Something New: A Longitudinal Study of Search Behavior and New Product Introduction. Acad. Manag. J. 2002, 45, 1183–1194.
28. Jansson, D.G.; Smith, S.M. Design fixation. Des. Stud. 1991, 12, 3–11.
29. Kohn, N.W.; Smith, S.M. Collaborative fixation: Effects of others’ ideas on brainstorming. Appl. Cognit. Psychol. 2011, 25, 359–371.
30. Gustafsson, D. Analysing the Double Diamond Design Process through Research & Implementation. Available online: https://aaltodoc.aalto.fi/handle/123456789/39285 (accessed on 18 June 2023).
31. Rosch, E.H. Natural categories. Cogn. Psychol. 1973, 4, 328–350.
32. Colgate, M.; Alexander, N. Benefits and Barriers of Product Augmentation: Retailers and Financial Services. J. Mark. Manag. 2002, 18, 105–123.

Figure 1. The functional scheme of the product, an innovative sharpener for classic wooden pencils.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
