Product

Evaluate prompts in the developer console

[Image: Illustration of Claude using tools]

When building AI-powered applications, prompt quality significantly impacts results. But crafting high-quality prompts is challenging, requiring deep knowledge of your application's needs and expertise with large language models. To speed up development and improve outcomes, we've streamlined this process to make it easier for users to produce high-quality prompts.

You can now generate, test, and evaluate your prompts in the Anthropic Console. We've added new features, including automatic test case generation and output comparison, that allow you to leverage Claude to generate the very best responses for your needs.

Generate prompts

Writing a great prompt can be as simple as describing a task to Claude. The Console offers a built-in prompt generator, powered by Claude 3.5 Sonnet, that allows you to describe your task (e.g. “Triage inbound customer support requests”) and have Claude generate a high-quality prompt for you.
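The generator lives in the Console, but the underlying idea—asking Claude to write a prompt from a plain task description—can be sketched with the Messages API. The sketch below is illustrative only: the instruction text and model ID are assumptions, not the Console's actual metaprompt.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

task = "Triage inbound customer support requests"

# Ask Claude to draft a reusable prompt template for the given task.
# This mirrors the Console's generator only at a high level; the real
# metaprompt behind the feature is more elaborate.
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            f"Write a high-quality prompt template for this task: {task}. "
            "Mark any input variables with double braces, e.g. {{VARIABLE}}."
        ),
    }],
)
print(response.content[0].text)
```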

[Image: App screen of the Anthropic Console prompt generator]

You can use Claude’s new test case generation feature to generate input variables for your prompt—for instance, an inbound customer support message—and run the prompt to see Claude’s response. Alternatively, you can enter test cases manually.
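For a sense of how a prompt template and a test case fit together, here is a minimal sketch. The `{{MESSAGE}}` variable, the template wording, and the sample test case are all invented for illustration; the Console handles this substitution for you.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A prompt template in the Console's {{variable}} style. The wording
# and the variable name are invented for this example.
PROMPT_TEMPLATE = """You are a support triage assistant. Classify the
following customer message as 'billing', 'technical', or 'other', and
reply with the label only.

Customer message:
{{MESSAGE}}"""

# A test case: either entered manually or auto-generated.
test_case = {"MESSAGE": "I was charged twice for my subscription this month."}

# Substitute the variable and run the prompt.
prompt = PROMPT_TEMPLATE.replace("{{MESSAGE}}", test_case["MESSAGE"])
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=100,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)  # expected: something like "billing"
```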

[Image: App screen of prompt generation and Claude's response]

Generate a test suite

Testing prompts against a range of real-world inputs can help you build confidence in the quality of your prompt before deploying it to production. With the new Evaluate feature you can do this directly in our Console instead of manually managing tests across spreadsheets or code.

Manually add or import new test cases from a CSV, or ask Claude to auto-generate test cases for you with the ‘Generate Test Case’ feature. Modify your test cases as needed, then run all of the test cases in one click. View and adjust Claude’s understanding of the generation requirements for each variable to get finer-grained control over the test cases Claude generates.
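For comparison, here is roughly what managing the same suite in code looks like—the manual workflow the Evaluate feature replaces. The `tests.csv` file name, the `TICKET` column, and the template text are placeholders, not a prescribed format.

```python
import csv

import anthropic

client = anthropic.Anthropic()

# Placeholder template; 'TICKET' must match a column header in the CSV.
PROMPT_TEMPLATE = "Summarize this support ticket in one sentence:\n\n{{TICKET}}"

# tests.csv holds one row per test case, one column per prompt variable.
with open("tests.csv", newline="") as f:
    test_cases = list(csv.DictReader(f))

results = []
for case in test_cases:
    # Fill every {{variable}} in the template from the CSV row.
    prompt = PROMPT_TEMPLATE
    for name, value in case.items():
        prompt = prompt.replace("{{" + name + "}}", value)
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}],
    )
    results.append((case, response.content[0].text))

for case, output in results:
    print(case, "->", output)
```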

[Image: App screen of comparison mode showing different prompt responses]

Evaluate model responses and iterate on prompts

Refining your prompt now takes fewer steps, since you can create new versions of the prompt and re-run the test suite to quickly iterate and improve your results. We’ve also added the ability to compare the outputs of two or more prompts side by side.
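A minimal sketch of that side-by-side comparison, done by hand in code: the `run_prompt` helper, both template versions, and the sample ticket are assumptions for illustration, not part of the Console.

```python
import anthropic

client = anthropic.Anthropic()

def run_prompt(template: str, case: dict[str, str]) -> str:
    """Fill {{variables}} in a template and return Claude's reply."""
    prompt = template
    for name, value in case.items():
        prompt = prompt.replace("{{" + name + "}}", value)
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# Two candidate versions of the same prompt (illustrative wording).
PROMPT_V1 = "Summarize this support ticket:\n\n{{TICKET}}"
PROMPT_V2 = ("Summarize this support ticket in one sentence, "
             "keeping any order numbers:\n\n{{TICKET}}")

test_cases = [
    {"TICKET": "Order #1234 arrived damaged and I need a replacement."},
]

# Same inputs through both versions, printed side by side.
for case in test_cases:
    print("input:", case["TICKET"])
    print("  v1 ->", run_prompt(PROMPT_V1, case))
    print("  v2 ->", run_prompt(PROMPT_V2, case))
```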

You can even have subject matter experts grade response quality on a 5-point scale to see whether your changes have improved it. Both of these features enable a faster and more accessible way to improve model performance.
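Aggregating those grades is simple arithmetic—for instance, a mean score per prompt version. The grades below are made up purely to show the shape of the calculation.

```python
# Hypothetical 1-5 grades assigned by subject matter experts to each
# test case's output, one list per prompt version.
grades = {
    "prompt_v1": [3, 4, 2, 4, 3],
    "prompt_v2": [4, 5, 4, 4, 5],
}

for version, scores in grades.items():
    mean = sum(scores) / len(scores)
    print(f"{version}: mean quality {mean:.1f}/5 over {len(scores)} cases")
# prompt_v1: mean quality 3.2/5 over 5 cases
# prompt_v2: mean quality 4.4/5 over 5 cases
```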

Get started

Test case generation and output comparison features are available to all users on the Anthropic Console. To learn more about how to generate and evaluate prompts with Claude, check out our docs.