Cognition Team · June 2025 · 15 minute read
Coding Agents 101: The Art of Actually Getting Things Done
The year is 2025. Coding agents aren't magic, but they're about the closest thing we have. We've noticed some engineers, particularly at the senior-to-staff level, finding success faster than others. Here we share some of the top lessons drawn from our customers' experience and our own.
Product-agnostic
We discuss tips that will help you be successful with any coding agent.
Tactical
We offer our favorite bits of actionable advice.
Technical
While coding agents can be valuable to many, this guide is written with engineers in mind.
Developer tooling has been evolving rapidly. Ten years ago, it was autocomplete and IntelliSense, capable of suggesting method names and carrying out programmatic refactors. Four years ago, it was copilots and tab completion, capable of writing the next couple of lines of code for you. Two years ago, it was generative chatbots, capable of assisting your development and generating entire files for you. Today, it is autonomous agents, capable of taking an initial description all the way to a final pull request with little human intervention. We've spent the past two years working to realize this vision by building Devin. Now, interest in autonomous agents is reaching new heights, especially with recent releases of similar products [1]. These agents can appear in many forms, including web apps, mobile apps, and integrations within popular tools like Slack, GitHub, Linear, and Jira.
[1] Other than Devin, some recent releases include Codex by OpenAI and Jules by Google. Some local agents like Cursor and Claude Code can be run in parallel workspaces to replicate a similar effect.
While a human paired with an AI assistant can achieve more than any AI alone, an autonomous agent's ability to handle tasks end to end allows for a new level of multi-tasking, turning every engineer into an engineering manager.
Adapting to working effectively alongside these new AI colleagues can take some time. Interestingly, we've observed that senior-to-staff level engineers tend to adopt and become proficient with these tools the fastest. Ultimately, these tools will become commonplace across all levels of engineering. Based on our experience and customer feedback, we want to share key insights and lessons learned to help everyone successfully integrate these tools into their workflows.
(Getting Started)
Prompting Basics
These fundamental guidelines will help you effectively interact with coding agents in 2025. If you take nothing else away, you should at least remember these.
Say how you want things done, not just what
Think of the agent as a junior coding partner whose decision-making can be unreliable. Simple tasks can be described directly, but for more complex tasks, clearly outline your preferred approach from the outset. Providing the agent with the overall architecture and logic upfront not only boosts its chances of success but also reduces your time reviewing code, as you will already be familiar with the intended method.
Example:
Instead of "add unit tests," specify the functionality to test, identify important edge cases, and clarify what needs mocking, if anything.
Tell the agent where to start
Think about where you'd start if you were handling the task yourself. Even if you don't know specific file or function names, mention the repository, relevant documentation, and key components involved. Clearly indicating these elements minimizes wasted effort and confusion.
Example:
"Please add support for Google models to our code. You should look at the latest docs [here](link) and create a new implementation file in the model groups directory."
Practice defensive prompting
Imagine giving the same prompt to a new intern. Where would confusion or errors likely arise? Anticipate these points and proactively clarify your instructions to avoid ambiguity.
Example:
"Please fix the C++ bindings for our search module so they pass the new unit tests. Be careful: you will probably need to recompile the bindings each time you change the code, before you test."
Give access to CI, tests, types, and linters
Much of the magic of agents comes from their ability to fix their own mistakes and iterate against error messages. Providing strong feedback loops through tools like type checkers, linters, and unit tests greatly enhances their performance. Consider typed Python over plain Python, or TypeScript over JavaScript. Teach your agent how to run common checks and tests, ensuring it has all necessary packages and access rights. If the agent can interact with a browser, provide clear instructions on running your front-end development environment.
Example:
Our team transitioned from mostly untyped Python SDKs to exclusively typed SDKs (this migration is itself a good task for coding agents).
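One way to "teach your agent how to run common checks" is to give it a single entry point that runs them all. This is a minimal sketch assuming a Python project checked with mypy, ruff, and pytest; the `CHECKS` list and the script itself are hypothetical, so substitute whatever your team actually runs. It keeps going after a failure so the agent sees every problem in one pass, and it trims each failing command's output to its tail.

```python
import subprocess
import sys
from dataclasses import dataclass

# Hypothetical check list; replace with the commands your team actually runs.
CHECKS: list[list[str]] = [
    ["mypy", "."],
    ["ruff", "check", "."],
    ["pytest", "-q"],
]


@dataclass
class CheckResult:
    command: list[str]
    returncode: int
    output: str

    @property
    def passed(self) -> bool:
        return self.returncode == 0


def run_checks(checks: list[list[str]], tail_lines: int = 20) -> list[CheckResult]:
    """Run every check, even after a failure, so all problems surface in one pass."""
    results = []
    for command in checks:
        try:
            proc = subprocess.run(command, capture_output=True, text=True)
            output, code = proc.stdout + proc.stderr, proc.returncode
        except FileNotFoundError:
            # Tool missing from the agent's environment: report it as a failure.
            output, code = f"command not found: {command[0]}", 127
        # Keep only the tail of the output so the agent's context isn't flooded.
        tail = "\n".join(output.splitlines()[-tail_lines:])
        results.append(CheckResult(command, code, tail))
    return results


def main() -> int:
    results = run_checks(CHECKS)
    for result in results:
        status = "PASS" if result.passed else "FAIL"
        print(f"[{status}] {' '.join(result.command)}")
        if not result.passed:
            print(result.output)
    # Non-zero exit if anything failed, so CI and the agent can branch on it.
    return 0 if all(r.passed for r in results) else 1
```

Calling `sys.exit(main())` from a `__main__` guard turns this into a script; recording that one command in the agent's knowledge base means it always knows how to verify its own changes.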
Leverage your expertise
Everything above becomes easier when you're familiar with your codebase. Even simple tasks benefit from your ability to verify logic and results. Human oversight remains essential: ultimately, you are responsible for the final correctness of the code. Ownership and verification will continue to be critical responsibilities for human engineers, even as these tools become increasingly sophisticated.
(Getting Started)
Using Agents in your Workflow

Once you've got the basics of talking to an agent down, it's time to bring these AI helpers into your daily workflow. Here are some practical ways to make agents part of your routine:
Take on new tasks immediately
Imagine a teammate messaging you, "Hey, could we build X quickly?" or "We need to tweak Y." Instead of letting it interrupt your flow, just send a quick prompt to an autonomous agent to investigate or make the change. This frees you to stay focused on your main tasks. Got an interesting side project idea? Need to quickly prototype something, scrape data, or reproduce research? Delegate to your agent and circle back later.
Example:
Many teams simply tag @Devin in Slack when discussing bug fixes or minor feature updates.
Code on the go
Picture yourself commuting or traveling when an urgent bug pops up, or when you realize you might have left a mistake in your code. No worries! Autonomous agents often support mobile access, letting you address these issues right away. Whether through Slack's mobile app or a dedicated mobile app, many agents let you resolve problems on the go, even when your Wi-Fi is sketchy.
Example:
Having this optionality has made our own team much more productive on car rides and flights.
Hand off your chores
Stuck bisecting old commits or updating documentation for a new feature? Hand these repetitive tasks off to your agent. You'll save precious time and stay focused on more creative, impactful work.
Example:
In our team, it is common for an engineer to ship a change and then have an agent update all the relevant docs and user-facing copy.
Skip the analysis paralysis
Stuck deciding if a refactor will actually simplify your code? Can't choose between two architectural approaches? Have your agent implement both options. With concrete examples to compare, decision-making becomes straightforward, and you won't hurt any feelings by discarding a solution.
Example:
When choosing between Lexical and Slate for our text boxes, we had agents implement each. Slate won out by delivering the better end result.
Set up preview deployments
Set up your CI/CD pipeline to automatically create a preview deployment for each new PR, giving you an instant live URL. This is particularly handy when reviewing frontend tasks completed by AI agents.
Example:
Vercel is a deployment platform that makes preview deployments super easy.
(Intermediate)
Delegating Larger Tickets


As the size and complexity of your pull requests grow beyond just a few files, handling them in a single pass becomes challenging. Yet, mastering how to delegate medium-to-large tasks (typically 1-6 hours of work) is where autonomous agents give the highest ROI. Rather than saving just a few minutes, you can reclaim hours of productivity. Smaller tasks might work effortlessly, but stretching the capabilities of agents to handle larger tasks brings the biggest returns.
Automate your first drafts
For substantial tasks, using an autonomous agent to create an initial draft of your PR can kickstart progress and dramatically cut down your workload. Success here depends on clearly communicating your desired approach upfront. Think of yourself as the architect guiding junior developers. Clear, detailed instructions help avoid spending unnecessary time correcting fundamental misunderstandings in the agent's code.
Domain | Drafting | Refining |
---|---|---|
Journalism | A journalist collects initial information and writes the first draft of the article. | The editor reviews drafts, fact-checks, polishes, and finalizes them for publication. |
Restaurant | Line cooks prep ingredients and make preliminary dishes. | The sous chef adds seasoning and adjusts the dish to taste before it is sent to diners. |
Coding | Autonomous agents get started on tasks based on initial plans and create first-draft solutions. | A human developer reviews the draft PRs, gives feedback, and adds manual refinements before merging. |
🛑 Remember, large tasks aren't completely hands-free (yet). Expect multiple feedback cycles for more challenging assignments, and anticipate some manual refinements for polish. A realistic goal is around 80% time savings, not complete automation, with your expertise remaining vital for verification and final quality assurance.
Co-develop a PRD
For tasks that are complex or vaguely defined, collaborating with your autonomous agent to create a detailed plan can be highly effective. It's perfectly okay if you initially don't know every nuance or requirement. Start by prompting your agent to explore discovery questions, like "How does our authentication system function?" or "Which services might be impacted?" You can also ask the agent to identify specific relevant code targets for you to confirm early on.
Certain agents, such as Devin and Claude Code, offer dedicated planning modes that focus on reading and exploring existing code rather than immediately modifying it. If you'd prefer deeper preparation before delegating a task, specialized codebase search tools like deepwiki.com and Devin Search can quickly provide insights into your codebase, helping streamline the process.
Set checkpoints
For multi-part tasks, especially those involving multiple codebases, establish clear checkpoints along the way:
Plan → Implement chunk → Test → Fix → Checkpoint review → Next chunk
Explicitly request pauses after each significant phase, particularly for complex features built across multiple layers (e.g., database, backend, frontend). Use these checkpoints to ensure the implementation aligns with your expectations, clarify doubts (e.g., "Explain the auth process and confirm its security"), and correct course early to avoid cascading issues.
Example:
"I want you to implement this feature that will span our database, backend, and multiple frontend interfaces. Please first plan out the database schema changes needed, and let me know when that is done so I can apply the migration." -> "Now please implement the backend changes and add tests to make sure XYZ works. Let me know when that is done." -> "Now implement the changes in both our web and mobile interfaces to call the new backend endpoint."
Teach it to verify its own work
When giving feedback, go beyond simply pointing out issues ("This function isn't working"). Clearly articulate your testing process so the agent can independently verify future tasks. For testing patterns you'll repeat frequently, add them to your agent's permanent knowledge base (see "Add to your agent's knowledge base" below).
Example:
In Devin, we actively prompt users to save essential testing procedures to the agent's ongoing memory, streamlining future interactions.
Increase test coverage in AI hot spots
Currently, agents can't yet test every scenario thoroughly and interactively. Increasing test coverage in the areas that AI modifies most heavily gives you greater confidence in the agent's output: with solid tests in place, code that appears correct can be merged with confidence.
Example:
Our team strengthened the unit tests in a critical section of our codebase before entrusting our AI to translate the implementation from Python to C.
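Before a translation like this, a cheap safeguard is a differential test: run the trusted implementation and the new one on the same inputs and report every disagreement. The sketch below is hypothetical. `reference_checksum` stands in for the original Python code and `candidate_checksum` for the translated version; in practice the candidate would call into the C code through its bindings (for example via `ctypes`).

```python
import random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def reference_checksum(data: bytes) -> int:
    """Hypothetical trusted implementation: a 16-bit additive checksum."""
    return sum(data) & 0xFFFF


def candidate_checksum(data: bytes) -> int:
    """Hypothetical translated implementation under test (imagine a ctypes call)."""
    total = 0
    for byte in data:
        total = (total + byte) & 0xFFFF
    return total


def find_mismatches(
    reference: Callable[[T], R],
    candidate: Callable[[T], R],
    inputs: Iterable[T],
) -> list[tuple[T, R, R]]:
    """Return (input, expected, actual) for every input where the two disagree."""
    mismatches = []
    for value in inputs:
        expected, actual = reference(value), candidate(value)
        if expected != actual:
            mismatches.append((value, expected, actual))
    return mismatches


def generate_inputs(seed: int = 0, count: int = 500) -> list[bytes]:
    """Edge cases first, then seeded random inputs so failures are reproducible."""
    rng = random.Random(seed)
    edge_cases = [b"", b"\x00", b"\xff" * 1000]
    random_cases = [
        bytes(rng.randrange(256) for _ in range(rng.randrange(0, 64)))
        for _ in range(count)
    ]
    return edge_cases + random_cases
```

Seeding the random inputs keeps every failure reproducible, so the exact failing input can be pasted back to the agent.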
(Advanced)
Automating Workflows

Agents can respond to incoming events much faster than humans can, and they're far more willing to do boring, repetitive work.
Create Shortcuts for Your Most Repetitive Work
Engineering teams frequently encounter repetitive, routine tasks. These are perfect candidates for automation with agents. Common examples include:
- feature flag removal
- dependency upgrades
- fixing and adding tests on new feature PRs
To set this up efficiently, an experienced engineer typically creates a robust, reusable prompt template [2] that can run repeatedly for these scenarios.
[2] In Devin, these are called playbooks.
Example:
One of our customers automatically triggers three agents dedicated to writing unit tests whenever new features are developed.
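A reusable template can be as simple as a parameterized prompt kept under version control. The sketch below uses Python's standard `string.Template`; the flag-removal wording, the placeholder names, and the `make check` default are hypothetical examples, not Devin's playbook format.

```python
from string import Template

# Hypothetical playbook for removing a fully rolled-out feature flag.
FLAG_REMOVAL_TEMPLATE = Template(
    """\
Remove the feature flag `$flag_name` from the `$repo` repository.
The flag is fully rolled out: keep the code path where it is `$keep_value`
and delete the other path.

Steps:
1. Search the whole repository for `$flag_name`, including tests and config.
2. Inline the `$keep_value` path at every call site and delete dead code.
3. Remove the flag's definition and any imports that are now unused.
4. Run `$verify_command` and fix any failures before opening a PR.
5. List every changed file in the PR description.
"""
)


def render_flag_removal(
    flag_name: str,
    repo: str,
    keep_value: str = "on",
    verify_command: str = "make check",
) -> str:
    # substitute() raises KeyError for any placeholder left unfilled,
    # catching template typos before the prompt ever reaches the agent.
    return FLAG_REMOVAL_TEMPLATE.substitute(
        flag_name=flag_name,
        repo=repo,
        keep_value=keep_value,
        verify_command=verify_command,
    )
```

Keeping the steps explicit (search, inline, clean up, verify, report) encodes the "say how, not just what" advice once, so every run of the template benefits from it.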
Intelligent Code Review & Enforcement
While specialized tools for fast code review exist [3], autonomous agents can be an interesting option for delivering more accurate insights, particularly if they've already indexed the functionality of your repositories.
[3] Such as Greptile and CodeRabbit.
Example:
At Cognition, we like to maintain a list of the most common mistakes engineers make, and we commit this list to the codebase. Then, instead of writing classical lint rules to catch these (which is often not possible), we have an agent run on every new PR to check for these mistakes.
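One way to wire this up is a small CI step that reads the committed list and turns it, together with the PR's diff, into a review prompt. This is a minimal sketch under stated assumptions: the list lives in a hypothetical `COMMON_MISTAKES.md` with one `- ` bullet per mistake, and delivering the prompt to your agent (API call, CLI, or comment trigger) is left out because it varies by product.

```python
from pathlib import Path


def load_mistakes(markdown: str) -> list[str]:
    """Extract one entry per '- ' bullet; headings and prose are ignored."""
    return [
        line.strip()[2:].strip()
        for line in markdown.splitlines()
        if line.strip().startswith("- ")
    ]


def build_review_prompt(mistakes: list[str], diff: str) -> str:
    numbered = "\n".join(f"{i}. {m}" for i, m in enumerate(mistakes, start=1))
    return (
        "Review the following pull request diff ONLY for the known mistakes below.\n"
        "For each finding, cite the mistake number, the file, and the line.\n"
        "If none apply, reply exactly: NO FINDINGS.\n\n"
        f"Known mistakes:\n{numbered}\n\n"
        f"Diff:\n{diff}\n"
    )


def review_prompt_from_repo(repo_root: Path, diff: str) -> str:
    # Hypothetical file name; use whatever path your team commits the list to.
    markdown = (repo_root / "COMMON_MISTAKES.md").read_text(encoding="utf-8")
    return build_review_prompt(load_mistakes(markdown), diff)
```

Asking for a fixed "NO FINDINGS" reply keeps the output easy to check in CI, so the step can stay quiet on clean PRs.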
Hook into incidents and alerts
You can also set up autonomous agents to trigger automatically in response to specific events. For example, Devin provides an accessible API, and other agents can be integrated into custom workflows via CLI commands. These setups work especially well alongside MCPs that ingest third-party error logs.
⚠️ When it comes to triaging issues in production services, AI's debugging skills are still limited. Instead of asking the AI to fix bugs end to end as they come up, it is often more practical to ask it to simply flag the most suspicious errors, changes, and so on.
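Following that advice, an alert hook can ask the agent to rank suspects instead of shipping a fix. The sketch below only builds the prompt. The `Alert` fields are hypothetical, and sending the prompt to an agent (through Devin's API, another agent's CLI, and so on) is deliberately left out rather than guessing at a specific API.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    # Hypothetical alert shape; map your monitoring tool's payload onto it.
    service: str
    error: str
    recent_deploys: list[str] = field(default_factory=list)
    log_excerpt: str = ""


def build_triage_prompt(alert: Alert, max_suspects: int = 3) -> str:
    deploys = "\n".join(f"- {d}" for d in alert.recent_deploys) or "- (none recorded)"
    return (
        f"An alert fired for the `{alert.service}` service.\n"
        f"Error: {alert.error}\n\n"
        f"Recent deploys:\n{deploys}\n\n"
        f"Log excerpt:\n{alert.log_excerpt or '(none)'}\n\n"
        "Do NOT modify code or open a pull request.\n"
        f"List the {max_suspects} most suspicious errors, commits, or config changes, "
        "most likely first. For each, give the evidence and your confidence, "
        "so an on-call engineer can decide what to investigate."
    )
```

The explicit "do not modify code" line keeps the agent in the flagging role; once a human confirms the root cause, a separate session can implement the fix.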
(Advanced)
Customization & Improving Performance
Environment Setup
Nothing slows down an agent faster than an incomplete or mismatched environment. To keep things running smoothly, align your agent's setup exactly with your team's, including language versions, package dependencies, and automated checks. For example, pre-commit should be installed in the agent's environment, and environment configuration (secrets, language versions, virtual environments, browser logins) should be loaded automatically using tools like direnv's .envrc files or a customized .bashrc.
Example:
We set up our agent's browser with pre-authenticated logins, removing the hassle of manual authentication and making testing much easier.
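A cheap way to catch mismatches early is a check the agent runs at the start of each session, comparing its environment against the versions your team pins. This is a minimal sketch assuming exact `==` pins held in a Python dict (the `REQUIRED_PYTHON` and `PINS` values are hypothetical). It uses the standard-library `importlib.metadata.version` lookup, passed in as a parameter so the check can be tested without touching the real environment.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable

# Hypothetical team pins; in practice, read these from your lockfile.
REQUIRED_PYTHON = (3, 11)
PINS = {"requests": "2.32.3", "pydantic": "2.7.1"}


def check_environment(
    required_python: tuple[int, int],
    pins: dict[str, str],
    python_version: tuple[int, int] = sys.version_info[:2],
    get_version: Callable[[str], str] = version,
) -> list[str]:
    """Return one human-readable problem per mismatch; empty means aligned."""
    problems = []
    if python_version != required_python:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"expected {required_python[0]}.{required_python[1]}"
        )
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{package} {installed} installed, expected {expected}")
    return problems
```

Printing the list and exiting non-zero when it is non-empty gives the agent an immediate, specific error instead of a confusing failure halfway through a task.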
Build Custom CLI Tools and MCPs
MCPs are widely available, and they are quick to set up, making it easy to experiment with connecting your agent to external tools [4]. But many people overlook setting up simple CLI scripts for their agents. As a simple example, you could give your agent a script that pulls information about a Linear ticket given its ticket ID. You might also want to give your agent a tool that reliably performs common parts of a workflow, such as a script for restarting the local development environment.
[4] In Devin, MCPs are still in beta as we're figuring out the best way to support them. Please contact us for access!
Example:
We have a customer who has had a lot of success with a CLI tool that surfaces only the first failing test in a test suite. The tool prompts the agent to focus on just that test, with detailed error information, and it leads to higher success and faster completion rates on long tasks.
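A minimal sketch of such a tool, assuming the test runner can emit a JUnit-style XML report (pytest does this with `--junitxml=report.xml`). It is not the customer's actual tool: it simply finds the first `<testcase>` that contains a `<failure>` or `<error>` element and reports only that test.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def first_failure(junit_xml: str) -> Optional[str]:
    """Return a focused report for the first failing test, or None if all passed."""
    root = ET.fromstring(junit_xml)
    # iter() walks the tree in document order, so "first" matches report order.
    for case in root.iter("testcase"):
        for tag in ("failure", "error"):
            problem = case.find(tag)
            if problem is not None:
                name = f"{case.get('classname', '')}::{case.get('name', '')}"
                message = problem.get("message", "")
                details = (problem.text or "").strip()
                return (
                    f"First failing test: {name}\n"
                    f"{tag.upper()}: {message}\n\n"
                    f"{details}\n\n"
                    "Fix only this test, then rerun the suite."
                )
    return None
```

Running the suite with `pytest --junitxml=report.xml` and passing the file's contents to `first_failure` keeps the agent's context on one concrete problem at a time.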
Add to your agent's knowledge base
If your agent keeps making the same mistakes, that's a great time to codify your feedback in the agent's knowledge base. Devin has a dedicated knowledge management system, and many products support .rules or .md files that the agent ingests permanently. Don't just give it guidelines on the framework you're using; also tell it about the overall architecture of your project. Tell it what type of testing is common for different kinds of tasks, how to run important commands, and which tools you recommend using.
Example:
We give our agent knowledge about the specific procedure it should follow when adding a new service route. The information includes every place it needs to add boilerplate in the frontend and backend. As a result, these tasks are now easily delegated to our AI.
Practical Considerations
Limitations of Autonomous Agents
Limited debugging skills
Bug reports can be deceptively simple. Many bugs require not only access to databases and logs but also a level of debugging skill beyond that of most AI agents today. If you use AI to aid in debugging, we recommend asking it for a list of probable root causes rather than having it try to debug and fix everything on its own. A human can then decide, based on their own experience, which one is the real root cause. Once the cause is known, agents can still be quite helpful at implementing the fix.
Poor fine-grained visual reasoning
Models today generally lack the fine-grained visual reasoning needed to match screenshots of designs or Figma mockups. They are most reliable on visuals that can be described at the level of code (e.g., giving them code from Figma). If you want the agent to match your visual style, use a good design system with reusable components.
Knowledge Cutoffs
Whenever you want to work with a new library, explicitly point the agent to the latest docs. Otherwise, most agents will fall back on older patterns for these libraries because of knowledge cutoffs in the pretrained base models. A good agent can overcome this if you point it to the docs, but you must stay mindful of the issue (remember, the agent doesn't even know that newer versions of these libraries exist).
Practical Considerations
Managing Time and Minimizing Losses
Not every attempt with an agent will succeed. In 2025, there is real variance in the outcomes these agents produce. Part of the job is learning to use agents in a way that maximizes the chance of success while minimizing wasted time and tokens.
Be willing to cut your losses earlier
A common mistake among people new to agents is committing to making an interaction succeed even when the agent's work is veering off track. If you ever find yourself thinking "it's ignoring my instructions" or "this thing is going in circles," you should feel comfortable ending that conversation or taking over manually. Needing to send more and more messages is more likely a sign that your task's inherent complexity exceeds the agent's capabilities than a sign of some simple mistake that can be corrected.
Diversifying your experiments
If you're new to working with agents, we recommend diversifying your bets at the start. Try a range of different prompts and ideas. Double down on the types of tasks you see the agents naturally performing well on, and cut your losses on the ones they don't. Don't feel a need to force your agents to succeed every time.
Start fresh when you aren't making progress
Starting over is the right answer a lot more often with agents than with humans. If you've given an agent a task and it is struggling to address feedback or correct course, starting fresh with a new agent and all of the instructions up front can often get to success much faster. The ability of an agent to correct a messed-up environment is much worse than its ability to spit out fresh code from scratch.
Practical Considerations
Security and Permissioning
Create accounts for your agent
A throwaway email is helpful for safe testing of sites. Create custom IAM roles for your agent if it needs to access cloud resources.
Give it a development / staging environment
Ideally the agent uses the same testing setup as the engineers on your team. We suggest avoiding giving access to production services entirely. When using remote agents, you can run fully isolated test environments on the agent's remote machine.
Read-only API keys
Where possible, give it read-only access. We find it is still helpful for humans to manually run any script that interacts with outside services.
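Read-only credentials enforced by the provider are the real safeguard. As an extra layer on the agent's side, a thin wrapper can refuse anything other than read requests before they leave the machine. This is a hypothetical sketch: `transport` stands in for whatever HTTP client your scripts already use, and the set of allowed methods is an assumption you may want to tighten.

```python
from typing import Callable, Optional

# HTTP methods that should not change server state.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})

# (method, url, body) -> response body
Transport = Callable[[str, str, Optional[bytes]], bytes]


class ReadOnlyViolation(Exception):
    """Raised when code running on the agent's behalf attempts a write."""


class ReadOnlyClient:
    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def request(self, method: str, url: str, body: Optional[bytes] = None) -> bytes:
        method = method.upper()
        if method not in SAFE_METHODS:
            # Refuse locally; a human should run write operations by hand.
            raise ReadOnlyViolation(f"{method} {url} blocked: agent access is read-only")
        return self._transport(method, url, body)
```

Raising instead of silently skipping the call makes the refusal visible in the agent's output, which is the cue to hand that step to a human.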
Practical Considerations
Big Changes Ahead
We firmly believe that software engineers aren't going anywhere. Even as coding agents become smarter and more capable, deep technical expertise and intimate knowledge of your codebase remain invaluable. True ownership of your projects, your systems, and your code is more critical now than ever. On our team today, engineers are expected to oversee multiple systems while still maintaining deep understanding and thoughtful judgment. As automation amplifies your impact, the ability to juggle parallel tasks won't just become possible; it'll become essential. We're excited to share the insights we've gathered while preparing our own organization for this shift, so you and your team can also thrive in the evolving world of software development.