Is The End of Prompt Engineering Here?
Techies know well that coaxing a large language model to give you a good answer can feel a bit like tricking a toddler. Developers have been known to write, “if you don’t give me the correct answer, I will be fired” in their prompts—successfully.
That led to “prompt engineering,” a side hustle for techies who wanted to make easy money by writing and improving prompts for LLMs. Now, a host of open-source projects and startups have emerged to help developers automatically optimize their LLM prompts using—you guessed it—other LLMs to write and experiment with those prompts.
The common thread between these prompt optimization systems is: if engineers can come up with a scoring system to evaluate the outputs of LLMs, then they can also use AI to experiment with different prompts to maximize those scores.
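That loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the tools named in this article work internally: `toy_model` is a hypothetical stand-in for an LLM call, and the dataset, candidate prompts and `accuracy` metric are invented for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an LLM call. It "classifies" a flower by petal
# length and only uses the right threshold when the prompt mentions
# centimeters. A real system would call a model API here.
def toy_model(prompt: str, petal_length_cm: float) -> str:
    threshold = 2.5 if "centimeters" in prompt else 5.0
    return "setosa" if petal_length_cm < threshold else "other"

# Invented labeled examples used for scoring: (petal length in cm, label).
DATASET: List[Tuple[float, str]] = [
    (1.4, "setosa"), (1.7, "setosa"), (4.5, "other"), (5.9, "other"),
]

def accuracy(prompt: str, model: Callable[[str, float], str]) -> float:
    """The scoring system: fraction of examples labeled correctly."""
    correct = sum(model(prompt, x) == label for x, label in DATASET)
    return correct / len(DATASET)

def best_prompt(candidates: List[str],
                model: Callable[[str, float], str]) -> Tuple[str, float]:
    """Score every candidate prompt and return the highest-scoring one."""
    scored = [(accuracy(p, model), p) for p in candidates]
    score, prompt = max(scored, key=lambda pair: pair[0])
    return prompt, score

# In a real optimizer another LLM would propose these candidates.
CANDIDATES = [
    "Classify the flower.",
    "Classify the flower. Petal length is given in centimeters.",
]

if __name__ == "__main__":
    prompt, score = best_prompt(CANDIDATES, toy_model)
    print(f"{score:.2f}  {prompt}")
```

The key design choice is that the prompt is treated as just another parameter: once a metric exists, picking a prompt becomes a search problem.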
One of the most popular options today is DSPy, open-source prompt optimization software developed by Stanford researchers. First, the software tries to understand what kind of task a user is trying to complete using an LLM, like writing code or classifying flowers based on their petal length.
Then, the software will suggest several prompt options, as well as common phrases that have been shown to optimize LLM outputs, like telling the model, “don’t be afraid to be creative.” The software will continue tweaking those prompts and suggesting new ones based on which ones score the highest on evaluation metrics. (You can see an example of DSPy in action in this X thread.)
Other common open-source prompt optimization systems include TextGrad and AdalFlow, which try different approaches like asking an LLM to reflect on how it made a mistake and suggest a better prompt that doesn’t make the same error.
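The reflect-and-revise idea can be illustrated with a toy loop. The sketch below does not use the TextGrad or AdalFlow APIs; `toy_model` and `toy_critic` are hypothetical rule-based stand-ins for the two LLM calls, and the examples are invented.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]        # (input text, expected output)
Failure = Tuple[str, str, str]   # (input text, expected, actual)

def toy_model(prompt: str, text: str) -> str:
    """Hypothetical LLM stand-in: follows only rules named in the prompt."""
    out = text
    if "uppercase" in prompt:
        out = out.upper()
    if "exclamation" in prompt:
        out = out + "!"
    return out

def toy_critic(prompt: str, failures: List[Failure]) -> str:
    """Hypothetical reflecting LLM: inspects failures, proposes a revision."""
    for _text, expected, actual in failures:
        if expected.isupper() and not actual.isupper():
            return prompt + " Write the answer in uppercase."
        if expected.endswith("!") and not actual.endswith("!"):
            return prompt + " End with an exclamation mark."
    return prompt  # nothing to suggest

def optimize(prompt: str, examples: List[Example],
             model: Callable[[str, str], str],
             critic: Callable[[str, List[Failure]], str],
             max_rounds: int = 5) -> str:
    """Collect failures, ask the critic for a revised prompt, and repeat."""
    for _ in range(max_rounds):
        outputs = [(x, y, model(prompt, x)) for x, y in examples]
        failures = [f for f in outputs if f[1] != f[2]]
        if not failures:
            break
        revised = critic(prompt, failures)
        if revised == prompt:
            break  # critic has no further ideas
        prompt = revised
    return prompt

EXAMPLES: List[Example] = [("hello", "HELLO!"), ("ok", "OK!")]

if __name__ == "__main__":
    print(optimize("Repeat the input.", EXAMPLES, toy_model, toy_critic))
```

Here the feedback is a text suggestion, so the loop also stops early if the critic returns the prompt unchanged.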
This summer, Cyrus Nouroozi, one of the core contributors to DSPy, decided to try his hand at building a startup based on its tech. He created Zenbase, along with cofounder Amir Mehr, through Y Combinator.
With Zenbase, users can show an LLM examples of “good responses,” which the LLM then can use to reverse engineer a “good” prompt, Nouroozi said. That makes it especially useful for applications where the concept of “good” may depend on the specific user, he said.
For instance, if a developer is trying to build an AI-powered email copilot that surfaces the most important emails to a user, the user could first show the copilot examples of emails that they deem important, which could vary from user to user, Nouroozi said.
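One simple way to picture the email example is to embed a user's labeled emails directly in the prompt as demonstrations. This is an assumption made for illustration, not a description of Zenbase's method; the function name, wording and sample emails below are all hypothetical.

```python
from typing import List, Tuple

def build_prompt(examples: List[Tuple[str, bool]], new_email: str) -> str:
    """Place a user's labeled emails ahead of the new email as demonstrations."""
    lines = ["Decide whether each email is important to this user.", ""]
    for subject, important in examples:
        label = "important" if important else "not important"
        lines.append(f"Email: {subject}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The model is expected to complete the final, unlabeled entry.
    lines.append(f"Email: {new_email}")
    lines.append("Label:")
    return "\n".join(lines)

if __name__ == "__main__":
    # Invented examples; each user would supply their own.
    user_examples = [("Invoice overdue", True), ("Weekly deals", False)]
    print(build_prompt(user_examples, "Contract renewal"))
```

Because the examples come from the individual user, two users with different ideas of "important" end up with different prompts from the same code.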
One source of competition for these makers of prompt optimization software is, unsurprisingly, AI developers like OpenAI and Anthropic themselves, which offer similar tools for users to experiment with and improve their prompts. Nouroozi argues, however, that independent startups have an advantage here, because they can recommend actions that AI developers have an incentive not to recommend, like using a competitor's model.
Either way, the rise of automatic prompt optimization software highlights the push to make using conversational AI more science than art.
Here’s what else is going on…
Deals and Debuts
See The Information’s Generative AI Database for an exclusive list of private companies and their investors.
Neo4j, a graph database startup, raised $50 million at a $2.2 billion valuation from Noteus Partners.
Roboflow, a computer vision startup, raised a $40 million Series B round led by GV, with participation from Craft Ventures, Y Combinator and others.
Selector, which uses AI to spot IT issues and recommend fixes, raised a $33 million Series B round led by Ansa.
Google is committing $20 million in cash and $2 million in cloud credits to researchers from academic and non-profit institutions using AI for scientific breakthroughs.
Spines, an AI publishing startup, raised a $16 million Series A round led by Zeev Ventures.
CommBox, an AI customer experience startup, raised $15 million in funding from PSG Equity.
Pruna AI, which makes software to compress AI models, raised $6.5 million in seed funding from EQT Ventures.
Revisto, which automates marketing compliance for pharmaceutical companies, raised $4 million in seed funding, led by LiveOak Ventures.
Mistral, a French model developer, unveiled updates to its products, such as the ability for its AI chatbot to search the web and generate images.
Perplexity added a shopping feature, following AI-powered shopping features from Amazon and Google.
New From Our Reporters
Recommended Newsletter
More than 100,000 subscribers rely on The Information's Creator Economy newsletter for insightful coverage on creator startups making waves, big tech companies' social media playbooks, and scoops on the biggest hires across the sector. Start receiving the weekly newsletter here.
What We’re Reading
Thank you for reading the AI Agenda Newsletter! I’d love your feedback, ideas and tips: stephanie@theinformation.com.
If you think someone else might enjoy this newsletter, please pass it along, or they can sign up here.
Stephanie Palazzolo is a reporter at The Information covering artificial intelligence. She previously worked at Business Insider covering AI and at Morgan Stanley as an investment banker. Based in New York, she can be reached at stephanie@theinformation.com or on Twitter at @steph_palazzolo.