OpenAI announced a new flagship generative AI model on Monday that they call GPT-4o — the “o” stands for “omni,” referring to the model’s ability to handle text, speech, and video. GPT-4o is set to roll out “iteratively” across the company’s developer and consumer-facing products over the next few weeks.
OpenAI CTO Mira Murati said that GPT-4o provides “GPT-4-level” intelligence but improves on GPT-4’s capabilities across multiple modalities and media.
“GPT-4o reasons across voice, text and vision,” Murati said during a streamed presentation at OpenAI’s offices in San Francisco on Monday. “And this is incredibly important, because we’re looking at the future of interaction between ourselves and machines.”
GPT-4 Turbo, OpenAI’s previous “most advanced” model, was trained on a combination of images and text and could analyze both to accomplish tasks like extracting text from images or even describing the content of those images. But GPT-4o adds speech to the mix.
What does this enable? A variety of things.
![](https://techcrunch.com/wp-content/uploads/2024/05/Screenshot_2024-05-13_at_1.28.44a_¯PM-transformed-1.png)
GPT-4o greatly improves the experience in OpenAI’s AI-powered chatbot, ChatGPT. The platform has long offered a voice mode that reads the chatbot’s responses aloud using a text-to-speech model, but GPT-4o supercharges this, allowing users to interact with ChatGPT more like an assistant.
For example, users can ask the GPT-4o-powered ChatGPT a question and interrupt ChatGPT while it’s answering. The model delivers “real-time” responsiveness, OpenAI says, and can even pick up on nuances in a user’s voice, responding with voices in “a range of different emotive styles” (including singing).
GPT-4o also upgrades ChatGPT’s vision capabilities. Given a photo, or a desktop screen, ChatGPT can now quickly answer related questions on topics ranging from “What’s going on in this software code?” to “What brand of shirt is this person wearing?”
![](https://techcrunch.com/wp-content/uploads/2024/05/desktop.jpg?w=680)
The ChatGPT desktop app in use in a coding task. Image credit: OpenAI
These features will evolve further in the future, Murati says. While today GPT-4o can look at a picture of a menu in a different language and translate it, in the future, the model could allow ChatGPT to, for instance, “watch” a live sports game and explain the rules to you.
“We know that these models are getting more and more complex, but we want the experience of interaction to actually become more natural, easy, and for you not to focus on the UI at all, but just focus on the collaboration with ChatGPT,” Murati said. “For the past couple of years, we’ve been very focused on improving the intelligence of these models … But this is the first time that we are really making a huge step forward when it comes to the ease of use.”
GPT-4o is more multilingual as well, OpenAI claims, with enhanced performance in around 50 languages. And in OpenAI’s API and Microsoft’s Azure OpenAI Service, GPT-4o is twice as fast as, half the price of and has higher rate limits than GPT-4 Turbo, the company says.
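To make the "twice as fast, half the price" claim concrete, here is a minimal sketch that applies those two ratios to a GPT-4 Turbo baseline. The function name and the baseline figures are hypothetical and chosen only for illustration. They are not OpenAI's actual prices or latencies.

```python
def gpt4o_estimate(turbo_cost: float, turbo_seconds: float) -> tuple[float, float]:
    """Apply the ratios OpenAI states relative to GPT-4 Turbo.

    "Half the price" halves the cost. "Twice as fast" is read here as
    halving the time to complete the same request.
    """
    return turbo_cost / 2, turbo_seconds / 2


# Hypothetical baseline: a workload costing $10.00 and taking 8 seconds
# on GPT-4 Turbo. These numbers are illustrative assumptions only.
cost, seconds = gpt4o_estimate(turbo_cost=10.0, turbo_seconds=8.0)
print(f"Estimated GPT-4o cost: ${cost:.2f}, time: {seconds:.1f}s")
```

Real-world savings would depend on the actual per-token prices and on how a given workload behaves, neither of which the announcement details here.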
At present, voice isn’t a part of the GPT-4o API for all customers. OpenAI, citing the risk of misuse, says that it plans to first launch support for GPT-4o’s new audio capabilities to “a small group of trusted partners” in the coming weeks.
GPT-4o is available in the free tier of ChatGPT starting today and to subscribers to OpenAI’s premium ChatGPT Plus and Team plans with “5x higher” message limits. (OpenAI notes that ChatGPT will automatically switch to GPT-3.5, an older and less capable model, when users hit the rate limit.) The improved ChatGPT voice experience underpinned by GPT-4o will arrive in alpha for Plus users in the next month or so, alongside enterprise-focused options.
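The fallback behavior OpenAI describes can be pictured with a small sketch. This is not OpenAI's implementation. The `UsageWindow` class, the `pick_model` function, and the message cap of 3 are all hypothetical, and the only details taken from the announcement are the switch to GPT-3.5 at the limit and the "5x higher" cap for paid plans.

```python
from dataclasses import dataclass


@dataclass
class UsageWindow:
    """Counts messages sent to the primary model in the current window."""
    limit: int      # hypothetical per-window message cap
    used: int = 0


def pick_model(window: UsageWindow,
               primary: str = "gpt-4o",
               fallback: str = "gpt-3.5") -> str:
    """Return the primary model until the cap is reached, then the fallback."""
    if window.used < window.limit:
        window.used += 1
        return primary
    return fallback


# Assumed free-tier cap of 3 messages; paid plans get a cap 5x higher.
free = UsageWindow(limit=3)
plus = UsageWindow(limit=3 * 5)

print([pick_model(free) for _ in range(5)])
print(sum(pick_model(plus) == "gpt-4o" for _ in range(20)))
```

Under these assumptions, the free-tier user gets GPT-4o for the first three messages and GPT-3.5 afterward, while the paid user gets GPT-4o for fifteen of twenty messages.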
In related news, OpenAI announced that it’s releasing a refreshed ChatGPT UI on the web with a new, “more conversational” home screen and message layout, and a desktop version of ChatGPT for macOS that lets users ask questions via a keyboard shortcut or take and discuss screenshots. ChatGPT Plus users will get access to the app first, starting today, and a Windows version will arrive later in the year.
Elsewhere, the GPT Store, OpenAI’s library of and creation tools for third-party chatbots built on its AI models, is now available to users of ChatGPT’s free tier. And free users can take advantage of ChatGPT features that were formerly paywalled, like a memory capability that allows ChatGPT to “remember” preferences for future interactions, the ability to upload files and photos, and web search for answers to timely questions.