Tegus Transcript
Alphabet Inc - Former Executive at PayPal
Interview conducted on May 17, 2024
Former Executive at PayPal. The expert can speak to their Gen AI use cases.
Former Executive at PayPal, leaving in April 2024. The expert is responsible for Generative AI solutions & Large Language Model (LLM) usage and productization, Machine Learning & AI Systems at a massive scale (ML Solutions & Platforms), Enterprise & Systems Architecture Cloud Native and Hybrid Cloud Solutions and Big Data Platforms and Analytics for business impact.
The expert can speak to Cloud Native, Microservices, MLOps, Machine Learning, Generative AI, Large Language Models, Fine-tuning and Evaluation of Foundational LLMs, Retrieval Augmented Generation (RAG) Pipelines, Vector Databases, Data Science, Agile Leadership, Innovation, REST APIs, Software Architecture, Enterprise Architecture, Cloud Computing - Amazon Web Services (AWS), Lambda, EMR, S3, DynamoDB, Kinesis, RDS, RedShift, Google Cloud Platform (GCP) Cloud Functions, Dataproc, BigQuery, BigTable, Spanner etc.
Q: Do you have any Gen AI use cases in a production/live environment where the entire stack is on-prem?
A: Yes, multiple use cases where the entire GenAI stack is on-prem: a Q/A HR chatbot, compliance use cases for summarization, a Customer Service App for case summarization, etc.
Q: Can you speak in detail about the entire tech stack (Services/App/Middleware/Model/Infra) being used for these use cases and what were the decision criteria? Please provide an example.
A: Yes:
From the LLM outwards:
LLM - Llama 13B hosted on A100s with vLLM
LLM Gateway - home-grown
Guardrails for i/o sanitization
LlamaIndex for hooking into data
Flask App on K8s for the endpoint (a minimal serving sketch follows below)
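For illustration, here is a minimal sketch of how a stack like this might be wired together, assuming vLLM's offline inference API behind a plain Flask route; the model name, sampling parameters, GPU count and port are placeholder assumptions, not details from the call:

```python
# Hypothetical sketch: Llama-2-13B served with vLLM behind a Flask endpoint.
from flask import Flask, jsonify, request
from vllm import LLM, SamplingParams

app = Flask(__name__)

# Load the model once at startup; tensor_parallel_size shards it across GPUs (e.g., A100s).
llm = LLM(model="meta-llama/Llama-2-13b-chat-hf", tensor_parallel_size=4)
params = SamplingParams(temperature=0.2, max_tokens=256)

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.json["prompt"]
    # In the stack described above, input and output would also pass through guardrails.
    outputs = llm.generate([prompt], params)
    return jsonify({"answer": outputs[0].outputs[0].text})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```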
Q: Can you speak in detail about some of the technical challenges in driving GenAI use cases to full production, and how you overcame those challenges? Please provide an example.
A: Yes
Example - How do you improve accuracy over naive Retrieval Augmented Generation (RAG) use cases
How do you get LLM evaluations and whole-system evals?
etc.
Q: Can you speak in detail on how your organization plans to use GenAI in the next few (3-5) years? (e.g. use cases, architectural choices, model decisions)
A: Yes - can provide some overview within confidentiality-respecting areas.
This document may not be reproduced, distributed, or transmitted in any form or by any means including resale of any part, unauthorized distribution to a third party or other electronic methods, without the prior written permission of Tegus, Inc.
Tegus Client
Thank you for taking the time to chat today. I'm trying to understand some of the customer adoption patterns in technical architectures for generative AI use cases. Specifically, we want to understand four things in this interview.
First is just a discussion of your use cases: what are the main use cases you have deployed GenAI for? Second is understanding the end-to-end technology stack behind these use cases. Third, we want to understand some of the challenges that you have faced in implementation and how you solved them.
And lastly, we want to have a bit of a future-outlook discussion on how PayPal plans to consume generative AI. So coming to the first section on use cases: could you please talk about the top three or four use cases that are on-prem and in production right now at PayPal?
Former Executive at PayPal
Sure. Before we go into the details of the use cases: PayPal uses, I think, two modes to deploy any piece of software. The first is fully on-prem; we have data centers spread across the country, much like Citi: there's one on the East Coast, there's one on the West Coast and so on.
The other is within its GCP tenant, the Google Cloud tenant. And since we have done a lot of due diligence with Google, we don't actually treat a Google deployment as anything different, as long as it is within our tenant, as long as it is within a PayPal-controlled tenant.
Wherever possible, I'll try to call out when something is on GCP. But just keep in mind that when we say on-prem, it could include either of those. The big difference is, of course, a foundation model for generative AI that is not within our own tenant, the centrally managed Google deployment, if you use Gemini or something like that. That is definitely not treated as on-prem equivalent; that is treated as a third-party foundation model.
Tegus Client
Got it. So apart from these two modes, you are not deploying any of your software stack on a public cloud instance at all. It's either a private GCP instance or fully on-prem. Is that right?
Former Executive at PayPal
That is one way to think about it. We do use foundation models, we use GPTs as well. So we are making calls to GPT, but none of our software gets deployed in, for example, the public cloud other than in these two modalities.
Tegus Client
Got it. And is this specific to generative AI, or is this true across all the different use cases you have at PayPal?
Former Executive at PayPal
No, this is true for generative AI specifically. PayPal actually has, fortunately or unfortunately, a footprint in all major clouds because of acquisitions: we have a full AWS stack running because of Honey, and we have a full secondary GCP stack running.
There is a core PayPal stack running in Google as well. And we have Azure components as well because of some other acquisitions. So we have a footprint in all three clouds. But for generative AI specifically, it is either fully on-prem or in PayPal's private GCP tenant, except for foundation models, where the foundation models could be anywhere.
Tegus Client
Got it. Thanks for laying out that context. So with that context, could you please talk about some of the use cases? And as you take us through each use case, just clarify whether it is fully on-prem or in the GCP tenant. I totally understand if the model is housed somewhere else, but for the other parts of the stack, let me know where they are.
Former Executive at PayPal
Yes, definitely. So the way it started, technically, was that like everybody else, the company was doing very traditional machine learning, and we were using different kinds of machine learning models, typically built in-house. There is a large team of data scientists, machine learning engineers, software engineers. We have multiple machine learning platforms. So using all this, we built machine learning models and deployed them.
They were always deployed in-house, on-prem or on GCP, like I mentioned, in different areas. Primarily around risk, to make sure that a transaction was not fraudulent, but also around marketing, customer service and so on. The furthest we had gotten was that we were using transformer models; they were BERT-based models in some use cases. We were never on the GPT train until ChatGPT came out in late November 2022.
I was among the first people to kind of understand the impact of what was going to happen. So once we started actually building with these models, one of the first bottlenecks we had... I'll talk about the use cases as well, but it's good to give you a little bit of history.
One of the first use cases we had was obviously the customer service chatbot kind of use case. However, after lots of discussions, what came out was that the different legal, policy and Infosec groups within the company were not comfortable with exposing anything to an external customer until we understood how it operated internally, until we had some "internal" deployments of generative AI use cases.
So even though we started out trying to build a customer service chatbot, that effort changed to building internal use cases a few months in, after these discussions became clear. So the deployments would be internal deployments first, with approvals and evaluation, followed by whatever external-facing deployments there were. Does that make sense?
Tegus Client
Yes. So just to clarify: you wanted to have customer service chatbots, but you ran into legal and policy hurdles, and that's why you pivoted to creating this chatbot mostly for internal users.
Former Executive at PayPal
Internal use cases. They could be chatbots or whatever else, but they had to be internal to the company, without any external customer being exposed to any generative AI kind of output. And even within internal, there would be select groups, for example, just people within the Americas who would see something; it could be within HQ or different groups of people. So they wanted to take a measured approach, I think, which made complete sense.
Tegus Client
So are they planning to then roll it out externally at some point, or is that still not decided?
Former Executive at PayPal
The decision has been made, but it will happen, I think, towards the end of Q2 probably.
Tegus Client
So now you don't have those legal and policy issues that you were having earlier?
Former Executive at PayPal
Now there are policies in place as to how we deal with the outputs of these. What are the responsible AI principles? What are the guidelines around hallucinations, transparency and so on? So it took time to work through that and develop those guidelines to make sure that consumers are protected.
Tegus Client
These are internal guidelines that PayPal developed, or are they more like external, government-led guidelines?
Former Executive at PayPal
These are all internal policy and security Infosec, responsible AI guidelines. They have been informed by some of the legislation that is being considered, but I don't think there is anything active yet, unless there is some EU thing. And we've taken a blanket stance that nothing generative-AI-led goes out in the EU for now.
Tegus Client
Got it. So the first use case that I'm hearing is these internal chatbots. And these chatbots are on-prem, or are they in the GCP tenant?
Former Executive at PayPal
So I'll get into the details of what the internal use cases are; maybe this is a good segue into that. The first internal use case is around an intranet sort of chatbot. Just like any other company, we have multiple internal documents.
They could be HR documents, they could be some policy document, this, that and so on. And there is a typical search bar on top of these, I think Confluence or SharePoint. In the search bar, you can go in and search for something and you'll get those blue links, click on this or that, the standard thing. Most companies have some variation of this. And one of the use cases was: see if you can embed an answer widget in this scenario.
So this is a traditional Q&A sort of use case, where an employee wants some information about certain things, and they would go to the intranet, and instead of trying to click around on 50 things and do keyword searches, they would get an answer directly.
Right? So this is one use case. The second use case is around a scenario where, let's say, you want to embed a PayPal button on a website. That puts you in the merchant category. PayPal is a two-sided network, which means you have customers who are trying to pay on one side and merchants who are trying to get the payments from the customers on the other side, with PayPal sort of sitting in the middle. This is one of the business models.
PayPal has multiple different business models, but this is sort of the core. So in this model, customers can just sign up. I can just give you my e-mail ID, you being PayPal, and PayPal will say, okay, we know who you are, we know how to make payments on your behalf. On the merchant side, the merchant will say, "Hey, I want to actually integrate with PayPal and have those payments flowing to me."
So one of the key checks is that PayPal needs to make sure that you are, first of all, not doing anything which would fall outside of the acceptable use policy and things like that. You are not a tobacco merchant, or you're not some kind of weapons seller or something like that.
Every merchant goes through this compliance validation process, initially and periodically. Business models might change, people might pivot. So the question is how you keep track. Let's say I sign up. I am a toy seller of some kind, and I start selling toy guns, which is okay. It's a toy. But tomorrow, I might pivot into real weapons of some kind. And typically, we don't want a PayPal button showing up on a weapons site.
Let's assume that that's the policy, for whatever reason. Right or wrong, but that's the policy. The question is how you keep track of this initially as well as in the ongoing scenario. There are different mechanisms used; let's go with the initial one. Let's say, first, you apply and you say, "I'm a merchant, I want to become a PayPal-affiliated merchant." The use case is that there will be a compliance agent who will go through your documentation and your website and make sure that this is following the rules. Does that make sense?
In order to do this, typically, this was done by humans. And it takes a person sometimes 30 minutes, sometimes an hour, sometimes more, depending on how extensive your web properties or your documentation are. These days, very few people actually submit documentation as such; they just send over a website link. So assume you have a website link. How would you know that everything there follows policy?
You would click around, you would have a script, maybe you would go through a bunch of things, and you would say, okay, this looks reasonable. So what we wanted to do was assist that human in making that call, by using generative AI to summarize what was there on the website, give good summaries, put it in different categories and so on. So this was a compliance sort of use case.
Tegus Client
Again, a quick question here. Did you use an actual large language model here?
Former Executive at PayPal
Yes. I'll come to the details of each of these use cases. The first one was an internal answer bot. The second one was a compliance use case for summarization. The third one was around merchants: there is a long-tail phenomenon at the back end of pretty much every service provider, I think, where a few merchants are responsible for a majority of the customer complaints.
Typically, large-volume merchants have the infra to make sure things go smoothly, and very small merchants don't even have the volume to have too many problems. It's somewhere in the middle, where they don't have a proper infra on their side, which leads to some challenges, and then they follow up, file a problem case or something like that.
And these could be hundreds, thousands. Every single one needs to be addressed. Some customer's payment got stuck, something. So there is this long-tail business. And a human has to attend to every one of these, because every one of these is some kind of an exception. Most of them are related to things on the merchant side, because the merchants are trying to minimize their involvement in the integration, but they want to have a very smooth experience.
Which is not how anything works. So most of them can be traced back to some challenges they had in an integration they didn't fully go through. Anyway, these calls could get pretty long. So there was a summarization bot for some of these customers. The user would be a customer service agent: you would get on a call, you would say, I'm trying to address this case, but that customer might have, for example, 500 previous issues.
So you want a quick summary of the latest ones, because it's a challenge for that customer to even remember how many were filed, and for the customer service agent to be able to use all that in context. So this is a slightly complex use case, but this was very helpful even with whatever we could do. So these were the three use cases; I can talk about these.
Tegus Client
Sorry. In terms of assigning the criticality of the use cases that you described, would you use any quantitative measure to give an indication of how material or critical these use cases were to PayPal? Because what I'm trying to get to is: even though enterprises are implementing GenAI, are they applying GenAI to core business problems? Or is it still more like fringe problems?
Former Executive at PayPal
What we did was an ROI evaluation for every use case that comes through. And in addition to ROI, there are multiple things which are considered. The first one is: what are the risks of it? So that becomes a critical evaluation criterion, and when I say risk, the risk surface is well known, at least after I was done with it.
It is well known that large language models are prone to hallucinations. There are ways we could mitigate them, but they are prone to hallucinations. So for any scenario where even a minimal rate of hallucinations was considered not acceptable, we would not start actively working on it until we had confidence that we could mitigate them to a very, very small percentage. So this was one.
Tegus Client
Got it. In terms of ROI, the three use cases that you described, are they all ROI positive?
Former Executive at PayPal
They're all ROI positive. And there are, I think, easily 300 or so different use cases that are being considered. These three fit the criterion that we could start to work on them immediately. Whatever we could do was going to be a net positive, unless, of course, the investment turned out to be huge, because we were exploring other solutions as well for all of these use cases.
So these use cases were not new in the sense that, hey, because of generative AI we can now tackle them. We were trying to explore solutions for these anyway because of different challenges, so generative AI seemed like a natural fit. And the third point was that all of these were fully internal, so we could immediately start cranking on them.
Tegus Client
Got it. So since you were already working on these use cases and generative AI came along, the business decided to just use generative AI to solve these problems, because you were spending money to solve them anyway.
Former Executive at PayPal
Exactly. And there was automation already in each of these. On the HR internal question and Q&A bot, there was automation going on; the question was how we could improve the answers. In compliance, for example, we were already doing some scraping and some kind of bucketization.
Tegus Client
Did GenAI give you very significant gains over the existing automation?
Former Executive at PayPal
I think, given the ease with which we were able to implement these and the responses that we saw, there were very significant successes.
Tegus Client
I quickly wanted to move to the architecture discussion. But before that, a quick question on the two modes of deployment you mentioned, fully on-prem and the private GCP tenant. How do you decide between the two of them? How do you decide whether to deploy something on-prem versus on GCP?
Former Executive at PayPal
I think the first question is: how sensitive is the information that will be sent to the large language model? That's the key decider. The two use cases where, for example, there is compliance stuff, or more particularly customer service case summarization, we deployed fully internally, because this was all customer information; merchant information, but still customer information. And it's not trivial to separate out PII versus non-PII kind of stuff from it. Yes?
Tegus Client
But the point is that even on GCP it is a private tenant. So it's not like you are exposing your data to the public.
Former Executive at PayPal
That is true. And it is within our firewall, all that stuff, but I think Infosec still had some lingering concerns. That is why we made sure to take an ultra-cautious approach, because it is an education for everybody. Ultimately, what decided it was just the availability of GPUs, finally.
Tegus Client
I see. So now moving to the architecture discussion. For this discussion, I really want to focus on just one use case and understand that use case's architecture really well. You said that the third use case, case summarization for customer service complaints, was kind of the most complex use case that you have implemented?
Former Executive at PayPal
Yes. That is actually still in progress. There are a few moving parts.
Tegus Client
That is not in production?
Former Executive at PayPal
That is in production in a limited-pilot sort of way, where only a few agents have subscribed to it. The answer bot is actually fully in production and taking considerable load, so it would probably be better, I think, to talk about that, because we have solved that fully.
Tegus Client
So what are the things that you have not been able to solve for the third use case?
Former Executive at PayPal
The third use case has a few challenges. One, we are running into a context window kind of challenge. We are using, I think, Llama 2, and now we have Meta Llama 3, but we are running into context window problems because some of these case summaries could get very long.
And the data that we get to work on is transcripts of people speaking, for example customers, and separating the customer out from the agent is done automatically by software. However, because it's transcription-based, the accuracy is not very good.
And so the summary sometimes is not very good at all, for example. And there are notes that people have taken, but that's hard even for CS agents to do, to actually type while they are speaking or something like that.
Tegus Client
Got it. And the second use case, which was the merchant compliance thing, is that not in production either?
Former Executive at PayPal
That is in production, yes. That is in production. The limiting factor there is the large amount of image content that comes in. Analyzing images is a little bit trickier than analyzing pure-text sort of information.
Tegus Client
So the agents that actually do the compliance checks, are they not fully relying on the tool or the solution that you have developed?
Former Executive at PayPal
The way it works is that we give a confidence score on every website that goes through the bot. If the confidence score is low, that's when the agent will actually go in and follow their script fully; if it is in the medium range, they will maybe do spot checks; and if it is in the high range, they'll probably do only cursory checks and approve it.
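For concreteness, a hypothetical sketch of the confidence-based triage described here; the thresholds and labels are invented for illustration:

```python
# Hypothetical routing of merchant websites by compliance-model confidence.
def review_level(confidence: float) -> str:
    """Map a confidence score to the depth of human review (thresholds are made up)."""
    if confidence < 0.4:    # low confidence: the agent follows the full review script
        return "full_script"
    if confidence < 0.8:    # medium confidence: the agent does spot checks
        return "spot_check"
    return "cursory"        # high confidence: cursory review, then approve

for site, score in [("toystore.example", 0.92), ("newmerchant.example", 0.35)]:
    print(site, "->", review_level(score))
```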
Tegus Client
And which model do you use for this one?
Former Executive at PayPal
For the categorization and summarization of the websites, we actually use a mixture-of-experts model. For the Q&A, or the answer bot, we are using Llama-2-13b, which has also been fine-tuned for that. For the customer service summarization, we are using Llama 70B; we are getting a little bit of a compute challenge there, but we are still trying to use that model.
Tegus Client
70B, is this Llama 2 or 3?
Former Executive at PayPal
Two. We are in the process of switching to three, because I think three's performance is much better, but that is still in progress.
Tegus Client
It seems like we have already come across a lot of chatbot cases, hence I was hoping that we would also discuss this compliance case that you were talking about. But is that a case where you don't have full information on the stack?
Former Executive at PayPal
It is going through a lot of changes. But if that is what we want to talk about, that's okay. We can talk about that as well.
Tegus Client
But is it on-prem? Or is it a GCP tenant?
Former Executive at PayPal
This is on-prem. However, since we switched to 70B, we are running out of GPUs, which is why we are in the process of switching.
Tegus Client
Sorry, I'm talking about the second use case, where you said it's the Mistral model.
Former Executive at PayPal
The Mistral model is on GCP.
Tegus Client
So the compliance use case is on GCP.
Former Executive at PayPal
Yes.
Tegus Client
And the chatbot use case, that's fully on-prem?
Former Executive at PayPal
The Q&A bot, yes.
Tegus Client
And you said for the first use case, you did some fine-tuning as well?
Former Executive at PayPal
Yes. For the Q&A bot we did some fine-tuning as well. That is correct.
Tegus Client
And by fine-tuning, I don't necessarily mean RAG. It's like you retrained the model on some additional data?
Former Executive at PayPal
Yes, that is correct. It does use a RAG pattern, obviously; the Q&A bot uses RAG, but we also used a PEFT technique to actually fine-tune the Llama model.
Tegus Client
So let's probably talk about the first use case, then. Could you provide detail on the stack for this one?
Former Executive at PayPal
Yes. The Q&A bot.
Tegus Client
Sorry, but also, in the questions that we sent, you would have seen the different components of the stack that we are interested in: starting at the bottom with the infra, then the data pipeline, the actual model, then the middleware, and finally the application. So if you could walk through all of those.
Former Executive at PayPal
Yes. The model used was Llama-13b, Llama-2-13b. This is a fine-tuned version. We didn't start with the fine-tune, obviously; we started with the base model and fine-tuned it. I'll go into the fine-tuning details a little later. This is the base model, running fully on-prem, I think because the compute works out. We are using a library called vLLM to run the inference, running, I think, on four A100s if I remember correctly.
Tegus Client
And this vLLM that you mentioned, it's just like a GPU virtualizer? Essentially it is virtualizing the GPU?
Former Executive at PayPal
There are, I think, a couple of others that we also evaluated here: TGI, another library, from Hugging Face, DeepSpeed and a bunch of others. But vLLM was probably the best one.
Tegus Client
Best in terms of performance?
Former Executive at PayPal
Best in terms of the ability to scale up as the number of requests increased. The goal was to have at least 500 TPS, the app being able to serve 500 transactions per second if necessary. The load was well lower, obviously, but engineers like to optimize. So, moving outwards from that, there is a guardrails setup so that any input to and output from this LLM goes through guardrails.
The guardrails themselves are a module that has three things. One is pattern matching: regex and what have you, regular pattern matching. The second is a model-based approach where the model tries to flag if any sensitive information is flowing back and forth; it tries to put in dummy tokens for that.
Tegus Client
So is this the large language model itself that is flagging?
Former Executive at PayPal
No, this is not an LLM; this is a classifier that we have trained. This is fully internal. I think it's overkill, but anyway, this was a proving ground. So we wanted to have something here to make sure that we understood how many of these are PII-based, because all of the data that goes into the RAG is fully vetted to have no sensitive information.
There won't be IDs and all the other kinds of information passing through, but any information that's coming out of the model, we also need to make sure doesn't contain sensitive information. And since these models are trained on the Internet, we never know what sort of thing they might spit out eventually.
So that's why we have this guardrail. It's a classifier; it checks every single response. And the third one is a toxicity and bias ratings filter. This is a different model again, an NLU model, which rates every input and output response on certain features.
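As a rough illustration of the three-part guardrails module just described, a hypothetical sketch; the regex patterns, the classifier and rating-model interfaces, and the threshold are placeholders rather than PayPal's implementation:

```python
# Hypothetical three-stage guardrails: regex matching, a trained PII classifier that
# substitutes dummy tokens, and a toxicity/bias rating filter. Runs on input and output.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # SSN-like strings
    re.compile(r"\b\d{13,16}\b"),           # card-number-like strings
]

def apply_guardrails(text: str, pii_classifier, toxicity_model) -> str:
    # 1) Pattern matching: mask anything that looks like sensitive data.
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    # 2) Model-based check: the classifier flags sensitive spans, which get dummy tokens.
    for span in pii_classifier.flag_spans(text):    # hypothetical interface
        text = text.replace(span, "[PII]")
    # 3) Toxicity/bias rating: block any response scored above a threshold.
    if toxicity_model.rate(text) > 0.5:             # hypothetical interface
        return "Sorry, I cannot provide that response."
    return text
```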
Tegus Client
So all these models, where do you host them? Like, what is the serving platform? If you do it on AWS, there's SageMaker. If you do it on GCP, there's Vertex. So where are you actually hosting these models? Is there a platform on-prem that you use?
Former Executive at PayPal
There is a platform on-prem. Unfortunately, we have multiple ways to host models. I believe that's inefficient, but we have multiple ways. We have NVIDIA's Triton server, and we have Seldon endpoints, which is where these models are hosted.
Tegus Client
And for this use case, you're using Seldon endpoints?
Former Executive at PayPal
Yes. So input and output go through those endpoints, and we get scores, or we get the other things that I mentioned.
Tegus Client
And sorry, one quick question. Just to confirm: the reason you decided to do it fully on-prem was the internal company data that you didn't want to put on Google?
Former Executive at PayPal
Yes. This would be all kinds of HR policies, risk policies and so on. So this is not something that we wanted to expose. It doesn't matter that this is a fully private Google tenant, which is within our firewall, and that both Google and the company do things like penetration testing, all of that stuff. So personally, I believe that other than maybe pure customer financial data, like credit card data, it should not be a problem to put anything on the GCP tenant, but Infosec maybe has different ideas. Because if that is penetrated, then we have much bigger problems.
Tegus Client
Yes. So could you touch on the other parts of the stack?
Former Executive at PayPal
Yes, totally. So for the other part of the stack, this is a traditional RAG implementation. All of the data gets scraped and put on a local filer, which receives all of the data that is going to be indexed. So, the traditional RAG pattern, switching to that for a second: there is an indexing phase, there is a retrieval phase, and there is a generation phase. Three parts. In the indexing phase, we scrape all the data that has to be indexed and it gets put on a filer. The data could be SharePoint.
Tegus Client
Is this usually a vector database?
Former Executive at PayPal
No. This is just a file location. Once the data gets put on the filer, we run a batch job. The batch job chunks up this data, then the chunks get vectorized and stuck into a vector database. We experimented with a few vector databases: Milvus, Weaviate, Pinecone, a bunch of others, Postgres with pgvector and so on. AlloyDB, again a version of Postgres with pgvector, and Elasticsearch, which also has, I think, vector support.
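For concreteness, a minimal sketch of such an indexing batch job, assuming pymilvus's MilvusClient, naive fixed-size chunking and a stand-in embed() function; the paths, collection name and embedding dimension are illustrative:

```python
# Hypothetical indexing job: read scraped files from the filer, chunk, embed, insert.
from pathlib import Path
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")   # on-prem Milvus endpoint
client.create_collection(collection_name="intranet_docs", dimension=768)

def chunk(text: str, size: int = 1000) -> list[str]:
    """Naive fixed-size chunking; real pipelines usually split on document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(piece: str) -> list[float]:
    """Stand-in: substitute whatever embedding model the pipeline actually uses."""
    return [0.0] * 768

rows = []
for i, path in enumerate(Path("/mnt/filer/scraped").glob("*.txt")):
    for j, piece in enumerate(chunk(path.read_text())):
        rows.append({"id": i * 10_000 + j, "vector": embed(piece), "text": piece})

client.insert(collection_name="intranet_docs", data=rows)
```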
Tegus Client
Which one did you finally select for this use case?
Former Executive at PayPal
There are two that we have chosen. The one that we use currently is Milvus, which we host on-prem. The other one is AlloyDB from Google, which is Postgres with pgvector, and which is going to be the long-term sort of choice.
Tegus Client
So you'll replace Milvus with this?
Former Executive at PayPal
We'll move the indexes onto it at some future point, because that space is also still evolving.
Tegus Client
So what is the reason for choosing Milvus, first of all?
Former Executive at PayPal
Milvus has, I think, a very mature set of supported indexes. We were looking for multiple indexing strategies to play with, to make sure we understood which one worked. Second, they have an architecture with disaggregated compute and storage.
So you can add compute without adding storage, and you can add storage without moving the compute, if you look at the architecture. That was important. It was a fairly mature pure play in terms of vector indices; this was not a database company which wanted to add vector capabilities, it was a vector database to begin with.
Tegus Client
And why are you choosing to move to AlloyDB now?
Former Executive at PayPal
The one challenge we have with Milvus is that ultimately, for the RAG, or retrieval augmentation or what have you, it's not the vector that gets used; it's the underlying content. So how to manage the two together is becoming something of a challenge. Let's say I have a text chunk, okay, and there is a vector corresponding to the text chunk, and the text chunk changes for whatever reason.
We want to make sure that the integrated life cycle works: the text chunk changes, so the vector should also change accordingly. In Postgres, or in a database-managed thing, this is simpler; you could do a database trigger and so on. With Milvus, you always have to build something on top of it. So I think longer term, an integrated database, which also helps support some of the indices and some of the filtering patterns that we want. There is, of course, a whole other conversation we could have just around vector databases.
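A hypothetical sketch of the trigger-based life cycle he alludes to, using Postgres with the pgvector extension; the schema and the mark-stale-then-re-embed pattern are assumptions for illustration (a trigger cannot call the embedding model itself, so here it just flags the row for a worker):

```python
# Hypothetical chunk/vector sync in Postgres + pgvector (assumes the extension and
# its Python adapter are installed): a trigger marks edited chunks stale, and a
# worker re-embeds them. CREATE OR REPLACE TRIGGER needs PostgreSQL 14+.
import psycopg

DDL = """
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(768),
    stale     boolean NOT NULL DEFAULT true
);
CREATE OR REPLACE FUNCTION mark_stale() RETURNS trigger AS $$
BEGIN
    NEW.stale := true;   -- content changed, so the stored vector is now invalid
    RETURN NEW;
END $$ LANGUAGE plpgsql;
CREATE OR REPLACE TRIGGER chunks_content_changed
    BEFORE UPDATE OF content ON chunks
    FOR EACH ROW EXECUTE FUNCTION mark_stale();
"""

def reembed_stale(conn: psycopg.Connection, embed) -> None:
    """Worker pass: recompute embeddings for every chunk whose content changed."""
    with conn.cursor() as cur:
        cur.execute("SELECT id, content FROM chunks WHERE stale")
        for chunk_id, content in cur.fetchall():
            cur.execute(
                "UPDATE chunks SET embedding = %s, stale = false WHERE id = %s",
                (embed(content), chunk_id),
            )
    conn.commit()
```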
So moving on. That was the vector database: we chunk the data and store it in the vector database, and that completes the indexing part. In the information retrieval part, we take the question, we generate a hypothetical answer, and we try to do a cosine-similarity match on the vector database. And we try to do a keyword match as well on the filer content, which is indexed via Elasticsearch.
So we have these two searches, semantic similarity and keyword similarity, which give us a set of documents. We rerank them; we are experimenting with some rerankers, including Cohere Rerank. The reranker orders the relevant chunks, we take 15 of them, and we use system prompts, standard ones: summarize these, try to answer this question.

Think step by step, chain-of-thought kinds of instructions, but don't give too many details, and so on. With all those techniques, we stick the question in there, we stick the chunks in. We send it to Llama-2-13b, and the large language model tries its best to answer the question. That answer then gets propagated back through the guardrails to the Q&A widget.
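A sketch of that rerank-and-generate step. The Cohere call follows the public rerank API; the model name, prompt wording, and `llm` interface are assumptions, not from the interview:

```python
# Illustrative rerank-then-answer step with Cohere Rerank.
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder credential

def answer(question, candidates, llm, top_n=15):
    reranked = co.rerank(
        model="rerank-english-v3.0",  # assumed model name
        query=question,
        documents=[c.text for c in candidates],
        top_n=top_n,
    )
    context = "\n\n".join(candidates[r.index].text for r in reranked.results)

    system_prompt = (
        "Using only the context below, answer the question. "
        "Think step by step, but keep the final answer brief."
    )
    return llm.complete(f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}")
```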
Tegus Client
I see. So I understood the whole flow. Quick question: how do you fine-tune the model?
Former Executive at PayPal
For the Llama 2 model, we used, first of all, a technique called LoRA, Low-Rank Adaptation. By some calculations I did, a full fine-tune of a 13-billion-parameter model would require at least 128 GB of memory. So we didn't want to do a full fine-tuning.

We wanted to do parameter-efficient fine-tuning with a LoRA-based approach. So we used the LoRA technique, and there is a framework called Axolotl. It is a fine-tuning framework; it doesn't do anything too special, but it's a good way to manage defaults, and it comes with built-in configurations and so on.
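Axolotl is configuration-driven, but underneath it sets up roughly the following; a minimal sketch with Hugging Face peft, where the rank, alpha, and target modules are illustrative defaults rather than the values used in this project:

```python
# Illustrative LoRA setup: freeze the 13B base, train small low-rank adapters.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

lora = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which projections get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of the 13B weights
```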
Tegus Client
Essentially, the way I think about fine-tuning is that once you get an off-the-shelf model like Llama, you are kind of retraining it with your data so that it has knowledge of data it was not trained on initially. But from what you are describing, it seems like you did more of a model optimization rather than fine-tuning the way I think about it. Is that right?
Former Executive at PayPal
That's one way to think about it, definitely. Full fine-tuning means you are actually changing the initial model's weights, and you're touching all of the weights. So if you are fine-tuning a 13-billion-parameter model, you're essentially changing 13 billion weights. With LoRA, Low-Rank Adaptation, what you're doing is freezing all of the original model weights, and, typically in the MLP layers, you are adding adapters.
Tegus Client
You're not changing the original weights here.
Former Executive at PayPal
You are not changing the original weights here. This is called parameter-efficient fine-tuning, a technique for LLMs. It has become popular with almost everybody because full fine-tuning is very expensive.
Tegus Client
But then, even though you are changing the parameters, you are not training the model on additional data.
Former Executive at PayPal
You are adding an adapter so that within the inference layer, technically, the inference flows through both the original model and through your adapter.
Tegus Client
But what is the usefulness of this adapter?
Former Executive at PayPal
The usefulness of this adapter is that the original model might be behaving in a certain way, and you want to tweak it to behave in a certain different way. The challenge was that, no matter how good a prompt we tried, the model kept answering in too wordy a tone.

The answers were good, but they were too wordy. So what we wanted to do was give it examples of how to answer certain questions, but we didn't want to spend a full fine-tuning cycle with large memory exposure, and we wanted quick feedback. In order to do this, the best way is something called LoRA.
Tegus Client
Would a good substitute for this be something like multi-shot prompting? Like, you could pass a prompt saying, can you summarize this in fewer words, or something.
Former Executive at PayPal
We did. We do, in fact, do multi-shot; you're talking about a prompt engineering technique. We do multi-shot, we do chain of thought, we do "be brief," and so on. We tried all those. In-context learning, we tried. And none of them got us to where we wanted to be, which is a certain style of answering that people felt comfortable with.
Tegus Client
Got it. And is this a fine-tuning step that you have to keep doing at regular intervals? Or is it a one-and-done kind of deal?
Former Executive at PayPal
No, this was one-and-done, though the one time we did it, we did three epochs of training. We wanted to get to a certain loss. We used a technique called DPO, Direct Preference Optimization, for the fine-tuning. We quantized the original model, and the layer that was being fine-tuned got unfrozen.

We calculated the loss, and we went through the DPO technique. We went for three epochs, and after every epoch we tested outputs. After the third epoch, I think the tone was what we wanted.
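A sketch of that DPO pass with TRL's DPOTrainer (exact arguments vary by TRL version; the preference dataset, beta value, and output path are illustrative):

```python
# Illustrative DPO run: three epochs over a preference dataset.
from transformers import TrainingArguments
from trl import DPOTrainer

trainer = DPOTrainer(
    model=model,                 # quantized base with unfrozen LoRA layers
    ref_model=None,              # with a PEFT model, TRL derives the reference
    args=TrainingArguments(output_dir="dpo-out", num_train_epochs=3),
    beta=0.1,                    # strength of the preference constraint
    train_dataset=pref_dataset,  # rows of {"prompt", "chosen", "rejected"}
    tokenizer=tokenizer,
)
trainer.train()  # check outputs after each epoch, as described above
```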
Tegus Client
Good. And a basic question that I think I missed earlier: why did you choose the Llama model for this use case?
Former Executive at PayPal
Three criteria. One was that we didn't want to use GPT, since this was the first one. We didn't want to use GPT because that would mean the data would leave; it would go to a third party. So that was one. We were working with Microsoft to make sure they wouldn't use any of the data for training and wouldn't store any of the data, but one of the hedges there, to ensure they didn't fall foul of toxicity or bias or whatever, was that there were still some questions around whether they would store some of it for 30 days or not.
Tegus Client
Understood. But that is more like why not GPT. I'm asking why Llama, because there are other open-source models that you could have used as well.
Former Executive at PayPal
Right. So we tried Mistral, which was there at that time. We tried, I forget which other one. But Llama was proving to be the best model for this. We conducted some performance tests, and Llama proved to be the best one.
Tegus Client
But do you plan to update to other models in the future?
Former Executive at PayPal
We are planning to update it to Llama 3, so the same family, but the next version.
Tegus Client
So would you have to do the fine-tuning and all of that again?
Former Executive at PayPal
That depends on whether Llama 3 is terse enough to satisfy our tone requirements. We will have to actually conduct the evaluation, which is one of the major headaches we are facing. I think I talked about most of the stack. There is a Flask app, which is the app that makes requests into the LLM; the LLM itself is part of a generative AI platform that we have. I talked about the vector database. We use something called LlamaIndex to run the vectorization pipeline, to vectorize data from user questions and so on.
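A minimal sketch of such a LlamaIndex vectorization pipeline into Milvus (the directory path, dimension, and URI are placeholders):

```python
# Illustrative ingestion: load documents, chunk, embed, store in Milvus.
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

docs = SimpleDirectoryReader("./hr_policies").load_data()  # placeholder path
vector_store = MilvusVectorStore(uri="http://localhost:19530", dim=768)
storage = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(docs, storage_context=storage)
query_engine = index.as_query_engine(similarity_top_k=15)  # 15 chunks, as above
```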
Tegus Client
A quick question there. So LlamaIndex is kind of an orchestrator?
Former Executive at PayPal
Yes.
Tegus Client
Is it just the data pipeline orchestrator or does it orchestrate the entire workflow?
Former Executive at PayPal
It orchestrates most of the workflow until it hits the Flask app.
Tegus Client
There are other things in the market as well, like LangChain. So why did you choose LlamaIndex over LangChain?
Former Executive at PayPal
We did evaluate it. I personally led that: LlamaIndex, LangChain, and Haystack was the third one; those were the three main ones. Personally, I think on my team we found that the chaining concept was a little more complex than it needed to be for this use case, which is why we didn't go with LangChain.
LlamaIndex had a lot of utilities for, for example, different chunking strategies. This is something we wanted to experiment with: there is character-based chunking, there is semantic chunking.
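A sketch of the two chunking styles mentioned, using LlamaIndex node parsers (chunk sizes, thresholds, and the embedding model are illustrative):

```python
# Illustrative comparison: fixed-size splitting vs. semantic splitting.
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser, SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

docs = SimpleDirectoryReader("./hr_policies").load_data()  # placeholder path

# Character/token-budget chunking: predictable sizes, may cut mid-topic.
fixed = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes_fixed = fixed.get_nodes_from_documents(docs)

# Semantic chunking: split where adjacent sentences' embeddings diverge.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
semantic = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model
)
nodes_semantic = semantic.get_nodes_from_documents(docs)
```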
Tegus Client
So LlamaIndex didn't have all those complex features?
Former Executive at PayPal
It did at that time. I think they have since started adding some of them, but with LlamaIndex it was essentially out of the box.
Tegus Client
I see. And coming back to the Flask app: is the Flask app just an endpoint connecting the entire workflow to the final user application?
Former Executive at PayPal
Yes, exactly. It makes a request to the LLM, and the responses then get sent back to the original app.
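A minimal sketch of such an endpoint, assuming vLLM's OpenAI-compatible completions server; the route, host, and the `build_rag_prompt` helper are hypothetical:

```python
# Illustrative Flask endpoint: forward the assembled prompt to vLLM.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
VLLM_URL = "http://llm-gateway.internal:8000/v1/completions"  # placeholder host

@app.route("/ask", methods=["POST"])
def ask():
    question = request.json["question"]
    prompt = build_rag_prompt(question)  # hypothetical: retrieval + rerank steps
    resp = requests.post(
        VLLM_URL,
        json={"model": "llama-2-13b", "prompt": prompt, "max_tokens": 512},
        timeout=60,
    )
    return jsonify({"answer": resp.json()["choices"][0]["text"]})
```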
Tegus Client
I see. And then at the top, we also think of one other layer, the services layer. Did you use any GSIs or technology vendors to build this, or to help you with this use case?
Former Executive at PayPal
No. This was built in-house.
Tegus Client
So entirely in-house, and no service vendors involved?
Former Executive at PayPal
No service vendors.
Tegus Client
No Accentures or Deloittes or the like?
Former Executive at PayPal
These are capabilities we wanted to build in-house, even the actual capabilities themselves. So vendors were not involved in this project.
Tegus Client
Got it. So I understand the architecture. Can you quickly talk about some of the challenges you faced in this?
Former Executive at PayPal
I think the biggest challenge, which still continues, is evaluation of the outputs. For example, if we make some changes, how do you actually evaluate that the changes you made were beneficial?
Tegus Client
So how do you do it now?
Former Executive at PayPal
Right now, the approach we use is a Golden Dataset. These are about 5,000 question-answer pairs. We check the accuracy of anything coming out of the model, compare it with the suggested answer, and come out with precision and recall scores. We use a framework called Ragas, and we have coded up some eval scripts, which fire a question, collect the answer, and then give us a metric of how relevant the answer was.
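A sketch of a Ragas run over such a golden dataset (ragas 0.1-era API; the single row shown is invented for illustration, and Ragas calls an LLM judge under the hood):

```python
# Illustrative Ragas evaluation against a golden question-answer set.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall

golden = Dataset.from_dict({
    "question":     ["How do I enroll in the health plan?"],          # invented row
    "contexts":     [["Enrollment opens each November for all employees ..."]],
    "answer":       ["You can enroll during open enrollment in November."],
    "ground_truth": ["Employees enroll during the November open-enrollment window."],
})

scores = evaluate(golden, metrics=[answer_relevancy, context_precision, context_recall])
print(scores)  # aggregate per-metric scores over the dataset
```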
Tegus Client
So this is, again, not an LLM model? This is some kind of machine learning model?
Former Executive at PayPal
Yes, correct.
Tegus Client
So maybe, if I had to ask how you solved the challenge you're talking about, would the answer be this Golden Dataset approach?
Former Executive at PayPal
Yes, this was one Golden Dataset. However, generating this Golden Dataset is enormously difficult, because you have to have humans who go through your material and say, okay, here is the dataset. And the dataset needs to be relevant to the task. So for a different task, for example the compliance summarization use case I talked about, this Golden Dataset won't work; we have to create a different Golden Dataset for that. For customer service, we have to create a third, different dataset.
Tegus Client
As you were building this, was there any technological challenge you faced that you were fully able to solve?
Former Executive at PayPal
One was that we wanted a simple way to annotate and make sure the answers were reasonable. For this, we used a homegrown framework to do the evaluations, and right now we are evaluating two vendors. One is called Arize; I think they do observability, and that was the second challenge. Observability is a big challenge, and Arize is another framework we're using for that. The third thing is that we are using a vendor called Snorkel AI to do some of the data manipulations.
Tegus Client
So this challenge is more around data quality, and how you unlock that?
Former Executive at PayPal
Exactly. The second challenge was observability. Any production system has multiple moving parts. How do you, for example, observe the vector database, whether it's functioning correctly? If you have an LLM, how do you make sure the latency is good and the LLM itself is behaving properly? Monitoring large language models is another challenge.

System observability is, I think, still fairly unsolved. We use Datadog and some of these other monitoring tools, but LLM monitoring itself is, I think, a fairly big challenge.
Tegus Client
But there's no solution to it yet, you're saying?
Former Executive at PayPal
There is no solution to it. There are many vendors trying to pitch their services, but I don't think we have found a good one yet, at least for now.
Tegus Client
And then, just on the business side of things, did you have trouble finding the right talent to build these workflows?
Former Executive at PayPal
Talent was a huge challenge, and it still continues to be a challenge, because the number of people who deeply understand things like vector databases, or even large language models or fine-tuning, is fairly limited. So that continues to be a challenge today. However, once it's set up, you don't need a lot of talent there; the selection process itself and setting it up is where the challenge comes in.
Tegus Client
Was it just a matter of hiring staff to solve these challenges, then?
Former Executive at PayPal
We didn't hire. We trained people, so there was no GenAI-specific hiring other than, I think, one or two positions.
Tegus Client
If we can quickly jump into some of the success factors that you would credit for this use case moving to production successfully.
Former Executive at PayPal
The success factors: I think the quality of the answers, the accuracy, was very good. We managed to hit something like 78% or almost 80%. So four out of five questions would be answered to a pretty satisfactory level, which was, frankly, more than anybody was expecting from the first use case. The second was that the Llama models are surprisingly performant. Four A100s is not a huge investment, and it's able to sustain a fairly high load.

The third was the entire pipeline, from the vector database to the language model interface and so on. Once we had it up and running, hooking it up into the final product and getting it into the customer's hands, the customer being the employee, was easy, because we had an existing application which showed links, and this bot just came in at the top of that. So it was not a big lift.
Tegus Client
You mean adding the LLM was not a big lift?
Former Executive at PayPal
Correct.
Tegus Client
And did you also feel that, since it's a RAG architecture, which is pretty much standard across the industry now, that offered you some kind of...
Former Executive at PayPal
Yes. That was another thing, because hallucinations, surprisingly, didn't prove to be a huge challenge. That was one of the things keeping me up at night. But surprisingly, through the evals, we did not find that hallucinations were a big challenge with a good RAG pattern.
Tegus Client
Yes, got it. So, quickly, if we can move to the future outlook section. We can discuss some of the use cases you plan to do in the future and upgrades, but I was very curious about how you envision your current architecture evolving in the next three to five years. I'll probably just go through the questions I sent you. So as you build more and more use cases, do you think that will be mostly GCP private tenant or on-prem?
Former Executive at PayPal
I think it will probably be a mix of the two. For some use cases, we still prefer internal, but there is much more acceptance that GCP might be a valid way to go about it, especially because now we are able to get GPUs fairly easily.
Tegus Client
GPUs from Google, as in GPUs in the Google cloud, or more like your own?
Former Executive at PayPal
Yes. Otherwise, buying GPUs is a three-month process.
Tegus Client
Yes, got it. And then across use cases, do you envision a single model? Or do you think that would be the minority of use cases?
Former Executive at PayPal
No. I think we will continue to be multi-model; open-source and vendor-hosted models are both there.
Tegus Client
But the open-source models, you're not hosting with a vendor. For this use case, you hosted them yourself, on-prem.
Former Executive at PayPal
Right. And that will continue for some use cases, especially where there is high sensitivity to information leaving our data centers, even to GCP. For those, we will continue to host open-source models.
Tegus Client
But coming back to the multi-model point: for a single use case, do you envision using multiple models?
Former Executive at PayPal
I think that's a little too complex. For a single use case, we'll probably stick with one LLM.
Tegus Client
We have heard of enterprises building manual logic to route some queries to a small model, and queries which require more complex reasoning to a bigger model.
Former Executive at PayPal
We are exploring that through multiple techniques. One is quantization, which we didn't get into, but there are quantized versions of models, there are distilled versions of models for use cases, and there is a gateway which does routing based on how complex the scenario is.
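A sketch of that kind of gateway routing; the classifier, threshold, and model handles are all assumptions, not from the interview:

```python
# Illustrative complexity-based routing at an LLM gateway.
def route(query: str, classifier, small_llm, large_llm, threshold=0.6):
    # classifier is assumed to return a 0-1 "reasoning difficulty" score
    difficulty = classifier.score(query)
    # cheap quantized/distilled model for easy queries, full model otherwise
    model = large_llm if difficulty > threshold else small_llm
    return model.complete(query)
```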
We are exploring those kinds of use cases, but we have found that they have different latency profiles. So that caused us a bit of a challenge in putting that thing into production, and we have to worry about how good the gateway is. So my preference is to go with one model for one use case.
Tegus Client
I see. And do you envision a multi- or hybrid-cloud setup? Sorry, in your case probably not multi-cloud, but do you envision hybrid-cloud orchestration within a single use case?
Former Executive at PayPal
My preference is not to do that, but there are scenarios where it will become necessary, just because on-prem might not be sufficient to fulfill all the requests. That is where the hybrid-cloud scenario does come in; especially in batch use cases, we are seeing that.
Tegus Client
But if even on-prem can't handle these use cases, why not just put them entirely on the cloud? Why still stick with on-prem?
Former Executive at PayPal
Fair point. The reason is typically cost. GPUs are expensive, and if we do have on-prem GPUs, you would like to use them as much as possible and not completely rely on the cloud, even though that makes things simpler, because the cost is high. So we are keeping the ROI angle in mind.
Tegus Client
Got it. And when you say your preference is not to have a hybrid-cloud kind of architecture, why is that not the best?
Former Executive at PayPal
It makes deployment and upgrading a little bit complex, because you have to worry about multiple infrastructures. Debugging, for example, becomes complex. Latency guarantees become complex with a hybrid deployment. If you have to upgrade something, that becomes complex.
Tegus Client
Usually with hybrid, you have a consistent platform, or consistent software, across on-prem and cloud, so you don't have to worry about managing those infrastructures independently.
Former Executive at PayPal
That's what they claim, but I think for an engineer who actually works on this, the devil is in the details.
Tegus Client
Could you talk about some of the other, more involved generative AI use cases that your business is trying to explore in the next three years or so?
Former Executive at PayPal
Yes. I think there are lots of other use cases being explored. Some of them are around coding assistants; we are planning to roll out coding assistants to a large majority of the engineering folks. We are evaluating lots of code assistants: Copilot, CodeWhisperer, Google's offerings like Duet, and Sourcegraph, I think, has another one, Cody.

So we are evaluating a lot of these for internal developer adoption. They have been pretty popular in some respects, so we are conducting studies to make sure they get adopted. Then for marketing literature, I think they are looking at image models as well, in order to quickly generate marketing copy.
Tegus Client
These are examples of use cases that can be done with capabilities already out in the market. My question is more around whether there are any core business-critical use cases that PayPal is currently exploring GenAI for, for example in transactions, fraud detection, and so on. Is there any scope for those kinds of use cases?
Former Executive at PayPal
For the fraud and risk use cases, there is a red-teaming effort going on, where we are trying to see whether large language models are able to generate attack patterns, how we can mitigate them, and so on. But for core fraud and risk right now, the perception is that it is still a little bit too early in the game.
Tegus Client
Got it. But is PayPal spending or investing enough in that red-teaming effort?
Former Executive at PayPal
Yes. We constantly do a lot of red teaming to make sure the risk surface is manageable, because every successful attack directly impacts the bottom line.
Tegus Client
Got it. So, coming back to your original list: you had coding assistants, you had marketing use cases. Any other kinds of use cases coming next?
Former Executive at PayPal
There are a lot of agentic use cases being studied, where multiple agents automatically become active based on different scenarios. So compliance, and customer service scenarios where automation on top of a customer complaint, for example, is being explored. A customer might say, "I have a problem accessing this payment," and then the LLM generates a plan for how that should be handled.

You might want to go and check in the SAP system, or in some other sort of ERP system, to see where the problem is; that might generate a summary which is then sent to some other approval system, and so on. We are looking at agents; that is, I think, the next big thing for the next year or so. The challenge is that some of those plans are not fully reliable, so it is still not very solid.
Tegus Client
Great. Well, thanks for your time and insight. Have a wonderful rest of your day.