
Make your search app get it.

We expect the search applications we interact with to provide us with relevant information and answer the questions we have. We expect this process to be fast, easy, and accurate. In fact, 71% of consumers expect personalized results, and 76% get frustrated when they don’t get them. 1

When searching with a keyword-based solution, applications will look for exact word or string matches. If your users aren’t sure what exactly to search for, searching by keywords won’t always get them the right answer. It can also be time-consuming for them to find the information they need, especially when there’s a large amount of unstructured data to search through. As a result, they won’t find the answers they’re looking for or they’ll get incomplete answers.

Turning to AI for better search results and user experience

Unlike keyword-based search, semantic search uses the meaning of the search query. It finds relevant results even if they don’t exactly match the query. This works by combining the power of Large Language Models (LLMs) to generate vector embeddings with the long-term memory of a vector database.

Once the embeddings are stored inside a vector database like Pinecone, they can be searched by semantic similarity to power applications for a variety of use cases.

Semantic search diagram
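To make "searching by meaning" concrete: a query and a relevant document end up as nearby vectors even when they share few keywords, and similarity is typically measured with cosine similarity. Below is a minimal sketch using toy 3-dimensional vectors as stand-ins for real LLM embeddings (which have hundreds or thousands of dimensions); the example texts in the comments are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 = same direction (same meaning), ~0.0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real model output.
query     = [0.9, 0.1, 0.0]  # "How do I reset my password?"
doc_match = [0.8, 0.2, 0.1]  # "Steps to recover account credentials"
doc_other = [0.0, 0.1, 0.9]  # "Quarterly revenue report"

# The semantically related document scores higher despite sharing no keywords.
print(cosine_similarity(query, doc_match) > cosine_similarity(query, doc_other))  # True
```

A vector database performs this kind of comparison at scale, using approximate nearest-neighbor indexes rather than an exhaustive scan.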

Semantic search use cases

  • Knowledge management: Save time and boost productivity for your internal teams by enabling them to self-serve and search through various internal data and documents to answer their questions. With semantic search, they can more quickly find what they are looking for. For example, a leading telecom company uses Pinecone to enable their customer service teams to search through their internal knowledge base to respond to customer inquiries quicker and with more accuracy.
  • End-user applications: Gain a competitive advantage by building and providing a solution to increase relevance of search results for end-users. With semantic search, users will be able to increase productivity by finding answers to their questions faster. For example, a global ERP software company uses Pinecone to let their customers get insights from employee feedback using semantic search.
  • Aggregated data services: Enable your end-users to make more informed, data-driven decisions by compiling various data sources and identifying valuable insights using semantic search. For example, an online-learning company uses Pinecone to power their core search and question-answering feature for millions of users.

Companies are increasingly turning to AI to power their search applications, but self-managing the complex infrastructure — from the vector database to the LLMs needed to generate embeddings — can lead to challenges.

Without the necessary infrastructure or dedicated ML engineering and data science teams, companies self-hosting vector databases to power semantic search applications can face:

  • High query latencies: Storing and searching through large numbers of embeddings on traditional databases is prohibitively slow or expensive.
  • Less relevant results: Answers can be improved by fine-tuning the LLM, but that requires data science expertise and ML engineering. Search results may be less accurate without the necessary AI and ML resources.
  • Capacity and freshness tradeoffs: While it’s important for a search solution to provide the most up-to-date information, running frequent batch jobs to maintain a fresh index leads to high compute and storage costs.

Search like you mean it with Pinecone

Pinecone provides fast, fresh, and filtered results, so customers don’t need to make tradeoffs between performance, scale, and query speed. Pinecone is a fully managed vector database trusted by some of the world’s largest enterprises. We provide the necessary infrastructure to support your semantic search use cases reliably at scale.

Benefits of semantic search with Pinecone

  • Ultra-low query latencies: Power search across billions of documents in milliseconds, combined with usage-based pricing for high-volume production applications. Partition indexes into namespaces to further reduce search scope and query latency.
  • Better search results: With Pinecone, you can trust that you are searching the most up-to-date information with live index updates. Combine semantic search with metadata filters to increase relevance, and for hybrid search use cases, leverage our sparse-dense index support (using any LLM or sparse model) for the best results.
  • Easy to use: Get started in no time with our free plan, and access Pinecone through the console, an easy-to-use REST API or one of our clients (Python, Node, Java, Go). Jumpstart your project by referencing our extensive documentation, example notebooks and applications, and many integrations.
  • Fully-managed: Launch, use, and scale your search solution without needing to maintain infrastructure, monitor services, or troubleshoot algorithms. Pinecone supports both GCP and AWS — choose the provider and region that works best for you.
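To illustrate how metadata filters narrow results before similarity ranking, here is a small local sketch. The filter shape follows the Mongo-style operators Pinecone documents (e.g. `$eq`, `$in`); the matching logic below is an in-memory stand-in for illustration, not the Pinecone client, and the record data is invented.

```python
def matches(metadata, flt):
    # Supports a small subset of the Mongo-style filter syntax:
    # {"field": {"$eq": value}} and {"field": {"$in": [v1, v2]}}.
    for field, cond in flt.items():
        value = metadata.get(field)
        if "$eq" in cond and value != cond["$eq"]:
            return False
        if "$in" in cond and value not in cond["$in"]:
            return False
    return True

records = [
    {"id": "a", "metadata": {"genre": "news", "year": 2023}},
    {"id": "b", "metadata": {"genre": "blog", "year": 2023}},
    {"id": "c", "metadata": {"genre": "news", "year": 2021}},
]

# Only records passing the filter are candidates for similarity ranking.
flt = {"genre": {"$eq": "news"}, "year": {"$in": [2022, 2023]}}
print([r["id"] for r in records if matches(r["metadata"], flt)])  # ['a']
```

With the real client, an equivalent filter (and a `namespace` to bound the search scope) is passed directly to the query call, so filtering happens inside the database rather than in your application.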

Incorporating Pinecone into your search stack:

Adding semantic search to your search stack is easy with Pinecone. To get started, you need Pinecone plus the following components: a data warehouse (or any other source of truth for your data), an AI model, and your application. You can also refer to our example notebook and NLP for Semantic Search guide for more information.

Step 1: Take data from the data warehouse and generate vector embeddings using an AI model (e.g. sentence transformers or OpenAI’s embedding models).

Step 2: Save those embeddings in Pinecone.

Step 3: From your application, embed queries using the same AI model to create a “query vector.”

Step 4: Search through Pinecone using the query embedding, and receive ranked results based on semantic similarity.
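The four steps can be sketched end to end. In this sketch, `embed()` is a deliberately simple bag-of-words stand-in for a real embedding model (such as a sentence transformer), and a Python dict stands in for a Pinecone index; with the real stack, the store and query steps would be upsert and query calls against Pinecone instead. Document texts and the vocabulary are invented for illustration.

```python
import math

# Tiny fixed vocabulary for the toy embedding model.
VOCAB = ["password", "reset", "account", "revenue", "earnings", "report"]

def embed(text):
    # Step 1 / Step 3: toy embedding — count vocabulary words, normalize to unit length.
    # A real pipeline would call a sentence transformer or OpenAI embedding model here.
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Steps 1–2: embed documents and store the vectors (dict stands in for Pinecone).
docs = {
    "doc1": "reset your password from the account settings page",
    "doc2": "quarterly revenue grew in the latest earnings report",
}
index = {doc_id: embed(text) for doc_id, text in docs.items()}

# Step 3: embed the query with the SAME model to get a query vector.
query_vec = embed("how to reset a password")

# Step 4: rank stored vectors by similarity to the query vector.
def score(v):
    return sum(a * b for a, b in zip(query_vec, v))  # dot product of unit vectors

ranked = sorted(index, key=lambda doc_id: score(index[doc_id]), reverse=True)
print(ranked[0])  # doc1
```

The key invariant is that documents and queries are embedded with the same model, so their vectors live in the same space and distances are meaningful.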

Complete your search stack with our integrations:

Pinecone works with embeddings from any AI model or LLM. We recommend getting started with either OpenAI or Hugging Face. We also have integrations for LLM frameworks (e.g. LangChain) and data infrastructure (e.g. Databricks) to take your search applications to the next level.

Get started today

Ready to start building with Pinecone? Create an account today to get started or contact us to talk to an expert.

References

  1. The value of getting personalization right—or wrong—is multiplying | McKinsey