## Don’t Embed Wrong!
Nov 1, 2024 · 11:42
https://youtube.com/watch?v=76EIC_RaDNw
0:00
You are doing embedding wrong. I was on the founding Ollama team, and until this week I did it wrong too. I read the docs, I even made videos about it, but then Aaron on the Nomic team introduced me to prefixes. Depending on your use case and your content, adding a prefix to your content can make a massive difference to the success of your RAG application. You may get results that are twice as accurate as before.
0:25
What do I mean by a prefix? Well, before you send a chunk to the embedding model, you insert a piece of text in front describing its purpose. Three of the five embedding models in the official Ollama library support prefixes, they're different for each one, and none of the prefixes are part of the Ollama documentation. So let's take a look at them and see how using them compares. This is the Ollama course.
0:50
Every week I put out another video that teaches you a bit more of everything you need to know about using Ollama to do everything you can do with AI. We're a few episodes in and I have lots more to come. Okay, so I chunked up the scripts of my last few videos and then embedded them with five models without prefixes, then repeated that with the three that support prefixes.
1:14
Then I asked the question: how do I install n8n with Docker Compose? When using nomic-embed-text without prefixes, I didn't get an answer, but with prefixes in place I got this, which is a much more complete answer, and this was when I asked it to give me a single doc, or chunk, back from the vector database. In just a second I'll go through all the models and a bunch more questions.
1:40
So stay on for that. So what are the prefixes? Well, with nomic-embed-text there are two main prefixes to use. For the source documents that you add to the vector store, you add "search_document: " before the chunk of text, and to the query that you want to run the similarity search against, you add "search_query: ". Snowflake Arctic and MixedBread both use the phrase "Represent this sentence for searching relevant passages: "
2:10
instead of "search_query: ", and they don't use any prefix for the documents; Snowflake and mxbai just use that one. But actually nomic uses a few others as well. If you're doing classification, then use the prefix "classification: ". But if you're trying to discover common topics in the text or eliminate semantic duplicates in the text, then you want clustering, and the prefix for that is "clustering: ".
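Here's a minimal sketch of what that looks like in practice, assuming Ollama is running locally on its default port. The prefix strings are the ones named above; the lookup table and helper function are my own names, not the repo's:

```ts
// A minimal sketch, assuming Ollama is on localhost:11434 (its default).
// The prefix strings come from the video; the helper names are mine.
const PREFIXES: Record<string, { document: string; query: string }> = {
  "nomic-embed-text": {
    document: "search_document: ",
    query: "search_query: ",
  },
  "snowflake-arctic-embed": {
    document: "", // no document prefix for snowflake
    query: "Represent this sentence for searching relevant passages: ",
  },
  "mxbai-embed-large": {
    document: "", // no document prefix for mxbai either
    query: "Represent this sentence for searching relevant passages: ",
  },
};

async function embed(
  model: string,
  text: string,
  role: "document" | "query",
): Promise<number[]> {
  const prefix = PREFIXES[model]?.[role] ?? ""; // unknown models get no prefix
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: prefix + text }),
  });
  const { embedding } = await res.json();
  return embedding;
}
```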
2:39
If you're using the Nomic API with their hosted service, then there's an option in the API call, but with Ollama it really is just sticking the prefix text in front of the rest of the text. So that's pretty easy. But does it really make a difference? We'll find out if prefixes make a difference here in just a sec. But you know what makes a massive difference to me personally?
3:03
That would be you going down and clicking the like button and then subscribing to the channel. It helps me know I'm on the right track, helping you learn about Ollama. Well, in the GitHub repo for this video, which I've linked to in the description below, I have five main bits of code. First there's a step to prepare the database, vector prep, which uses ChromaDB to create 16 collections.
3:26
So there are five models, and three of them support prefixing, so that's eight collections. And then I have one set of those that includes the questions in the vector store, and another set that doesn't include the questions, so that's where the 16 comes from. Is it appropriate to include the questions? Well, there are many documents you might want to add that include a question, say as a heading. This simulates that case.
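A hedged sketch of how those 16 collections might be laid out; the naming scheme is my assumption, and the two remaining unprefixed models are my guesses since the video doesn't name them:

```ts
// A sketch, not the repo's actual code: 8 model configurations, each stored
// once with the questions and once without = 16 ChromaDB collections.
import { ChromaClient } from "npm:chromadb";

const configs = [
  { model: "nomic-embed-text", prefixed: false },
  { model: "nomic-embed-text", prefixed: true },
  { model: "snowflake-arctic-embed", prefixed: false },
  { model: "snowflake-arctic-embed", prefixed: true },
  { model: "mxbai-embed-large", prefixed: false },
  { model: "mxbai-embed-large", prefixed: true },
  { model: "all-minilm", prefixed: false }, // my guess at the remaining two
  { model: "bge-m3", prefixed: false },     // models; the video doesn't say
];

const client = new ChromaClient(); // assumes a Chroma server on its default port
for (const cfg of configs) {
  for (const withQuestions of [true, false]) {
    const name = [
      cfg.model,
      cfg.prefixed ? "prefixed" : "plain",
      withQuestions ? "with-q" : "no-q",
    ].join("-");
    await client.getOrCreateCollection({ name });
  }
}
```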
3:52
There are four scripts that have been added, and there are 13 questions based on them. I chunked up the scripts by paragraph, looking for newlines and then getting rid of any empty chunks, and then I embedded them accordingly for each model. Creating embeddings is a core part of any RAG application, as well as of clustering, classification, and other processes.
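That chunking step is simple enough to sketch (the function name is mine):

```ts
// Split a script on newlines and drop any empty chunks, as described above.
function chunkScript(script: string): string[] {
  return script
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```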
4:15
For RAG, you generally get some source text, split it up into smaller pieces, create embeddings from those pieces, and then store them in a data store. Then you ask a question and find pieces similar to the question based on those embeddings. Then the matching plain-text source chunks get handed off to the model for processing. Again, it's the plain-text source that gets sent to the model and not the embeddings. That's what the docs say.
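Reusing the `embed` helper sketched earlier, the retrieval half of that flow might look like this (the collection name and question are placeholders):

```ts
// Retrieval sketch: embed the question with the query prefix, fetch the
// nearest chunks, and keep their plain text for the generation step.
import { ChromaClient } from "npm:chromadb";

const client = new ChromaClient();
const collection = await client.getOrCreateCollection({
  name: "nomic-embed-text-prefixed-with-q", // placeholder collection name
});

const question = "How do I install n8n with Docker Compose?";
const queryEmbedding = await embed("nomic-embed-text", question, "query");

const results = await collection.query({
  queryEmbeddings: [queryEmbedding],
  nResults: 2,
});
// It's these plain-text chunks, not the embeddings, that go to the model.
const contextDocs = results.documents[0] ?? [];
```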
4:41
That's what my videos have said, and that's generally true. But prefixes are the little wrinkle that I just learned about. Now, for the first test, I embed the question with and without the prefixes as needed by each model, and then I find the top two results from each. Since the questions are in the vector store, the question I asked will always be the top result, because it's a 100% match.
5:05
So I remove that from the resulting docs. The results of this test show just the docs output from the vector store that come up as a good match. It does this for all the combinations, which is 13 times 8, so exactly 104. Then I go through each one and grade it. For the first test, I'm just figuring out if the information provided could potentially answer the question.
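Continuing the sketch above, dropping the stored question before grading is a one-liner:

```ts
// Drop the verbatim question from the hits before grading, as described above.
const gradedDocs = contextDocs.filter(
  (doc): doc is string => doc !== null && doc !== question,
);
// 13 questions x 8 model configurations = 104 results to grade in test one.
```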
5:31
So let's take a look at the results. The first question is: what is n8n? And the docs I get back aren't that promising. Either I get nothing, or I get something about installing it, or I get one of the other questions, but nothing about what it is. The next question is about installing it with Docker Compose. nomic doesn't give me anything, but nomic with the prefixes is much more useful.
5:55
Prefixed snowflake also gave me a good result. Okay, so how do I run n8n on my Mac? Prefixed nomic is the only one that gives an okay result, especially considering it only has one doc to work with. Now, this video will get really boring really quickly if I go through every result with you, so let me speed it up and just get to the overall results.
6:20
So we see in this small data set, which is definitely not statistically significant, that nomic-embed-text with the prefixes comes out ahead, snowflake with prefixes comes second, and nomic without prefixes is third. But then again, these numbers are so small that it's not all that conclusive.
6:40
If I could hire a young kid to do the grading or, even cheaper, a grad student, then testing on 20 scripts with 200 questions that get asked 10 times each could be more interesting, but I'm not going to sit through grading 3,000 results. So this is what we have, and the prefixes seem to come out ahead. Of course, the obvious concern here is that we didn't actually test if a model can answer the question.
7:05
So I ran this test again, but instead of outputting docs, I output an answer. The answer is generated by IBM's new granite3-dense 8-billion-parameter model, which seems to be really good at this task. You can see the prompt I use in the source code. I've asked it to only use the info provided by the docs, but these are LLMs, so the instructions aren't always followed.
7:29
So if there is an answer, I also output the documents, to verify that the answer came from the documents. And as usual, the source code is in the same repo I always use, and the link is in the description below. So let's see how that does. And things here seem to be generally the same.
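The generation step could be sketched like this against Ollama's /api/generate endpoint; the prompt wording is my approximation of the one in the repo, and granite3-dense:8b is the Ollama library tag for the model he names:

```ts
// Answer-generation sketch: hand the retrieved plain-text chunks to the model
// and ask it to stay inside them. Prompt wording is my approximation.
const prompt = `Answer the question using ONLY the documents below.

Documents:
${gradedDocs.join("\n---\n")}

Question: ${question}`;

const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "granite3-dense:8b", prompt, stream: false }),
});
const { response } = await res.json();
console.log(response); // print the docs too, to check where the answer came from
```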
7:46
There were a few instances where it came up with the right answer, but that had to be knowledge in the model, because there was nothing relevant in the docs provided from the database, so I marked those as a fail. This is interesting, but not really a real-world test. In most cases I'm giving the model a single document from the vector store.
8:08
In reality I'd probably retrieve more docs, and the question wouldn't be in there. So that's what we tried next, with the top five docs given to the model, and the questions not added to the database. This is test three in the repo. We see that the prefixed versions of nomic and snowflake still won, with unprefixed nomic still coming in third.
8:30
But one of the interesting things that we have seen each time is how prefixed nomic is a little bit better than unprefixed nomic, but prefixed snowflake is a lot better than unprefixed snowflake. Generally, though, all the embedding models seem to perform better when we allow more documents to be used from the data store; they're all kind of catching up to the leaders.
8:55
The final test, test four, just increases the number of documents pulled from the vector store from 5 to 10, and now most of the models perform a lot better. In fact, they all seem about equal, with the one exception being unprefixed snowflake, which was definitely worse. Of course, all the numbers we're dealing with are small, and to really understand the differences, we should be asking a lot more questions and asking them over and over and over again.
9:24
We should play with different chunking sizes and the number of chunks delivered from the data store. But at a high level, it does seem that adding the prefixes makes a bit of difference most of the time, especially with snowflake. Now, one of the questions I get every now and then is: how about using the Llama models for embedding? They're a lot bigger, and although they may be slower, maybe they get better results.
9:50
So I added llama3.1:8b, llama3.2:3b, and mistral, which is 7.2B, to my model config. I ran vector prep to get the vectors added to the collections and then ran test two. That's the one where it generated an answer based on one or two documents pulled from the data store. None of those three models could get any of the questions answered. So then I tried test three.
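If the model config is a list like the one sketched earlier for vector prep, the addition might look like this; the config shape is my assumption, but the tags are real Ollama library tags:

```ts
// Adding general-purpose LLMs to the embedding line-up; the config shape is a
// guess, but these model tags exist in the Ollama library.
configs.push(
  { model: "llama3.1:8b", prefixed: false },
  { model: "llama3.2:3b", prefixed: false },
  { model: "mistral:7b", prefixed: false }, // Mistral 7B (~7.2B parameters)
);
```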
10:20
That was the one where the questions were removed and five docs were pulled from the data store. llama3.1 was a little bit better than snowflake with prefixes, but all of the other embedding models performed so much better. So the embedding models are orders of magnitude faster, and they just come up with better results. Please don't use regular LLMs for generating embeddings.
10:45
If you're still doing that, stop, and use a model meant for embedding. Now, I would love for you to try this out with your own documents. Play with the chunking sizes. My code is written using Deno 2, which uses TypeScript, but it's not that difficult to switch over to Python or whatever you would like to use. What do you think? Were you surprised by anything you saw? It's so easy to add prefixes.
11:12
There's no reason not to use them, and they definitely get better results. I was so surprised when I heard about prefixes the other day. I knew in my heart that it wouldn't make a lick of difference, and then I was shocked when I saw the difference they made. So hopefully you learned something. I certainly did. Prefixes, huh? The way of the future. Thanks so much for watching. Goodbye.
Transcribed with Scribe https://www.appblit.com/scribe