LLM Reasoning (Part 10): The Variants of STaR


  Contents
  Past Review
  Introduction
  Detailed introduction
STaR
Quiet-STaR
V-STaR
Kwai-STaR
Lean-STaR
RL-STaR

  Past Review


Jarlene: LLM Reasoning (Part 1): STaR

Jarlene: LLM Reasoning (Part 2): Quiet-STaR

Jarlene: LLM Reasoning (Part 3): Q*

Jarlene: LLM Reasoning (Part 4): rStar

Jarlene: LLM Reasoning (Part 5): TTC

Jarlene: LLM Reasoning (Part 6): Let's Verify Step by Step

Jarlene: LLM Reasoning (Part 7): Data Generation (MiPS, Math-Shepherd, OmegaPRM)

Jarlene: LLM Reasoning (Part 8): MCTS

Jarlene: LLM Reasoning (Part 9): MCTS+Self-Refine/DPO...

  Introduction


I originally planned to cover LLM agents for reasoning in this tenth installment, but while running various experiments recently I found that STaR has spawned many variants, so I want to discuss the STaR variants separately here and cover agents for reasoning in the next installment.

  Detailed introduction

STaR


STaR aims to improve the performance of language models on complex reasoning tasks, such as solving math problems or answering commonsense questions. Its key feature is that it does not rely on labeled rationales; instead, the model iteratively generates its own chain-of-thought (CoT) training data. The specific steps are as follows:


  1. Initialization: Given a pre-trained large language model (LLM) $M$ and an initial question set $D = \{(x_i, y_i)\}_{i=1}^{D}$, where $x_i$ is a question and $y_i$ is its answer. Additionally, there is a small few-shot example set $P = \{(x_p^i, r_p^i, y_p^i)\}_{i=1}^{P}$, where $r_p^i$ is the corresponding reasoning process (rationale).

  2. Rationale Generation: Using the small few-shot prompt set $P$ to guide the model $M$, self-generate a rationale $\hat{r}_i$ and an answer $\hat{y}_i$ for each question $x_i$.
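The generate-then-filter core of one STaR iteration can be sketched as follows. This is a minimal toy, not the paper's implementation: `generate_rationale` is a hypothetical stand-in for few-shot prompting the model $M$, and the actual fine-tuning step is elided; only the correctness filter over $(x_i, \hat{r}_i, \hat{y}_i)$ triples is shown.

```python
def generate_rationale(question):
    """Toy stand-in for prompting M with the few-shot set P to
    produce a (rationale, answer) pair for one question."""
    a, b = question  # toy questions are addition pairs
    return (f"{a} + {b} = {a + b}", a + b)

def star_iteration(finetune_set, dataset):
    """One STaR iteration: generate a rationale per question, keep
    only the examples whose final answer matches the gold answer,
    and add them to the fine-tuning pool (fine-tuning M on this
    pool, and restarting from the base model, are omitted here)."""
    new_examples = []
    for question, gold in dataset:
        rationale, answer = generate_rationale(question)
        if answer == gold:  # filter: keep only correct rationales
            new_examples.append((question, rationale, answer))
    return finetune_set + new_examples

# Toy dataset; the last gold answer is deliberately wrong, so that
# example is filtered out.
dataset = [((1, 2), 3), ((2, 2), 4), ((5, 5), 11)]
finetune_set = star_iteration([], dataset)
print(len(finetune_set))  # → 2
```

In the full algorithm this loop repeats: the model fine-tuned on the filtered rationales becomes the generator for the next round, so reasoning quality improves without any human-written rationales beyond the small prompt set $P$.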