LLM Reasoning (Part 10): The Variants of STaR
Past Review
Jarlene: LLM Reasoning (Part 1): STaR
Jarlene: LLM Reasoning (Part 2): Quiet-STaR
Jarlene: LLM Reasoning (Part 4): rStar
Jarlene: LLM Reasoning (Part 5): TTC
Jarlene: LLM Reasoning (Part 6): Let's Verify Step by Step
Jarlene: LLM Reasoning (Part 7): Data Construction (MiPS, Math-Shepherd, OmegaPRM)
Jarlene: LLM Reasoning (Part 8): MCTS
Jarlene: LLM Reasoning (Part 9): MCTS + Self-Refine/DPO...
Introduction
I originally planned to cover LLM agents for reasoning in this tenth installment, but while running various experiments recently I found that STaR has spawned many variants. So this post discusses the STaR variants on their own, and the next installment will cover agents for reasoning.
Detailed Introduction
STaR
STaR addresses how to improve the performance of language models on complex reasoning tasks, such as solving math problems or answering commonsense questions. Its defining feature is that it does not rely on labeled rationales: the model iteratively generates its own chain-of-thought (CoT) training data. The specific steps are as follows:
Initialization: Given a pre-trained large language model (LLM) $$M$$ and an initial question set $$D = \{(x_i, y_i)\}_{i=1}^{D}$$, where $$x_i$$ is a question and $$y_i$$ its answer. Additionally, there is a small example set $$P = \{(x_p^i, r_p^i, y_p^i)\}_{i=1}^{P}$$, where $$r_p^i$$ is the reasoning process (rationale) for question $$x_p^i$$.
Rationale Generation: Use the small example set $$P$$ as few-shot prompts to guide the model $$M$$ to self-generate a rationale $$\hat{r}_i$$ and an answer $$\hat{y}_i$$ for each question $$x_i$$ in $$D$$.
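To make the loop concrete, below is a minimal sketch of one STaR iteration in Python. This is an illustrative sketch under assumptions, not the paper's implementation: `llm_generate` is a hypothetical stand-in for any LLM sampling call, and the prompt format and the `Answer:` extraction convention are assumptions. The sketch keeps only the self-generated triples whose final answer matches the gold answer $$y_i$$; the kept set is then used to fine-tune $$M$$ before the next iteration.

```python
from typing import Callable

def star_iteration(
    llm_generate: Callable[[str], str],   # hypothetical: prompt -> sampled completion
    fewshot_prompt: str,                  # rendered examples from P: (x_p, r_p, y_p)
    dataset: list[tuple[str, str]],       # D as [(question x_i, gold answer y_i), ...]
) -> list[tuple[str, str, str]]:
    """One STaR iteration: generate a rationale + answer for each question,
    then keep only the (x_i, r_hat_i, y_i) triples whose predicted answer
    matches the gold one. The returned triples are fine-tuning data for M."""
    finetune_data = []
    for question, gold_answer in dataset:
        completion = llm_generate(f"{fewshot_prompt}\nQ: {question}\nA:")
        # Assumed convention: the rationale ends with a line "Answer: <y_hat>".
        rationale, sep, predicted = completion.rpartition("Answer:")
        if sep and predicted.strip() == gold_answer.strip():
            finetune_data.append((question, rationale.strip(), gold_answer))
    return finetune_data
```

In the full algorithm this function is called once per outer iteration, each time sampling from the most recently fine-tuned $$M$$, so the quality of the self-generated rationales can improve over rounds.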