Chain-of-Thought Reasoning: Teaching LLMs to Think Step by Step
Large language models solve complex problems more reliably when they are prompted to reason explicitly, one step at a time. Our research focuses on eliciting that reasoning and verifying it.
The Reasoning Gap
LLMs often fail on problems requiring:
- Multi-step deduction
- Mathematical manipulation
- Causal reasoning
- Counterfactual thinking
Chain-of-Thought Prompting
Including reasoning steps in the prompt unlocks capabilities the model already has but does not use by default. In practice:
- Show worked examples
- Request explicit reasoning
- Verify intermediate steps
- Encourage self-correction
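The prompting steps above can be sketched as a small prompt builder. This is a minimal illustration, not our production template: the `WorkedExample` type, the `build_cot_prompt` function, the `Step N:` layout, and the `Answer:` convention are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkedExample:
    """A hypothetical few-shot example with explicit reasoning steps."""
    question: str
    reasoning: List[str]  # ordered reasoning steps
    answer: str


def build_cot_prompt(examples: List[WorkedExample], question: str) -> str:
    """Assemble a few-shot chain-of-thought prompt (illustrative format)."""
    parts = []
    for ex in examples:
        # Show worked examples: each one spells out its intermediate steps.
        steps = "\n".join(
            f"Step {i}: {step}" for i, step in enumerate(ex.reasoning, 1)
        )
        parts.append(f"Q: {ex.question}\n{steps}\nAnswer: {ex.answer}")
    # Request explicit reasoning, step verification, and self-correction.
    parts.append(
        f"Q: {question}\n"
        "Reason step by step. Check each intermediate result and correct "
        "any mistake you find. Finish with a line 'Answer: <value>'."
    )
    return "\n\n".join(parts)


example = WorkedExample(
    question="What is 2 + 3 * 4?",
    reasoning=["Multiply first: 3 * 4 = 12.", "Then add: 2 + 12 = 14."],
    answer="14",
)
prompt = build_cot_prompt([example], "What is 5 + 2 * 3?")
```

The resulting string can be sent to any text-completion interface. Ending every response with a fixed `Answer:` line makes the final answer easy to extract for later checking.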
Verification Methods
We developed techniques to check reasoning:
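- Self-consistency: sample multiple reasoning paths and keep the most common answer
- Process reward models: score each reasoning step, not only the final answer
- Formal verification: check the logical validity of the reasoning
- Execution: run code snippets to check computed results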
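As an illustration of the self-consistency item, the sketch below samples several reasoning paths and takes a majority vote over their final answers. It assumes completions end with an `Answer: <value>` line. The `sample` callable stands in for whatever model call is used, and `fake_sample` is a canned stub so the example runs without a model; all of these names are hypothetical.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Optional, Tuple

# Assumed output convention: the final answer sits on a line 'Answer: <value>'.
ANSWER_RE = re.compile(r"Answer:\s*(.+)")


def extract_answer(completion: str) -> Optional[str]:
    """Return the last 'Answer:' value in a completion, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def self_consistent_answer(
    sample: Callable[[str], str], prompt: str, n: int = 5
) -> Tuple[Optional[str], float]:
    """Sample n reasoning paths and return (majority answer, agreement rate)."""
    answers = [extract_answer(sample(prompt)) for _ in range(n)]
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None, 0.0
    answer, count = votes.most_common(1)[0]
    return answer, count / n


# Stub sampler: cycles through canned completions instead of calling a model.
_canned = itertools.cycle([
    "Step 1: 3 * 4 = 12.\nStep 2: 2 + 12 = 14.\nAnswer: 14",
    "Step 1: 2 + 3 = 5.\nStep 2: 5 * 4 = 20.\nAnswer: 20",
    "Step 1: 3 * 4 = 12.\nStep 2: 12 + 2 = 14.\nAnswer: 14",
])


def fake_sample(prompt: str) -> str:
    return next(_canned)


answer, agreement = self_consistent_answer(fake_sample, "What is 2 + 3 * 4?", n=3)
```

Here two of the three canned paths agree on `14`, so that answer wins with an agreement rate of 2/3. The agreement rate can double as a rough confidence signal: low agreement suggests the problem deserves one of the stronger checks listed above.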
Results
| Benchmark | Base | + CoT | + Verification |
|---|---|---|---|
| GSM8K | 45% | 78% | 92% |
| MATH | 23% | 51% | 73% |
| ARC-Challenge | 67% | 85% | 94% |
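To read the table in terms of gains, the sketch below computes the percentage-point improvement at each stage and the share of base-model errors removed by the full pipeline. It assumes the reported figures are accuracy percentages; the `gains` helper and its field names are illustrative only.

```python
# Figures copied from the table above: (base, + CoT, + verification), in percent.
results = {
    "GSM8K": (45, 78, 92),
    "MATH": (23, 51, 73),
    "ARC-Challenge": (67, 85, 94),
}


def gains(base: int, cot: int, verified: int) -> dict:
    """Percentage-point gains per stage and relative error reduction."""
    return {
        "cot_gain": cot - base,                # points added by CoT
        "verification_gain": verified - cot,   # points added by verification
        # Share of the base model's errors eliminated by the full pipeline,
        # assuming the figures are accuracies out of 100.
        "error_reduction": (verified - base) / (100 - base),
    }


summary = {name: gains(*scores) for name, scores in results.items()}

for name, g in summary.items():
    print(
        f"{name}: +{g['cot_gain']} pts from CoT, "
        f"+{g['verification_gain']} pts from verification, "
        f"{g['error_reduction']:.0%} of base errors removed"
    )
```

On all three benchmarks, chain-of-thought prompting contributes the larger share of the improvement, with verification adding between 9 and 22 further points.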
Applications
Reliable, verifiable step-by-step reasoning enables:
- Automated theorem proving
- Scientific hypothesis generation
- Complex planning tasks
- Educational tutoring systems