Chain of Draft Prompting: Better Results, Fewer Tokens, Less Energy Consumption, Lower Costs

Large Language Models (LLMs) have transformed the way we tackle challenging tasks like math word problems or nuanced reasoning questions. One popular approach called Chain-of-Thought (CoT) prompting helps LLMs lay out each reasoning step in detail. While that often improves accuracy, it can also lead to unnecessarily long responses and slow performance.

Silei Xu, Wenhao Xie, Lingxiao Zhao, and Pengcheng He published the paper ‘Chain of Draft: Thinking Faster by Writing Less’ on 25 February 2025. Chain of Draft (CoD) prompting offers a more concise alternative: instead of producing detailed, paragraph-long justifications, the model writes down only the minimal “draft” steps needed to reach a correct answer. This drastically shrinks the output, which saves cost and time, while still preserving reasoning transparency.


How It Works

  1. Focus on Essentials. Humans often solve problems by scribbling only the key points (a “draft”) on paper. Similarly, CoD asks an LLM to keep each intermediate step short and information-dense.
  2. Clear but Minimal. CoD avoids flowery details. Each step might be just a few words or an equation, but it remains a fully correct reasoning path.
  3. Token-Efficiency. Because models are billed (or measured) by their input and output tokens, fewer words mean lower cost and latency. In real-time or resource-constrained scenarios, that difference can be significant (a minimal API sketch follows this list).
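
To make this concrete, here is a minimal sketch of a small helper that the examples below will reuse. It assumes the openai Python package, an OPENAI_API_KEY environment variable, and the model name “gpt-4o”; these are illustrative choices, not something prescribed by the paper.

  # Minimal helper used by the prompting examples below.
  # Assumes: `pip install openai` and OPENAI_API_KEY set in the environment.
  from openai import OpenAI

  client = OpenAI()  # picks up OPENAI_API_KEY automatically

  def ask(system_prompt: str, question: str, model: str = "gpt-4o") -> str:
      """Send one question under a given prompting style and return the reply text."""
      response = client.chat.completions.create(
          model=model,
          messages=[
              {"role": "system", "content": system_prompt},
              {"role": "user", "content": question},
          ],
      )
      return response.choices[0].message.content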

Example: Simple Arithmetic Problem

Let’s compare three prompting styles—Standard, Chain-of-Thought (CoT), and Chain-of-Draft (CoD)—using a straightforward math question. Our question:

“Jason had 20 lollipops. He gave Denny some. Now Jason has 12 left. How many did he give to Denny?”

1. Standard Prompting

Prompt to Model:
“Answer the question directly. Do not return any reasoning.”

Model’s Response:

8

Pros: Fast, direct answer.
Cons: No explanation, so there’s no insight into how the result was found. On more complex questions, direct-only answers sometimes fail.
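
As a rough sketch, the standard style can be reproduced with the ask() helper from the “How It Works” section; the prompt wording is taken from above.

  # Standard prompting: ask for the answer only, no reasoning.
  # Reuses ask() from the helper sketch in "How It Works".
  STANDARD_PROMPT = "Answer the question directly. Do not return any reasoning."
  QUESTION = ("Jason had 20 lollipops. He gave Denny some. "
              "Now Jason has 12 left. How many did he give to Denny?")

  print(ask(STANDARD_PROMPT, QUESTION))  # expected output: 8 (or similar)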


2. Chain-of-Thought (CoT) Prompting

Prompt to Model (shortened version):
“Think step by step to answer. Provide your reasoning and then the final answer after ‘####’.”

Model’s Response (excerpted):

1. Jason started with 20 lollipops.
2. He gave some to Denny.
3. Now he has 12 left.
20 - 12 = 8
#### 8

Pros: Clear step-by-step logic.
Cons: In more complex problems, CoT can become very verbose—consuming time, space, and cost.
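
Continuing the same sketch, the CoT variant only swaps the system prompt; the question stays the same.

  # Chain-of-Thought prompting: reuses ask() and QUESTION from the sketches above.
  COT_PROMPT = ("Think step by step to answer. Provide your reasoning "
                "and then the final answer after '####'.")

  print(ask(COT_PROMPT, QUESTION))  # expect several reasoning lines, then "#### 8"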


3. Chain-of-Draft (CoD) Prompting

Prompt to Model (shortened version):
“Think step by step, but limit each step to just a few words. Provide only the essential calculations, then give the final answer after ‘####’.”

Model’s Response:

20 - x = 12; x = 8. #### 8

Pros:

  • Concise steps show the logic without unnecessary detail.
  • Much shorter output, improving speed and reducing token usage.

Cons:

  • Sometimes, extremely brief steps can be too minimal if the problem is very complicated.
  • Works best when the model has seen similar examples in its training or in-context “few-shot” prompts, so it knows how to keep reasoning succinct yet accurate.
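
Again continuing the sketch, the CoD variant swaps in the draft-style instruction and then splits the reply on the ‘####’ separator to recover the final answer.

  # Chain-of-Draft prompting: reuses ask() and QUESTION from the sketches above.
  COD_PROMPT = ("Think step by step, but limit each step to just a few words. "
                "Provide only the essential calculations, then give the final "
                "answer after '####'.")

  reply = ask(COD_PROMPT, QUESTION)
  draft, _, answer = reply.partition("####")  # everything before '####' is the draft
  print("Draft:", draft.strip())    # e.g. "20 - x = 12; x = 8."
  print("Answer:", answer.strip())  # e.g. "8"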

Performance and Benefits

In research experiments on math and reasoning benchmarks (such as GSM8K), CoD often achieves comparable accuracy to CoT while reducing token usage by 70–90%. This translates to both lower compute cost and faster response times, which is crucial for time-sensitive applications like real-time customer support or interactive tools.

Here’s a simplified summary of how the three prompting styles compare on a math problem set (numbers are for illustration only):

Prompt Method           | Accuracy  | Average Tokens/Response
Standard                | ~65%      | 1–2
Chain-of-Thought (CoT)  | ~95%      | ~200
Chain-of-Draft (CoD)    | ~91–95%   | ~40

The Chain-of-Draft method slashes the number of tokens while still delivering high accuracy—striking a balance between the no-explanation “Standard” approach and the sometimes excessively detailed “Chain-of-Thought.”
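
To see why the token column matters, here is a back-of-the-envelope cost estimate based on the illustrative figures above. The per-token price and request volume are made-up placeholders, not real provider rates.

  # Rough cost comparison using the illustrative token counts from the table.
  # PRICE and REQUESTS are hypothetical placeholders, not actual provider rates.
  PRICE_PER_MILLION_OUTPUT_TOKENS = 10.00   # USD, assumed for illustration
  REQUESTS = 1_000_000                      # e.g. one million answered questions

  avg_output_tokens = {"Standard": 2, "Chain-of-Thought": 200, "Chain-of-Draft": 40}

  for method, tokens in avg_output_tokens.items():
      cost = tokens * REQUESTS * PRICE_PER_MILLION_OUTPUT_TOKENS / 1_000_000
      print(f"{method:<17} ~{tokens:>3} tokens/answer -> ~${cost:,.0f} in output tokens")

With these placeholder numbers, CoD’s roughly 5x reduction in output tokens translates directly into a roughly 5x reduction in output-token spend, and typically in generation latency as well.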


Limitations and Tips

  • Few-Shot Examples Help: Most LLMs need to see a couple of “draft style” examples first. Zero-shot CoD prompts (where you show no examples of short drafts) tend to be noticeably less effective; see the sketch after this list.
  • Model Size Matters: Very small or older models may not handle short, symbolic “drafts” gracefully; they often benefit from CoT or specialized training for CoD.
  • Complex Problems: If the problem is extremely intricate, you might need a bit more detail than CoD typically provides. Aim for a happy medium that remains concise but thorough enough.
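
As a sketch of the few-shot tip above, you can prepend one or two worked “draft style” demonstrations to the message list before the real question. The pen example below is made up purely to illustrate the expected brevity.

  # Few-shot Chain-of-Draft: show the model a worked draft before the real question.
  # The pen example is invented purely as a demonstration of the desired brevity.
  COD_PROMPT = ("Think step by step, but limit each step to just a few words. "
                "Provide only the essential calculations, then give the final "
                "answer after '####'.")

  few_shot_messages = [
      {"role": "system", "content": COD_PROMPT},
      {"role": "user", "content": "A pen costs 3 dollars. How much do 5 pens cost?"},
      {"role": "assistant", "content": "5 * 3 = 15. #### 15"},
      {"role": "user", "content": "Jason had 20 lollipops. He gave Denny some. "
                                  "Now Jason has 12 left. How many did he give to Denny?"},
  ]

  # Pass few_shot_messages to client.chat.completions.create(...) exactly as in the
  # first sketch; the demonstration nudges the model toward equally terse drafts.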

Conclusion

Chain of Draft (CoD) prompting is a lightweight, efficient way to guide large language models through reasoning tasks. By cutting the fluff and only writing down what’s truly important, you often get similar (or even better) results compared to verbose strategies—with a fraction of the cost and wait time.

For teams wanting to balance interpretability, accuracy, and speed, CoD can be a potent trick. Just remember to include a few examples in your prompts so that the model learns exactly how concise you want it to be.


References
• Xu, S., Xie, W., Zhao, L., and He, P. (2025). “Chain of Draft: Thinking Faster by Writing Less.” arXiv preprint arXiv:2502.18600.
