Thinking & Reasoning
Some AI models can "think" before responding, showing their internal chain-of-thought reasoning process. AI Supreme Council supports extended thinking across multiple providers, letting you see how the model reasons through complex problems.
What Is Thinking Mode?
When reasoning is enabled, the model generates an internal "thinking" process before producing its final answer. This thinking output shows the model's step-by-step reasoning, which can include:
- Breaking down complex problems
- Considering multiple approaches
- Self-correcting mistakes
- Working through mathematical calculations
- Evaluating trade-offs
The thinking output is displayed in a collapsible section above the final response. Click to expand it and see the full reasoning chain.
Supported Providers
| Provider | Implementation | Models |
|---|---|---|
| Anthropic | Extended thinking (thinking.budget_tokens) | Claude Opus 4, Claude Sonnet 4, Claude 3.5 Sonnet |
| Google Gemini | ThinkingConfig (thinkingConfig.thinkingBudget) | Gemini 2.5 Pro, Gemini 2.5 Flash |
| OpenAI | Reasoning effort (reasoning_effort) | o1, o3, o3-mini, o4-mini |
| DeepSeek | Built-in reasoning | DeepSeek R1, DeepSeek R1 (via OpenRouter) |
| OpenRouter | Passes reasoning_effort to underlying model | Any reasoning-capable model |
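The per-provider differences in the table above can be sketched as a small dispatch helper. This is an illustrative sketch, not the platform's actual code; the parameter names come from the table, while the helper name and threshold values are assumptions.

```python
# Hypothetical helper: build the provider-specific request fragment that
# enables thinking, using the parameter names from the table above.

def reasoning_config(provider: str, budget: int) -> dict:
    """Return the request fragment that enables thinking for a provider."""
    if provider == "anthropic":
        # Extended thinking: thinking.budget_tokens
        return {"thinking": {"type": "enabled", "budget_tokens": budget}}
    if provider == "gemini":
        # ThinkingConfig: thinkingConfig.thinkingBudget
        return {"generationConfig": {"thinkingConfig": {"thinkingBudget": budget}}}
    if provider in ("openai", "openrouter"):
        # OpenAI-style APIs take a string effort level, not a token count
        effort = "low" if budget <= 8_192 else "medium" if budget <= 32_768 else "high"
        return {"reasoning_effort": effort}
    return {}  # DeepSeek R1 reasons by default; no extra field needed
```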
Effort Levels
The reasoning effort dropdown controls how much "thinking" the model does before responding:
| Level | Token Budget | When to Use |
|---|---|---|
| Default (Off) | 0 | Standard responses, simple questions |
| Low | ~8,192 tokens | Quick reasoning, straightforward logic |
| Medium | ~32,768 tokens | Moderate complexity, code generation |
| High | ~128,000 tokens | Complex analysis, detailed problem-solving |
| Highest (Model Max) | Model's maximum output | Maximum reasoning depth for the hardest problems |
Token budgets are approximate. "Low", "Medium", and "High" map to specific token counts. "Highest (Model Max)" dynamically looks up the model's maximum output from the registry and allocates nearly all of it to thinking.
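The level-to-budget mapping above can be sketched as follows. The function and dictionary names are illustrative; the token counts are the approximate budgets from the table.

```python
# Illustrative mapping from dropdown levels to approximate token budgets.
# "max" ("Highest (Model Max)") defers to the model's registry limit.

EFFORT_BUDGETS = {"default": 0, "low": 8_192, "medium": 32_768, "high": 128_000}

def effort_to_budget(level: str, model_max_output: int) -> int:
    if level == "max":
        # Allocate nearly all of the model's maximum output to thinking
        # (response-token reserve handling omitted in this sketch)
        return model_max_output
    return EFFORT_BUDGETS[level]
```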
How "Highest (Model Max)" Works
When you select "Highest (Model Max)", the platform looks up the selected model's maximum output capacity from the community model registry at call time. For example:
- Claude Opus 4: up to ~127,000 thinking tokens
- Gemini 2.5 Pro: up to ~64,512 thinking tokens
- Gemini 2.5 Flash: up to ~64,512 thinking tokens
This ensures you always get the maximum reasoning depth the model supports, even as models are updated with new limits.
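The lookup could work roughly as below. The registry shape, model ids, limits, and the reserve constant are all assumptions for illustration; only the "look up max output, allocate nearly all of it to thinking" behavior comes from the text above.

```python
# Sketch of the "Highest (Model Max)" lookup against a model registry.
# Registry keys, limits, and RESPONSE_RESERVE are illustrative values.

MODEL_REGISTRY = {
    "claude-opus-4": {"max_output_tokens": 128_000},
    "gemini-2.5-pro": {"max_output_tokens": 65_536},
}

RESPONSE_RESERVE = 1_024  # leave a slice of the budget for the final answer

def max_thinking_budget(model_id: str) -> int:
    max_out = MODEL_REGISTRY[model_id]["max_output_tokens"]
    return max_out - RESPONSE_RESERVE
```

Because the lookup happens at call time, a registry update to a model's limit changes the allocated budget without any configuration change.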
Custom Budget
For Anthropic and Gemini, you can specify an exact numeric token budget by entering a number in the reasoning field. For example, entering 50000 allocates exactly 50,000 tokens for thinking.
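A minimal sketch of how the reasoning field's value might be interpreted, assuming it accepts either a named level or a bare number (the function name is hypothetical):

```python
# Hypothetical parser: named levels pass through, anything else is
# treated as an exact numeric thinking-token budget.

def parse_reasoning(value: str):
    v = value.strip().lower()
    if v in {"default", "low", "medium", "high", "max"}:
        return v
    return int(v)  # e.g. "50000" -> exactly 50,000 thinking tokens
```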
Where to Configure
Per-Bot (Config Panel)
- Open the config panel (right sidebar)
- Expand Advanced Settings
- Find the Reasoning Effort dropdown
- Select your desired level: Default, Low, Medium, or High
Per-Profile (Settings)
- Open Settings > Profile
- Expand Advanced Settings for the profile
- Set the Reasoning Effort dropdown
- Options include: Default, Low, Medium, High, and Highest (Model Max)
Per-Council-Member
- Open the council member settings (expand a member row)
- Find the Reasoning dropdown
- Set independently for each council member
In a council, you can enable reasoning for only specific members. For example, give the chairman "High" reasoning effort while keeping other members on "Default" to balance cost and quality.
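The mixed-effort council described above might look like this as data. Member names, model ids, and the field names are illustrative:

```python
# Illustrative council config: reasoning is set per member, with only
# the chairman paying for deep thinking.

council = [
    {"name": "Chairman", "model": "claude-opus-4", "reasoning": "high"},
    {"name": "Member A", "model": "gemini-2.5-flash", "reasoning": "default"},
    {"name": "Member B", "model": "o4-mini", "reasoning": "default"},
]

# Only members with reasoning enabled incur thinking-token costs
thinkers = [m["name"] for m in council if m["reasoning"] != "default"]
```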
How Thinking Output Is Displayed
During streaming, when a model is in its thinking phase, the chat shows a "Thinking..." indicator. Once thinking completes and the model begins its actual response, the thinking output appears as a collapsible details section:
```
[Thinking (12,847 chars)]  <-- click to expand

The model's actual response appears here...
```
In council mode, each member's thinking output is shown in its own collapsible section within that member's response card.
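A minimal sketch of the collapsible section, assuming the chat renders an HTML `<details>` element with a character count in the summary (the function and markup are assumptions, not the platform's actual rendering code):

```python
# Hypothetical renderer for the collapsible thinking section shown above.

def render_thinking(thinking: str) -> str:
    return (
        f"<details><summary>Thinking ({len(thinking):,} chars)</summary>"
        f"<pre>{thinking}</pre></details>"
    )
```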
Provider-Specific Behavior
Anthropic (Extended Thinking)
- Uses the `thinking` parameter: `{ type: "enabled", budget_tokens: N }`
- Important: Anthropic requires `temperature: 1` when extended thinking is enabled. The platform handles this automatically -- your configured temperature is overridden.
- Thinking output arrives via `content_block_start` (type `thinking`) and `thinking_delta` events in the SSE stream
- The `max_tokens` parameter is automatically increased to accommodate both thinking and response tokens
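Putting those rules together, an Anthropic request body might be built like this. The model id, message, and headroom value are illustrative; the `thinking` parameter and the temperature override follow the points above.

```python
# Sketch of an Anthropic Messages API body with extended thinking.
# Model id, message, and the +4,096 headroom are illustrative values.

def anthropic_body(budget: int, configured_temperature: float) -> dict:
    return {
        "model": "claude-sonnet-4",
        # max_tokens must cover both thinking and the visible response
        "max_tokens": budget + 4_096,
        "thinking": {"type": "enabled", "budget_tokens": budget},
        # Extended thinking requires temperature 1, so the configured
        # value (configured_temperature) is intentionally ignored.
        "temperature": 1,
        "messages": [{"role": "user", "content": "Prove it step by step."}],
    }
```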
Google Gemini (ThinkingConfig)
- Uses `generationConfig.thinkingConfig.thinkingBudget`
- The `maxOutputTokens` is automatically increased when thinking is enabled
- Thinking output is included in the Gemini response stream
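A corresponding Gemini request body could be sketched as below. The field paths follow the points above; the message and the headroom added to `maxOutputTokens` are illustrative.

```python
# Sketch of a Gemini generateContent body with thinkingConfig.
# The +4,096 headroom and the message content are illustrative.

def gemini_body(budget: int) -> dict:
    return {
        "contents": [{"role": "user", "parts": [{"text": "Prove it step by step."}]}],
        "generationConfig": {
            # maxOutputTokens is raised to cover thinking plus the answer
            "maxOutputTokens": budget + 4_096,
            "thinkingConfig": {"thinkingBudget": budget},
        },
    }
```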
OpenAI-Compatible (Reasoning Effort)
- Uses the `reasoning_effort` parameter with string values: `"low"`, `"medium"`, `"high"`
- Numeric budgets and `"max"` are mapped to `"high"` for OpenAI-compatible APIs
- Reasoning output arrives via `delta.reasoning_content` in the SSE stream
- Works with OpenAI, xAI (Grok), OpenRouter, and other OpenAI-compatible providers
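The mapping into `reasoning_effort` described above reduces to a few lines (the function name is illustrative):

```python
# Sketch of the effort mapping for OpenAI-compatible APIs: named levels
# pass through; numeric budgets and "max" collapse to "high".

def to_reasoning_effort(value) -> str:
    if value in ("low", "medium", "high"):
        return value
    return "high"
```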
When to Use Thinking Mode
| Task | Recommended Level |
|---|---|
| Simple Q&A, casual chat | Default (Off) |
| Code generation | Medium |
| Debugging complex code | High |
| Mathematical proofs | High |
| Multi-step analysis | High |
| Research synthesis | Medium to High |
| Creative writing | Default or Low |
| Hardest reasoning puzzles | Highest (Model Max) |
Cost Implications
Thinking tokens count toward output tokens and are billed accordingly. A model that "thinks" for 100,000 tokens before producing a 2,000-token response is billed for 102,000 output tokens. This can significantly increase costs, especially at the High and Highest levels.
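Working the example above through with an assumed output-token price (the rate is illustrative, not any provider's actual pricing):

```python
# Worked billing example: thinking tokens are billed as output tokens.
# The per-million rate below is an illustrative figure only.

PRICE_PER_M_OUTPUT = 15.00  # $/1M output tokens (example rate)

thinking_tokens = 100_000
response_tokens = 2_000
billed_output = thinking_tokens + response_tokens  # 102,000 tokens
cost = billed_output / 1_000_000 * PRICE_PER_M_OUTPUT  # $1.53 at this rate
```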
Rough cost multipliers compared to Default:
| Level | Approximate Cost Multiplier |
|---|---|
| Default | 1x |
| Low | 2-4x |
| Medium | 5-15x |
| High | 15-50x |
| Highest | 30-100x+ |
The exact multiplier depends on the complexity of the question. Simple questions with High reasoning may use only a fraction of the budget, while complex problems may use the full allocation.
Start with "Medium" for most tasks and only increase to "High" or "Highest" when you need the model to work through particularly difficult problems. The quality improvement from Low to Medium is usually more noticeable than from High to Highest.