Thinking & Reasoning
Some AI models can "think" before responding, showing their internal chain-of-thought reasoning process. AI Supreme Council supports extended thinking across multiple providers, letting you see how the model reasons through complex problems.
What Is Thinking Mode?
When reasoning is enabled, the model generates an internal "thinking" process before producing its final answer. This thinking output shows the model's step-by-step reasoning, which can include:
- Breaking down complex problems
- Considering multiple approaches
- Self-correcting mistakes
- Working through mathematical calculations
- Evaluating trade-offs
The thinking output is displayed in a collapsible section above the final response. Click to expand it and see the full reasoning chain.
Supported Providers
| Provider | Implementation | Models |
|---|---|---|
| Anthropic | Extended thinking (thinking.budget_tokens) | Claude Opus 4, Claude Sonnet 4, Claude 3.5 Sonnet |
| Google Gemini | ThinkingConfig (thinkingConfig.thinkingBudget) | Gemini 2.5 Pro, Gemini 2.5 Flash |
| OpenAI | Reasoning effort (reasoning_effort) | o1, o3, o3-mini, o4-mini |
| DeepSeek | Built-in reasoning | DeepSeek R1, DeepSeek R1 (via OpenRouter) |
| OpenRouter | Passes reasoning_effort to underlying model | Any reasoning-capable model |
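The per-provider differences in the table above can be sketched as a small dispatch helper. This is an illustrative sketch, not the platform's actual code; the parameter names come from the table, while the helper name and threshold values are assumptions.

```python
# Hypothetical helper: build the provider-specific request fragment that
# enables thinking, using the parameter names from the table above.

def reasoning_config(provider: str, budget: int) -> dict:
    """Return the request fragment that enables thinking for a provider."""
    if provider == "anthropic":
        # Extended thinking: thinking.budget_tokens
        return {"thinking": {"type": "enabled", "budget_tokens": budget}}
    if provider == "gemini":
        # ThinkingConfig: thinkingConfig.thinkingBudget
        return {"generationConfig": {"thinkingConfig": {"thinkingBudget": budget}}}
    if provider in ("openai", "openrouter"):
        # OpenAI-style APIs take a string effort level, not a token count
        effort = "low" if budget <= 8_192 else "medium" if budget <= 32_768 else "high"
        return {"reasoning_effort": effort}
    return {}  # DeepSeek R1 reasons by default; no extra field needed
```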
Effort Levels
The reasoning effort dropdown controls how much "thinking" the model does before responding:
| Level | Token Budget | When to Use |
|---|---|---|
| Default (Off) | 0 | Standard responses, simple questions |
| Low | ~8,192 tokens | Quick reasoning, straightforward logic |
| Medium | ~32,768 tokens | Moderate complexity, code generation |
| High | ~128,000 tokens | Complex analysis, detailed problem-solving |
| Highest (Model Max) | Model's maximum output | Maximum reasoning depth for the hardest problems |
Token budgets are approximate. "Low", "Medium", and "High" map to specific token counts. "Highest (Model Max)" dynamically looks up the model's maximum output from the registry and allocates nearly all of it to thinking.
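The level-to-budget mapping above can be sketched as follows. The function and dictionary names are illustrative; the token counts are the approximate budgets from the table.

```python
# Illustrative mapping from dropdown levels to approximate token budgets.
# "max" ("Highest (Model Max)") defers to the model's registry limit.

EFFORT_BUDGETS = {"default": 0, "low": 8_192, "medium": 32_768, "high": 128_000}

def effort_to_budget(level: str, model_max_output: int) -> int:
    if level == "max":
        # Allocate nearly all of the model's maximum output to thinking
        # (response-token reserve handling omitted in this sketch)
        return model_max_output
    return EFFORT_BUDGETS[level]
```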
How "Highest (Model Max)" Works
When you select "Highest (Model Max)", the platform looks up the selected model's maximum output capacity from the community model registry at call time. For example:
- Claude Opus 4: up to ~127,000 thinking tokens
- Gemini 2.5 Pro: up to ~64,512 thinking tokens
- Gemini 2.5 Flash: up to ~64,512 thinking tokens
This ensures you always get the maximum reasoning depth the model supports, even as models are updated with new limits.
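The lookup could work roughly as below. The registry shape, model ids, limits, and the reserve constant are all assumptions for illustration; only the "look up max output, allocate nearly all of it to thinking" behavior comes from the text above.

```python
# Sketch of the "Highest (Model Max)" lookup against a model registry.
# Registry keys, limits, and RESPONSE_RESERVE are illustrative values.

MODEL_REGISTRY = {
    "claude-opus-4": {"max_output_tokens": 128_000},
    "gemini-2.5-pro": {"max_output_tokens": 65_536},
}

RESPONSE_RESERVE = 1_024  # leave a slice of the budget for the final answer

def max_thinking_budget(model_id: str) -> int:
    max_out = MODEL_REGISTRY[model_id]["max_output_tokens"]
    return max_out - RESPONSE_RESERVE
```

Because the lookup happens at call time, a registry update to a model's limit changes the allocated budget without any configuration change.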
Custom Budget
For Anthropic and Gemini, you can specify an exact numeric token budget by entering a number in the reasoning field. For example, entering 50000 allocates exactly 50,000 tokens for thinking.
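A minimal sketch of how the reasoning field's value might be interpreted, assuming it accepts either a named level or a bare number (the function name is hypothetical):

```python
# Hypothetical parser: named levels pass through, anything else is
# treated as an exact numeric thinking-token budget.

def parse_reasoning(value: str):
    v = value.strip().lower()
    if v in {"default", "low", "medium", "high", "max"}:
        return v
    return int(v)  # e.g. "50000" -> exactly 50,000 thinking tokens
```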
Where to Configure
Per-Bot (Config Panel)
- Open the config panel (right sidebar)
- Expand Advanced Settings
- Find the Reasoning Effort dropdown
- Select your desired level: Default, Low, Medium, or High
Per-Profile (Settings)
- Open Settings > Profile
- Expand Advanced Settings for the profile
- Set the Reasoning Effort dropdown
- Options include: Default, Low, Medium, High, and Highest (Model Max)
Per-Council-Member
- Open the council member settings (expand a member row)
- Find the Reasoning dropdown
- Set independently for each council member
In a council, you can enable reasoning for only specific members. For example, give the chairman "High" reasoning effort while keeping other members on "Default" to balance cost and quality.
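The mixed-effort council described above might look like this as data. Member names, model ids, and the field names are illustrative:

```python
# Illustrative council config: reasoning is set per member, with only
# the chairman paying for deep thinking.

council = [
    {"name": "Chairman", "model": "claude-opus-4", "reasoning": "high"},
    {"name": "Member A", "model": "gemini-2.5-flash", "reasoning": "default"},
    {"name": "Member B", "model": "o4-mini", "reasoning": "default"},
]

# Only members with reasoning enabled incur thinking-token costs
thinkers = [m["name"] for m in council if m["reasoning"] != "default"]
```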
How Thinking Output Is Displayed
During streaming, when a model is in its thinking phase, the chat shows a "Thinking..." indicator. Once thinking completes and the model begins its actual response, the thinking output appears as a collapsible details section:
```
[Thinking (12,847 chars)]  <-- click to expand

The model's actual response appears here...
```
In council mode, each member's thinking output is shown in its own collapsible section within that member's response card.
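A minimal sketch of the collapsible section, assuming the chat renders an HTML `<details>` element with a character count in the summary (the function and markup are assumptions, not the platform's actual rendering code):

```python
# Hypothetical renderer for the collapsible thinking section shown above.

def render_thinking(thinking: str) -> str:
    return (
        f"<details><summary>Thinking ({len(thinking):,} chars)</summary>"
        f"<pre>{thinking}</pre></details>"
    )
```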
Provider-Specific Behavior
Anthropic (Extended Thinking)
- Uses the `thinking` parameter: `{ type: "enabled", budget_tokens: N }`
- Important: Anthropic requires `temperature: 1` when extended thinking is enabled. The platform handles this automatically -- your configured temperature is overridden.
- Thinking output arrives via `content_block_start` (type `thinking`) and `thinking_delta` events in the SSE stream
- The `max_tokens` parameter is automatically increased to accommodate both thinking and response tokens
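Putting those rules together, an Anthropic request body might be built like this. The model id, message, and headroom value are illustrative; the `thinking` parameter and the temperature override follow the points above.

```python
# Sketch of an Anthropic Messages API body with extended thinking.
# Model id, message, and the +4,096 headroom are illustrative values.

def anthropic_body(budget: int, configured_temperature: float) -> dict:
    return {
        "model": "claude-sonnet-4",
        # max_tokens must cover both thinking and the visible response
        "max_tokens": budget + 4_096,
        "thinking": {"type": "enabled", "budget_tokens": budget},
        # Extended thinking requires temperature 1, so the configured
        # value (configured_temperature) is intentionally ignored.
        "temperature": 1,
        "messages": [{"role": "user", "content": "Prove it step by step."}],
    }
```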
Google Gemini (ThinkingConfig)
- Uses `generationConfig.thinkingConfig.thinkingBudget`
- The `maxOutputTokens` is automatically increased when thinking is enabled
- Thinking output is included in the Gemini response stream
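A corresponding Gemini request body could be sketched as below. The field paths follow the points above; the message and the headroom added to `maxOutputTokens` are illustrative.

```python
# Sketch of a Gemini generateContent body with thinkingConfig.
# The +4,096 headroom and the message content are illustrative.

def gemini_body(budget: int) -> dict:
    return {
        "contents": [{"role": "user", "parts": [{"text": "Prove it step by step."}]}],
        "generationConfig": {
            # maxOutputTokens is raised to cover thinking plus the answer
            "maxOutputTokens": budget + 4_096,
            "thinkingConfig": {"thinkingBudget": budget},
        },
    }
```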
OpenAI-Compatible (Reasoning Effort)
- Uses the `reasoning_effort` parameter with string values: `"low"`, `"medium"`, `"high"`
- Numeric budgets and `"max"` are mapped to `"high"` for OpenAI-compatible APIs
- Reasoning output arrives via `delta.reasoning_content` in the SSE stream
- Works with OpenAI, xAI (Grok), OpenRouter, and other OpenAI-compatible providers
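The mapping into `reasoning_effort` described above reduces to a few lines (the function name is illustrative):

```python
# Sketch of the effort mapping for OpenAI-compatible APIs: named levels
# pass through; numeric budgets and "max" collapse to "high".

def to_reasoning_effort(value) -> str:
    if value in ("low", "medium", "high"):
        return value
    return "high"
```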
When to Use Thinking Mode
| Task | Recommended Level |
|---|---|
| Simple Q&A, casual chat | Default (Off) |
| Code generation | Medium |
| Debugging complex code | High |
| Mathematical proofs | High |
| Multi-step analysis | High |
| Research synthesis | Medium to High |
| Creative writing | Default or Low |
| Hardest reasoning puzzles | Highest (Model Max) |
Cost Implications
Thinking tokens count toward output tokens and are billed accordingly. A model that "thinks" for 100,000 tokens before producing a 2,000-token response is billed for 102,000 output tokens. This can significantly increase costs, especially at the High and Highest levels.
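Working the example above through with an assumed output-token price (the rate is illustrative, not any provider's actual pricing):

```python
# Worked billing example: thinking tokens are billed as output tokens.
# The per-million rate below is an illustrative figure only.

PRICE_PER_M_OUTPUT = 15.00  # $/1M output tokens (example rate)

thinking_tokens = 100_000
response_tokens = 2_000
billed_output = thinking_tokens + response_tokens  # 102,000 tokens
cost = billed_output / 1_000_000 * PRICE_PER_M_OUTPUT  # $1.53 at this rate
```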
Rough cost multipliers compared to Default:
| Level | Approximate Cost Multiplier |
|---|---|
| Default | 1x |
| Low | 2-4x |
| Medium | 5-15x |
| High | 15-50x |
| Highest | 30-100x+ |
The exact multiplier depends on the complexity of the question. Simple questions with High reasoning may use only a fraction of the budget, while complex problems may use the full allocation.
Start with "Medium" for most tasks and only increase to "High" or "Highest" when you need the model to work through particularly difficult problems. The quality improvement from Low to Medium is usually more noticeable than from High to Highest.