Thinking & Reasoning

Some AI models can "think" before responding, showing their internal chain-of-thought reasoning process. AI Supreme Council supports extended thinking across multiple providers, letting you see how the model reasons through complex problems.

What Is Thinking Mode?

When reasoning is enabled, the model generates an internal "thinking" process before producing its final answer. This thinking output shows the model's step-by-step reasoning, which can include:

  • Breaking down complex problems
  • Considering multiple approaches
  • Self-correcting mistakes
  • Working through mathematical calculations
  • Evaluating trade-offs

The thinking output is displayed in a collapsible section above the final response. Click to expand it and see the full reasoning chain.

Supported Providers

| Provider | Implementation | Models |
| --- | --- | --- |
| Anthropic | Extended thinking (`thinking.budget_tokens`) | Claude Opus 4, Claude Sonnet 4, Claude 3.5 Sonnet |
| Google Gemini | ThinkingConfig (`thinkingConfig.thinkingBudget`) | Gemini 2.5 Pro, Gemini 2.5 Flash |
| OpenAI | Reasoning effort (`reasoning_effort`) | o1, o3, o3-mini, o4-mini |
| DeepSeek | Built-in reasoning | DeepSeek R1, DeepSeek R1 (via OpenRouter) |
| OpenRouter | Passes `reasoning_effort` to underlying model | Any reasoning-capable model |

Effort Levels

The reasoning effort dropdown controls how much "thinking" the model does before responding:

| Level | Token Budget | When to Use |
| --- | --- | --- |
| Default (Off) | 0 | Standard responses, simple questions |
| Low | ~8,192 tokens | Quick reasoning, straightforward logic |
| Medium | ~32,768 tokens | Moderate complexity, code generation |
| High | ~128,000 tokens | Complex analysis, detailed problem-solving |
| Highest (Model Max) | Model's maximum output | Maximum reasoning depth for the hardest problems |

Info

Token budgets are approximate. "Low", "Medium", and "High" map to specific token counts. "Highest (Model Max)" dynamically looks up the model's maximum output from the registry and allocates nearly all of it to thinking.
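
For illustration, the level-to-budget mapping can be thought of roughly as follows. This is only a sketch; the names `ReasoningEffort` and `resolveThinkingBudget` are hypothetical and not the platform's actual code:

```typescript
// Hypothetical sketch of how an effort level might map to a thinking-token budget.
// The numbers mirror the table above; "max" is resolved against the model
// registry at call time (see the next section).
type ReasoningEffort = "off" | "low" | "medium" | "high" | "max";

function resolveThinkingBudget(effort: ReasoningEffort, modelMaxOutput: number): number {
  switch (effort) {
    case "off":
      return 0;          // no extended thinking
    case "low":
      return 8_192;      // quick reasoning
    case "medium":
      return 32_768;     // moderate complexity
    case "high":
      return 128_000;    // deep analysis
    case "max":
      // "Highest (Model Max)": nearly all of the model's output capacity goes to thinking.
      return modelMaxOutput;
  }
}
```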

How "Highest (Model Max)" Works

When you select "Highest (Model Max)", the platform looks up the selected model's maximum output capacity from the community model registry at call time. For example:

  • Claude Opus 4: up to ~127,000 thinking tokens
  • Gemini 2.5 Pro: up to ~64,512 thinking tokens
  • Gemini 2.5 Flash: up to ~64,512 thinking tokens

This ensures you always get the maximum reasoning depth the model supports, even as models are updated with new limits.
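
A minimal sketch of that lookup, assuming a registry keyed by model ID that records each model's maximum output tokens. The registry shape and the reserve constant are assumptions for illustration, not the platform's actual code:

```typescript
// Hypothetical sketch: resolve "Highest (Model Max)" at call time.
interface RegistryEntry {
  maxOutputTokens: number; // model's maximum output capacity per the registry
}

const RESPONSE_RESERVE = 1_000; // tokens held back for the visible answer (assumed value)

function maxThinkingBudget(modelId: string, registry: Map<string, RegistryEntry>): number {
  const entry = registry.get(modelId);
  if (!entry) {
    // Unknown model: fall back to the fixed "High" budget from the table above.
    return 128_000;
  }
  // Allocate nearly all of the model's output capacity to thinking.
  return entry.maxOutputTokens - RESPONSE_RESERVE;
}
```

Under these assumed numbers, a model whose registry entry reports 128,000 output tokens would end up with roughly the ~127,000 thinking-token figure quoted above.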

Custom Budget

For Anthropic and Gemini, you can specify an exact numeric token budget by entering a number in the reasoning field. For example, entering 50000 allocates exactly 50,000 tokens for thinking.

Where to Configure

Per-Bot (Config Panel)

  1. Open the config panel (right sidebar)
  2. Expand Advanced Settings
  3. Find the Reasoning Effort dropdown
  4. Select your desired level: Default, Low, Medium, or High

Per-Profile (Settings)

  1. Open Settings > Profile
  2. Expand Advanced Settings for the profile
  3. Set the Reasoning Effort dropdown
  4. Options include: Default, Low, Medium, High, and Highest (Model Max)

Per-Council-Member

  1. Open the council member settings (expand a member row)
  2. Find the Reasoning dropdown
  3. Set independently for each council member
Tip

In a council, you can enable reasoning for only specific members. For example, give the chairman "High" reasoning effort while keeping other members on "Default" to balance cost and quality.

How Thinking Output Is Displayed

During streaming, when a model is in its thinking phase, the chat shows a "Thinking..." indicator. Once thinking completes and the model begins its actual response, the thinking output appears as a collapsible details section:

[Thinking (12,847 chars)]     <-- click to expand

The model's actual response appears here...

In council mode, each member's thinking output is shown in its own collapsible section within that member's response card.
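
Conceptually, the rendering reduces to wrapping the thinking text in a collapsible element above the final answer, something like this sketch (the function name and markup are illustrative, not the platform's actual components):

```typescript
// Illustrative sketch: wrap thinking output in a collapsible block,
// labeled with its character count as shown above.
function renderThinkingBlock(thinkingText: string): string {
  if (!thinkingText) return "";
  const summary = `Thinking (${thinkingText.length.toLocaleString()} chars)`;
  return `<details><summary>${summary}</summary><pre>${thinkingText}</pre></details>`;
}
```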

Provider-Specific Behavior

Anthropic (Extended Thinking)

  • Uses the thinking parameter: { type: "enabled", budget_tokens: N }
  • Important: Anthropic requires temperature: 1 when extended thinking is enabled. The platform handles this automatically; your configured temperature is overridden.
  • Thinking output arrives via content_block_start (type thinking) and thinking_delta events in the SSE stream
  • The max_tokens parameter is automatically increased to accommodate both thinking and response tokens
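
A minimal sketch of the resulting request, assuming the standard Anthropic Messages API; the model name and token values are illustrative:

```typescript
// Sketch of an Anthropic Messages API call with extended thinking enabled.
const body = {
  model: "claude-sonnet-4-20250514",
  max_tokens: 40_000,               // must cover both thinking and the final response
  temperature: 1,                   // required when extended thinking is enabled
  thinking: { type: "enabled", budget_tokens: 32_768 },
  messages: [{ role: "user", content: "Prove that sqrt(2) is irrational." }],
  stream: true,
};

const res = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": process.env.ANTHROPIC_API_KEY ?? "",
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
  },
  body: JSON.stringify(body),
});
// When streaming, thinking arrives as a content_block_start of type "thinking"
// followed by thinking_delta events, before the regular text blocks.
```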

Google Gemini (ThinkingConfig)

  • Uses generationConfig.thinkingConfig.thinkingBudget
  • The maxOutputTokens is automatically increased when thinking is enabled
  • Thinking output is included in the Gemini response stream
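
A comparable sketch for the Gemini REST API; the budget and output values are illustrative:

```typescript
// Sketch of a Gemini streamGenerateContent request with a thinking budget.
const geminiBody = {
  contents: [{ role: "user", parts: [{ text: "Prove that sqrt(2) is irrational." }] }],
  generationConfig: {
    maxOutputTokens: 40_000,                     // raised to cover thinking + response
    thinkingConfig: {
      thinkingBudget: 32_768,                    // thinking-token budget
      includeThoughts: true,                     // return thought output in the stream
    },
  },
};

const geminiRes = await fetch(
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:streamGenerateContent?alt=sse",
  {
    method: "POST",
    headers: {
      "x-goog-api-key": process.env.GEMINI_API_KEY ?? "",
      "content-type": "application/json",
    },
    body: JSON.stringify(geminiBody),
  }
);
```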

OpenAI-Compatible (Reasoning Effort)

  • Uses the reasoning_effort parameter with string values: "low", "medium", "high"
  • Numeric budgets and "max" are mapped to "high" for OpenAI-compatible APIs
  • Reasoning output arrives via delta.reasoning_content in the SSE stream
  • Works with OpenAI, xAI (Grok), OpenRouter, and other OpenAI-compatible providers
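
And a sketch for an OpenAI-compatible Chat Completions request. The `reasoning_content` delta field is what reasoning-exposing providers such as DeepSeek emit; availability varies by provider:

```typescript
// Sketch of an OpenAI-compatible chat.completions request using reasoning_effort.
const openaiBody = {
  model: "o3-mini",
  reasoning_effort: "high",        // "low" | "medium" | "high"
  stream: true,
  messages: [{ role: "user", content: "Prove that sqrt(2) is irrational." }],
};

const openaiRes = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY ?? ""}`,
    "content-type": "application/json",
  },
  body: JSON.stringify(openaiBody),
});
// On providers that expose it (e.g. DeepSeek, some OpenRouter models), streamed
// chunks may carry choices[0].delta.reasoning_content alongside choices[0].delta.content.
```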

When to Use Thinking Mode

| Task | Recommended Level |
| --- | --- |
| Simple Q&A, casual chat | Default (Off) |
| Code generation | Medium |
| Debugging complex code | High |
| Mathematical proofs | High |
| Multi-step analysis | High |
| Research synthesis | Medium to High |
| Creative writing | Default or Low |
| Hardest reasoning puzzles | Highest (Model Max) |

Cost Implications

Caution

Thinking tokens count toward output tokens and are billed accordingly. A model that "thinks" for 100,000 tokens before producing a 2,000-token response is billed for 102,000 output tokens. This can significantly increase costs, especially at the High and Highest levels.

Rough cost multipliers compared to Default:

| Level | Approximate Cost Multiplier |
| --- | --- |
| Default | 1x |
| Low | 2-4x |
| Medium | 5-15x |
| High | 15-50x |
| Highest | 30-100x+ |

The exact multiplier depends on the complexity of the question. Simple questions with High reasoning may use only a fraction of the budget, while complex problems may use the full allocation.
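
As a back-of-the-envelope check, using a placeholder output price rather than any provider's actual rate:

```typescript
// Rough cost estimate: thinking tokens are billed as output tokens.
const OUTPUT_PRICE_PER_MTOK = 15;   // placeholder: $ per million output tokens

function estimateOutputCost(thinkingTokens: number, responseTokens: number): number {
  const billedOutput = thinkingTokens + responseTokens;   // e.g. 100,000 + 2,000 = 102,000
  return (billedOutput / 1_000_000) * OUTPUT_PRICE_PER_MTOK;
}

// estimateOutputCost(100_000, 2_000) -> $1.53 at the placeholder rate,
// versus $0.03 for the same 2,000-token answer with thinking off (~51x more).
```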

提示

Start with "Medium" for most tasks and only increase to "High" or "Highest" when you need the model to work through particularly difficult problems. The quality improvement from Low to Medium is usually more noticeable than from High to Highest.