Groq
Groq provides ultra-fast AI inference powered by custom LPU (Language Processing Unit) hardware. Groq does not train its own models -- instead, it runs popular open-source models at dramatically higher speeds than traditional GPU infrastructure. Several models are available for free with rate limits.
Getting an API Key
- Visit console.groq.com/keys
- Sign in or create an account (free)
- Generate a new API key (starts with `gsk_...`)
- Paste the key into AI Supreme Council under Settings > AI Model > Groq
Groq offers a free tier with generous rate limits. No credit card is required to create an account and start using free models.
API keys are stored locally in your browser (`localStorage`) and are never included in shared bot URLs.
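To confirm a new key works before adding it, you can query Groq's OpenAI-compatible model-listing endpoint directly. A minimal sketch (Node 18+, which ships a global `fetch`; reading the key from an environment variable is just a convention here):

```typescript
// Quick sanity check: list the models your key can access.
// Node 18+ ships a global fetch; no dependencies needed.
const GROQ_API_KEY = process.env.GROQ_API_KEY ?? "gsk_...";

async function listModels(): Promise<void> {
  const res = await fetch("https://api.groq.com/openai/v1/models", {
    headers: { Authorization: `Bearer ${GROQ_API_KEY}` },
  });
  if (!res.ok) throw new Error(`Key check failed: ${res.status} ${res.statusText}`);
  const body = await res.json();
  for (const model of body.data) console.log(model.id);
}

listModels().catch(console.error);
```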
Supported Models
Free Models
| Model | Context Window | Max Output | Capabilities |
|---|---|---|---|
| Llama 3.3 70B | 128K | 32K | Tools, code, streaming |
| DeepSeek R1 Distill 70B | 128K | 16K | Reasoning, code, streaming |
| Compound Beta | 128K | 32K | Tools, reasoning, streaming |
| Llama 3.1 8B Instant | 128K | 8K | Tools, code, streaming |
| Gemma 2 9B | 8K | 8K | Streaming |
Paid Models
| Model | Context Window | Max Output | Input Price | Output Price | Capabilities |
|---|---|---|---|---|---|
| Llama 4 Scout | 128K | 8K | $0.11/MTok | $0.34/MTok | Vision, tools, code |
| Llama 4 Maverick | 128K | 8K | $0.50/MTok | $0.77/MTok | Vision, tools, code |
| Qwen3 32B | 128K | 8K | $0.29/MTok | $0.59/MTok | Tools, reasoning |
Prices are per million tokens (MTok).
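As a worked example of how these prices translate to per-request cost, here is a small helper (the token counts below are hypothetical):

```typescript
// Convert the per-MTok prices above into an estimated per-request cost.
function costUSD(
  inputTokens: number,
  outputTokens: number,
  inputPerMTok: number,
  outputPerMTok: number,
): number {
  return (inputTokens / 1_000_000) * inputPerMTok +
         (outputTokens / 1_000_000) * outputPerMTok;
}

// Hypothetical Llama 4 Scout call: 4,000 input + 1,000 output tokens
// = 0.004 * $0.11 + 0.001 * $0.34 ≈ $0.00078
console.log(costUSD(4_000, 1_000, 0.11, 0.34).toFixed(5));
```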
Free models have rate limits that vary by model and account tier. Typical limits are:
- Requests per minute: 30
- Tokens per minute: 6,000-15,000
- Requests per day: 1,000-14,400
Check console.groq.com for current limits on your account.
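If a burst of requests does hit a limit, the API responds with HTTP 429. A client-side retry sketch (honoring a `retry-after` header is a common convention for 429 responses, but treat its presence and format as an assumption):

```typescript
// Retry a chat request on HTTP 429. Honors a retry-after header when one is
// present (an assumption) and falls back to exponential backoff otherwise.
async function chatWithRetry(body: object, apiKey: string, maxRetries = 3): Promise<any> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (res.status !== 429 || attempt >= maxRetries) {
      if (!res.ok) throw new Error(`Groq request failed: ${res.status}`);
      return res.json();
    }
    const retryAfter = Number(res.headers.get("retry-after")); // seconds, if provided
    const delayMs = retryAfter > 0 ? retryAfter * 1000 : 1000 * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```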
Why Groq is Fast
Groq uses custom-designed LPU (Language Processing Unit) chips instead of GPUs. LPUs are purpose-built for sequential token generation, which is the bottleneck in LLM inference. The result:
- Time to first token: Often under 100ms
- Token generation speed: 500-800+ tokens/second on many models
- Consistent latency: Predictable performance without the variability of GPU batching
This makes Groq ideal for applications where response speed matters more than model size.
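You can check the time-to-first-token figure yourself by timing a streaming request. A rough sketch (the measurement includes connection overhead; `llama-3.3-70b-versatile` is Groq's model ID for Llama 3.3 70B at the time of writing, so confirm current IDs via the models endpoint):

```typescript
// Rough time-to-first-token measurement: start a streaming request and stop
// the clock when the first SSE chunk arrives. Node 18+.
async function measureTTFT(apiKey: string): Promise<number> {
  const start = performance.now();
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile", // Groq's ID at the time of writing
      messages: [{ role: "user", content: "Say hi." }],
      stream: true,
    }),
  });
  const reader = res.body!.getReader();
  await reader.read(); // the first chunk carries (roughly) the first token
  const ttftMs = performance.now() - start;
  await reader.cancel(); // stop generating; we only wanted the first chunk
  return ttftMs;
}
```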
Reasoning Support
DeepSeek R1 Distill 70B and Compound Beta support reasoning, showing step-by-step thinking before delivering a final answer. Qwen3 32B (paid) also supports reasoning.
Since Groq uses the OpenAI-compatible API format, reasoning output streams as `reasoning_content` and appears in a collapsible thinking block in the chat.
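A sketch of separating the two streams client-side, assuming the SSE chunk format of the OpenAI-compatible API (Node 18+; `deepseek-r1-distill-llama-70b` is Groq's model ID for DeepSeek R1 Distill 70B at the time of writing):

```typescript
// Stream a reasoning model and separate the thinking from the final answer.
// Deltas carry reasoning in `reasoning_content` and the answer in `content`.
async function streamReasoning(apiKey: string, prompt: string): Promise<void> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "deepseek-r1-distill-llama-70b", // Groq's ID at the time of writing
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  const decoder = new TextDecoder();
  let buffer = "";
  for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
    buffer += decoder.decode(chunk, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice("data: ".length).trim();
      if (data === "[DONE]") continue;
      const delta = JSON.parse(data).choices?.[0]?.delta ?? {};
      if (delta.reasoning_content) process.stdout.write(delta.reasoning_content); // thinking
      else if (delta.content) process.stdout.write(delta.content); // final answer
    }
  }
}
```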
Compound Beta (Agentic AI)
Compound Beta is Groq's compound AI system that combines reasoning with tool use. It can execute multi-step tasks by planning, reasoning, and using tools in sequence. This model is free and available with a Groq API key.
Vision Support
The paid Llama 4 Scout and Llama 4 Maverick models support vision input. You can paste, upload, or drag and drop images for these models.
Free models on Groq do not currently support vision.
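A request sketch using the OpenAI-compatible multimodal message format, where images travel as `image_url` content parts (the Llama 4 Scout model ID is Groq's at the time of writing; a base64 data URL also works in place of a web URL, which is how pasted or uploaded images are typically sent):

```typescript
// Vision request: an image_url part alongside a text part in one user message.
async function describeImage(apiKey: string, imageUrl: string): Promise<string> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "meta-llama/llama-4-scout-17b-16e-instruct", // Groq's ID at the time of writing
      messages: [{
        role: "user",
        content: [
          { type: "text", text: "Describe this image." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      }],
    }),
  });
  const body = await res.json();
  return body.choices[0].message.content;
}
```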
Tool Calling
Most Groq models support function/tool calling via the OpenAI-compatible format. This includes the free Llama 3.3 70B and Compound Beta models.
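A sketch of declaring a tool and reading back the model's tool calls, following the OpenAI function-calling format (the `get_weather` tool is hypothetical):

```typescript
// Declare a tool in the OpenAI-compatible function-calling format and check
// whether the model chose to call it.
async function askWithTools(apiKey: string): Promise<void> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile",
      messages: [{ role: "user", content: "What's the weather in Oslo?" }],
      tools: [{
        type: "function",
        function: {
          name: "get_weather", // hypothetical tool
          description: "Get the current weather for a city",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      }],
    }),
  });
  const message = (await res.json()).choices[0].message;
  for (const call of message.tool_calls ?? []) {
    // Arguments arrive as a JSON string, per the OpenAI convention.
    console.log(call.function.name, JSON.parse(call.function.arguments));
  }
}
```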
OpenAI-Compatible API
Groq uses a fully OpenAI-compatible API:
- Standard `POST /openai/v1/chat/completions` endpoint at `api.groq.com`
- Bearer token authentication
- SSE streaming
- Tool/function calling
No special configuration is needed.
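In practice this means the official openai SDK works against Groq unmodified; only the base URL and API key change. A minimal sketch:

```typescript
import OpenAI from "openai";

// Point the standard OpenAI client at Groq's OpenAI-compatible endpoint.
const groq = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

const completion = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello from the OpenAI SDK." }],
});
console.log(completion.choices[0].message.content);
```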
Configuration
When creating a bot profile, select Groq as the provider and choose your preferred model. You can set a per-bot API key in the bot configuration panel to override the global key.
The Groq provider uses the Chat Completions API at `api.groq.com/openai/v1/chat/completions`.
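For illustration, a per-bot profile might look like the following sketch; the field names are hypothetical, not AI Supreme Council's actual schema:

```typescript
// Hypothetical shape of a bot profile as described above.
interface BotProfile {
  name: string;
  provider: "groq";
  model: string;
  apiKey?: string; // per-bot key; overrides the global key when set
}

const fastResponder: BotProfile = {
  name: "Fast Responder",
  provider: "groq",
  model: "llama-3.3-70b-versatile",
  // apiKey omitted: falls back to the global key from Settings > AI Model > Groq
};
```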
Best For
| Use Case | Recommended Model |
|---|---|
| Speed-critical chat | Llama 3.3 70B (free) |
| Fast reasoning | DeepSeek R1 Distill 70B (free) |
| Agentic workflows | Compound Beta (free) |
| Lightweight tasks | Llama 3.1 8B Instant (free) |
| Vision tasks | Llama 4 Scout or Maverick (paid) |
| Code + reasoning | Qwen3 32B (paid) |
Tips for Best Results
- Use Groq when speed matters. If you need the fastest possible responses and can work with open-source models, Groq is the best choice.
- Start with Llama 3.3 70B. It is free, fast, and capable -- the best general-purpose free model on Groq.
- Use DeepSeek R1 Distill for reasoning. It provides strong chain-of-thought reasoning for free, at Groq speeds.
- Pair with other providers in councils. Groq's speed makes it an excellent fast-response member in multi-model councils, where it can provide quick initial answers that slower, more powerful models refine.
- Be mindful of rate limits. Free tier rate limits can be hit quickly in high-volume usage. Spread requests across time or upgrade to a paid plan for higher limits.