Groq
Groq provides ultra-fast AI inference powered by custom LPU (Language Processing Unit) hardware. Groq does not train its own models -- instead, it runs popular open-source models at dramatically higher speeds than traditional GPU infrastructure. Several models are available for free with rate limits.
Getting an API Key
- Visit console.groq.com/keys
- Sign in or create an account (free)
- Generate a new API key (starts with `gsk_...`)
- Paste the key into AI Supreme Council under Settings > AI Model > Groq
Groq offers a free tier with generous rate limits. No credit card is required to create an account and start using free models.
API keys are stored locally in your browser (`localStorage`) and are never included in shared bot URLs.
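To confirm a new key works before adding it, you can query Groq's OpenAI-compatible model-listing endpoint directly. A minimal sketch (Node 18+, which ships a global `fetch`; reading the key from an environment variable is just a convention here):

```typescript
// Quick sanity check: list the models your key can access.
// Node 18+ ships a global fetch; no dependencies needed.
const GROQ_API_KEY = process.env.GROQ_API_KEY ?? "gsk_...";

async function listModels(): Promise<void> {
  const res = await fetch("https://api.groq.com/openai/v1/models", {
    headers: { Authorization: `Bearer ${GROQ_API_KEY}` },
  });
  if (!res.ok) throw new Error(`Key check failed: ${res.status} ${res.statusText}`);
  const body = await res.json();
  for (const model of body.data) console.log(model.id);
}

listModels().catch(console.error);
```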
Supported Models
Free Models
| Model | Context Window | Max Output | Capabilities |
|---|---|---|---|
| Llama 3.3 70B | 128K | 32K | Tools, code, streaming |
| DeepSeek R1 Distill 70B | 128K | 16K | Reasoning, code, streaming |
| Compound Beta | 128K | 32K | Tools, reasoning, streaming |
| Llama 3.1 8B Instant | 128K | 8K | Tools, code, streaming |
| Gemma 2 9B | 8K | 8K | Streaming |
Paid Models
| Model | Context Window | Max Output | Input Price | Output Price | Capabilities |
|---|---|---|---|---|---|
| Llama 4 Scout | 128K | 8K | $0.11/MTok | $0.34/MTok | Vision, tools, code |
| Llama 4 Maverick | 128K | 8K | $0.50/MTok | $0.77/MTok | Vision, tools, code |
| Qwen3 32B | 128K | 8K | $0.29/MTok | $0.59/MTok | Tools, reasoning |
Prices are per million tokens (MTok).
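As a worked example of how these prices translate to per-request cost, here is a small helper (the token counts below are hypothetical):

```typescript
// Convert the per-MTok prices above into an estimated per-request cost.
function costUSD(
  inputTokens: number,
  outputTokens: number,
  inputPerMTok: number,
  outputPerMTok: number,
): number {
  return (inputTokens / 1_000_000) * inputPerMTok +
         (outputTokens / 1_000_000) * outputPerMTok;
}

// Hypothetical Llama 4 Scout call: 4,000 input + 1,000 output tokens
// = 0.004 * $0.11 + 0.001 * $0.34 ≈ $0.00078
console.log(costUSD(4_000, 1_000, 0.11, 0.34).toFixed(5));
```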
Free models have rate limits that vary by model and account tier. Typical limits are:
- Requests per minute: 30
- Tokens per minute: 6,000-15,000
- Requests per day: 1,000-14,400
Check console.groq.com for current limits on your account.
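If a burst of requests does hit a limit, the API responds with HTTP 429. A client-side retry sketch (honoring a `retry-after` header is a common convention for 429 responses, but treat its presence and format as an assumption):

```typescript
// Retry a chat request on HTTP 429. Honors a retry-after header when one is
// present (an assumption) and falls back to exponential backoff otherwise.
async function chatWithRetry(body: object, apiKey: string, maxRetries = 3): Promise<any> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (res.status !== 429 || attempt >= maxRetries) {
      if (!res.ok) throw new Error(`Groq request failed: ${res.status}`);
      return res.json();
    }
    const retryAfter = Number(res.headers.get("retry-after")); // seconds, if provided
    const delayMs = retryAfter > 0 ? retryAfter * 1000 : 1000 * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```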
Why Groq is Fast
Groq uses custom-designed LPU (Language Processing Unit) chips instead of GPUs. LPUs are purpose-built for sequential token generation, which is the bottleneck in LLM inference. The result:
- Time to first token: Often under 100ms
- Token generation speed: 500-800+ tokens/second on many models
- Consistent latency: Predictable performance without the variability of GPU batching
This makes Groq ideal for applications where response speed matters more than model size.
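You can check the time-to-first-token figure yourself by timing a streaming request. A rough sketch (the measurement includes connection overhead; `llama-3.3-70b-versatile` is Groq's model ID for Llama 3.3 70B at the time of writing, so confirm current IDs via the models endpoint):

```typescript
// Rough time-to-first-token measurement: start a streaming request and stop
// the clock when the first SSE chunk arrives. Node 18+.
async function measureTTFT(apiKey: string): Promise<number> {
  const start = performance.now();
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile", // Groq's ID at the time of writing
      messages: [{ role: "user", content: "Say hi." }],
      stream: true,
    }),
  });
  const reader = res.body!.getReader();
  await reader.read(); // the first chunk carries (roughly) the first token
  const ttftMs = performance.now() - start;
  await reader.cancel(); // stop generating; we only wanted the first chunk
  return ttftMs;
}
```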
Reasoning Support
DeepSeek R1 Distill 70B and Compound Beta support reasoning, showing step-by-step thinking before delivering a final answer. Qwen3 32B (paid) also supports reasoning.
Since Groq uses the OpenAI-compatible API format, reasoning output streams as `reasoning_content` and appears in a collapsible thinking block in the chat.
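A sketch of separating the two streams client-side, assuming the SSE chunk format of the OpenAI-compatible API (Node 18+; `deepseek-r1-distill-llama-70b` is Groq's model ID for DeepSeek R1 Distill 70B at the time of writing):

```typescript
// Stream a reasoning model and separate the thinking from the final answer.
// Deltas carry reasoning in `reasoning_content` and the answer in `content`.
async function streamReasoning(apiKey: string, prompt: string): Promise<void> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "deepseek-r1-distill-llama-70b", // Groq's ID at the time of writing
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  const decoder = new TextDecoder();
  let buffer = "";
  for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
    buffer += decoder.decode(chunk, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice("data: ".length).trim();
      if (data === "[DONE]") continue;
      const delta = JSON.parse(data).choices?.[0]?.delta ?? {};
      if (delta.reasoning_content) process.stdout.write(delta.reasoning_content); // thinking
      else if (delta.content) process.stdout.write(delta.content); // final answer
    }
  }
}
```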
Compound Beta (Agentic AI)
Compound Beta is Groq's compound AI system that combines reasoning with tool use. It can execute multi-step tasks by planning, reasoning, and using tools in sequence. This model is free and available with a Groq API key.
Vision Support
The paid Llama 4 Scout and Llama 4 Maverick models support vision input. You can paste, upload, or drag and drop images for these models.
Free models on Groq do not currently support vision.
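A request sketch using the OpenAI-compatible multimodal message format, where images travel as `image_url` content parts (the Llama 4 Scout model ID is Groq's at the time of writing; a base64 data URL also works in place of a web URL, which is how pasted or uploaded images are typically sent):

```typescript
// Vision request: an image_url part alongside a text part in one user message.
async function describeImage(apiKey: string, imageUrl: string): Promise<string> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "meta-llama/llama-4-scout-17b-16e-instruct", // Groq's ID at the time of writing
      messages: [{
        role: "user",
        content: [
          { type: "text", text: "Describe this image." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      }],
    }),
  });
  const body = await res.json();
  return body.choices[0].message.content;
}
```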
Tool Calling
Most Groq models support function/tool calling via the OpenAI-compatible format. This includes the free Llama 3.3 70B and Compound Beta models.
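A sketch of declaring a tool and reading back the model's tool calls, following the OpenAI function-calling format (the `get_weather` tool is hypothetical):

```typescript
// Declare a tool in the OpenAI-compatible function-calling format and check
// whether the model chose to call it.
async function askWithTools(apiKey: string): Promise<void> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile",
      messages: [{ role: "user", content: "What's the weather in Oslo?" }],
      tools: [{
        type: "function",
        function: {
          name: "get_weather", // hypothetical tool
          description: "Get the current weather for a city",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      }],
    }),
  });
  const message = (await res.json()).choices[0].message;
  for (const call of message.tool_calls ?? []) {
    // Arguments arrive as a JSON string, per the OpenAI convention.
    console.log(call.function.name, JSON.parse(call.function.arguments));
  }
}
```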
OpenAI-Compatible API
Groq uses a fully OpenAI-compatible API:
- Standard `POST /openai/v1/chat/completions` endpoint at `api.groq.com`
- Bearer token authentication
- SSE streaming
- Tool/function calling
No special configuration is needed.
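In practice this means the official openai SDK works against Groq unmodified; only the base URL and API key change. A minimal sketch:

```typescript
import OpenAI from "openai";

// Point the standard OpenAI client at Groq's OpenAI-compatible endpoint.
const groq = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

const completion = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello from the OpenAI SDK." }],
});
console.log(completion.choices[0].message.content);
```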
Configuration
When creating a bot profile, select Groq as the provider and choose your preferred model. You can set a per-bot API key in the bot configuration panel to override the global key.
The Groq provider uses the Chat Completions API at `api.groq.com/openai/v1/chat/completions`.
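For illustration, a per-bot profile might look like the following sketch; the field names are hypothetical, not AI Supreme Council's actual schema:

```typescript
// Hypothetical shape of a bot profile as described above.
interface BotProfile {
  name: string;
  provider: "groq";
  model: string;
  apiKey?: string; // per-bot key; overrides the global key when set
}

const fastResponder: BotProfile = {
  name: "Fast Responder",
  provider: "groq",
  model: "llama-3.3-70b-versatile",
  // apiKey omitted: falls back to the global key from Settings > AI Model > Groq
};
```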
Best For
| Use Case | Recommended Model |
|---|---|
| Speed-critical chat | Llama 3.3 70B (free) |
| Fast reasoning | DeepSeek R1 Distill 70B (free) |
| Agentic workflows | Compound Beta (free) |
| Lightweight tasks | Llama 3.1 8B Instant (free) |
| Vision tasks | Llama 4 Scout or Maverick (paid) |
| Code + reasoning | Qwen3 32B (paid) |
Tips for Best Results
- Use Groq when speed matters. If you need the fastest possible responses and can work with open-source models, Groq is the best choice.
- Start with Llama 3.3 70B. It is free, fast, and capable -- the best general-purpose free model on Groq.
- Use DeepSeek R1 Distill for reasoning. It provides strong chain-of-thought reasoning for free, at Groq speeds.
- Pair with other providers in councils. Groq's speed makes it an excellent fast-response member in multi-model councils, where it can provide quick initial answers that slower, more powerful models refine.
- Be mindful of rate limits. Free tier rate limits can be hit quickly in high-volume usage. Spread requests across time or upgrade to a paid plan for higher limits.