Groq

Groq provides ultra-fast AI inference powered by custom LPU (Language Processing Unit) hardware. Groq does not train its own models -- instead, it runs popular open-source models at dramatically higher speeds than traditional GPU infrastructure. Several models are available for free with rate limits.

Getting an API Key

  1. Visit console.groq.com/keys
  2. Sign in or create an account (free)
  3. Generate a new API key (starts with gsk_...)
  4. Paste the key into AI Supreme Council under Settings > AI Model > Groq
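Once you have a key, it works with any OpenAI-compatible HTTP client. As a minimal sketch, here is how a chat completion request to Groq could be assembled with Python's standard library (the request is built but not sent; `build_chat_request` is a hypothetical helper, and the model ID shown is an assumption -- check Groq's model list for current IDs):

```python
import json
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        GROQ_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # your gsk_... key goes here
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder key; model ID is an assumption for illustration.
req = build_chat_request("gsk_your_key_here", "llama-3.3-70b-versatile", "Hello!")
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) returns a standard OpenAI-format JSON response.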

Free Tier

Groq offers a free tier with generous rate limits. No credit card is required to create an account and start using free models.

API keys are stored locally in your browser (localStorage) and are never included in shared bot URLs.

Supported Models

Free Models

| Model | Context Window | Max Output | Capabilities |
| --- | --- | --- | --- |
| Llama 3.3 70B | 128K | 32K | Tools, code, streaming |
| DeepSeek R1 Distill 70B | 128K | 16K | Reasoning, code, streaming |
| Compound Beta | 128K | 32K | Tools, reasoning, streaming |
| Llama 3.1 8B Instant | 128K | 8K | Tools, code, streaming |
| Gemma 2 9B | 8K | 8K | Streaming |
Paid Models

| Model | Context Window | Max Output | Input Price | Output Price | Capabilities |
| --- | --- | --- | --- | --- | --- |
| Llama 4 Scout | 128K | 8K | $0.11/MTok | $0.34/MTok | Vision, tools, code |
| Llama 4 Maverick | 128K | 8K | $0.50/MTok | $0.77/MTok | Vision, tools, code |
| Qwen3 32B | 128K | 8K | $0.29/MTok | $0.59/MTok | Tools, reasoning |

Prices are per million tokens (MTok).
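The per-MTok pricing makes cost estimation simple: divide each token count by one million and multiply by the matching rate. A small sketch using the prices from the table above (the dictionary keys are illustrative shorthand, not Groq's exact model IDs):

```python
# Prices from the table above, in dollars per million tokens.
# Keys are illustrative shorthand, not Groq's exact model IDs.
PRICES = {
    "llama-4-scout":    {"input": 0.11, "output": 0.34},
    "llama-4-maverick": {"input": 0.50, "output": 0.77},
    "qwen3-32b":        {"input": 0.29, "output": 0.59},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in dollars: tokens / 1e6 * price per MTok."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + \
           (output_tokens / 1_000_000) * p["output"]

# e.g. a 10K-token prompt with a 2K-token completion on Llama 4 Scout
cost = estimate_cost("llama-4-scout", 10_000, 2_000)
```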

Free Model Rate Limits

Free models have rate limits that vary by model and account tier. Typical limits are:

  • Requests per minute: 30
  • Tokens per minute: 6,000-15,000
  • Requests per day: 1,000-14,400
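When a limit is hit, the API responds with HTTP 429, and the usual remedy is exponential backoff. A minimal retry sketch (the helper names and default delays are assumptions, not part of any Groq SDK):

```python
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]

def call_with_retry(send_request, max_retries: int = 5, base: float = 1.0):
    """send_request() should return an HTTP status code; retry on 429."""
    for delay in backoff_delays(max_retries, base=base):
        status = send_request()
        if status != 429:
            return status
        time.sleep(delay)
    return send_request()  # final attempt after exhausting the delays
```

In practice you would wrap your actual HTTP call in `send_request` and also honor any `Retry-After` header the response carries.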

Check console.groq.com for current limits on your account.

Why Groq is Fast

Groq uses custom-designed LPU (Language Processing Unit) chips instead of GPUs. LPUs are purpose-built for sequential token generation, which is the bottleneck in LLM inference. The result:

  • Time to first token: Often under 100ms
  • Token generation speed: 500-800+ tokens/second on many models
  • Consistent latency: Predictable performance without the variability of GPU batching

This makes Groq ideal for applications where response speed matters more than model size.

Reasoning Support

DeepSeek R1 Distill 70B and Compound Beta support reasoning, showing step-by-step thinking before delivering a final answer. Qwen3 32B (paid) also supports reasoning.

Since Groq uses the OpenAI-compatible API format, reasoning output streams as reasoning_content and appears in a collapsible thinking block in the chat.
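A client consuming the stream can split the two fields as it reads each delta. A sketch of that separation, using made-up sample chunks in the OpenAI streaming format (the `data: ` SSE prefix is assumed to be already stripped):

```python
import json

def split_stream(sse_data_lines):
    """Accumulate reasoning_content (thinking) and content (answer) separately."""
    thinking, answer = [], []
    for data in sse_data_lines:
        if data == "[DONE]":          # OpenAI-style end-of-stream sentinel
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("reasoning_content"):
            thinking.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(thinking), "".join(answer)

# Hypothetical chunks, already stripped of their "data: " prefix:
chunks = [
    '{"choices":[{"delta":{"reasoning_content":"Let me check: "}}]}',
    '{"choices":[{"delta":{"reasoning_content":"2 + 2 is 4."}}]}',
    '{"choices":[{"delta":{"content":"The answer is 4."}}]}',
    "[DONE]",
]
thinking, answer = split_stream(chunks)
```

The `thinking` text is what ends up in the collapsible block; `answer` is the final reply.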

Compound Beta (Agentic AI)

Compound Beta is Groq's compound AI system that combines reasoning with tool use. It can execute multi-step tasks by planning, reasoning, and using tools in sequence. This model is free and available with a Groq API key.

Vision Support

The paid Llama 4 Scout and Llama 4 Maverick models support vision input. You can paste, upload, or drag and drop images for these models.

Free models on Groq do not currently support vision.
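Under the hood, vision input in the OpenAI-compatible format is a multimodal user message whose content mixes text parts and image parts; images can be sent inline as a base64 data URI. A sketch of building such a message (the helper name is hypothetical):

```python
import base64

def image_message(image_bytes: bytes, question: str, mime: str = "image/png") -> dict:
    """OpenAI-style multimodal user message with an inline base64 data URI."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Placeholder bytes stand in for a real image file.
msg = image_message(b"\x89PNG...", "What is in this image?")
```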

Tool Calling

Most Groq models support function/tool calling via the OpenAI-compatible format. This includes the free Llama 3.3 70B and Compound Beta models.
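In that format, each tool is described by a JSON Schema, and the request advertises the available tools so the model can reply with `tool_calls`. A sketch with a hypothetical weather tool (the tool itself, the helper, and the model ID are all illustrative assumptions):

```python
# A hypothetical weather tool in OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Berlin"},
            },
            "required": ["city"],
        },
    },
}

def chat_body(model: str, prompt: str) -> dict:
    """Request body advertising the tool; the model may answer with tool_calls."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

body = chat_body("llama-3.3-70b-versatile", "What's the weather in Berlin?")
```

When the model does call the tool, your client executes it and sends the result back as a `role: "tool"` message to get the final answer.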

OpenAI-Compatible API

Groq uses a fully OpenAI-compatible API:

  • Standard POST /openai/v1/chat/completions endpoint at api.groq.com
  • Bearer token authentication
  • SSE streaming
  • Tool/function calling

No special configuration is needed.

Configuration

When creating a bot profile, select Groq as the provider and choose your preferred model. You can set a per-bot API key in the bot configuration panel to override the global key.

The Groq provider uses the Chat Completions API at api.groq.com/openai/v1/chat/completions.

Best For

| Use Case | Recommended Model |
| --- | --- |
| Speed-critical chat | Llama 3.3 70B (free) |
| Fast reasoning | DeepSeek R1 Distill 70B (free) |
| Agentic workflows | Compound Beta (free) |
| Lightweight tasks | Llama 3.1 8B Instant (free) |
| Vision tasks | Llama 4 Scout or Maverick (paid) |
| Code + reasoning | Qwen3 32B (paid) |

Tips for Best Results

  • Use Groq when speed matters. If you need the fastest possible responses and can work with open-source models, Groq is the best choice.
  • Start with Llama 3.3 70B. It is free, fast, and capable -- the best general-purpose free model on Groq.
  • Use DeepSeek R1 Distill for reasoning. It provides strong chain-of-thought reasoning for free, at Groq speeds.
  • Pair with other providers in councils. Groq's speed makes it an excellent fast-response member in multi-model councils, where it can provide quick initial answers that slower, more powerful models refine.
  • Be mindful of rate limits. Free tier rate limits can be hit quickly in high-volume usage. Spread requests across time or upgrade to a paid plan for higher limits.