# Ollama (Local)
Ollama lets you run open-source LLMs locally on your own machine. AI Supreme Council connects to your local Ollama instance directly from the browser -- no cloud API key required, and your data never leaves your device.
## Why Run Locally?
- Complete privacy -- your conversations never leave your machine
- No API costs -- free to use, no per-token billing
- No rate limits -- limited only by your hardware
- Works offline -- no internet connection needed after model download
- No API key -- nothing to manage or rotate
## Installing Ollama
- Download and install Ollama from ollama.com
- Pull at least one model:
```bash
# Popular general-purpose models
ollama pull llama3.3          # Meta Llama 3.3 (70B)
ollama pull llama3.2          # Meta Llama 3.2 (3B, lightweight)
ollama pull mistral           # Mistral 7B
ollama pull gemma2            # Google Gemma 2

# Code-focused models
ollama pull codellama         # Meta Code Llama
ollama pull deepseek-coder    # DeepSeek Coder
ollama pull qwen2.5-coder     # Alibaba Qwen 2.5 Coder

# Reasoning models
ollama pull deepseek-r1       # DeepSeek R1 (various sizes)

# Vision models
ollama pull llava             # LLaVA (vision + language)
ollama pull llama3.2-vision   # Llama 3.2 Vision
```
- Start the Ollama server (it runs automatically after install on most systems):
```bash
ollama serve
```
The server runs on http://localhost:11434 by default.
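To confirm the server is reachable before configuring anything in the app, you can hit its version endpoint by hand (this is just a sanity check; the app does not require it):

```bash
# Quick sanity check: the Ollama API answers on the default port
curl http://localhost:11434/api/version
# Expected output is a small JSON object, e.g. {"version":"..."}
```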
## CORS Configuration
Browsers enforce cross-origin restrictions, so you must allow the AI Supreme Council origin before Ollama will work. Set the OLLAMA_ORIGINS environment variable before starting Ollama:
macOS / Linux:
```bash
OLLAMA_ORIGINS=* ollama serve
```
To make it permanent (macOS/Linux), add to your shell profile (~/.bashrc, ~/.zshrc):
```bash
export OLLAMA_ORIGINS=*
```
Windows (PowerShell):
```powershell
$env:OLLAMA_ORIGINS="*"
ollama serve
```
Windows (permanently): Set OLLAMA_ORIGINS as a system environment variable via System Properties > Environment Variables.
Without this setting, the browser will block all requests to the Ollama API with a CORS error.
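To check that the setting took effect, you can send a request with an Origin header and look at the reply; the exact set of CORS headers may vary by Ollama version, but with OLLAMA_ORIGINS=* an Access-Control-Allow-Origin header should be present:

```bash
# Simulate a cross-origin browser request; look for
# Access-Control-Allow-Origin in the response headers
curl -s -i -H "Origin: https://example.com" http://localhost:11434/api/tags | head -n 15
```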
## No API Key Needed
Ollama does not require an API key. AI Supreme Council uses an internal placeholder value (ollama) for the key field. You do not need to enter anything in the API key settings.
## Auto Model Detection
On page load, AI Supreme Council queries GET /api/tags on the Ollama endpoint to discover all locally installed models. These models appear automatically in the model selector when you choose Ollama as the provider.
No models are hardcoded -- whatever you have pulled locally will be available. If you pull new models while the app is open, reload the page to detect them.
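You can run the same query by hand to see exactly what the app will list (jq is optional here, used only to pull out the model names):

```bash
# List locally installed models -- the same data the app fetches on page load
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
```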
## Custom Endpoint
If Ollama is running on a non-default address (e.g., a different port, a remote machine, or behind a reverse proxy), you can configure the endpoint:
- Open Settings > AI Model
- Find the Ollama section
- Enter your custom endpoint URL (e.g., http://192.168.1.100:11434)
The custom endpoint is persisted to localStorage under the key ais-ollama-endpoint.
If running Ollama on a remote machine, ensure:
- The Ollama server binds to 0.0.0.0 (not just localhost): OLLAMA_HOST=0.0.0.0 ollama serve
- OLLAMA_ORIGINS=* is set on the remote machine
- The port (default 11434) is accessible from your browser's network
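As a rough sketch of that remote setup (the IP address below is a placeholder; substitute your server's real address):

```bash
# On the remote machine: listen on all interfaces and allow browser origins
OLLAMA_HOST=0.0.0.0 OLLAMA_ORIGINS=* ollama serve

# From the machine running the browser: confirm the endpoint answers
curl http://192.168.1.100:11434/api/version
```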
## Supported Models
Any model available in the Ollama model library can be used. Popular choices include:
| Category | Models | Description |
|---|---|---|
| General | Llama 3.3, Mistral, Gemma 2, Phi-3 | All-purpose chat and reasoning |
| Code | CodeLlama, DeepSeek Coder, Qwen 2.5 Coder, StarCoder | Code generation and analysis |
| Reasoning | DeepSeek R1, Qwen2.5 | Chain-of-thought reasoning |
| Vision | LLaVA, Llama 3.2 Vision | Image understanding |
| Small | Phi-3 Mini, Gemma 2B, TinyLlama | Low-resource devices |
## Hardware Requirements
Ollama performance depends entirely on your local hardware:
| Model Size | RAM Required | GPU Recommended | Example Models |
|---|---|---|---|
| 1-3B | 4 GB | Optional | TinyLlama, Phi-3 Mini |
| 7-8B | 8 GB | 6+ GB VRAM | Mistral 7B, Llama 3.1 8B |
| 13B | 16 GB | 8+ GB VRAM | CodeLlama 13B |
| 70B | 64 GB | 40+ GB VRAM | Llama 3.3 70B |
For the best experience, use a model that fits in your GPU's VRAM. CPU-only inference works but is significantly slower. Models quantized to 4-bit (Q4) need roughly a quarter to a third of the memory of their 16-bit (FP16) counterparts, and the default tags for most models in the Ollama library are already 4-bit quantized.
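If you are unsure how a model will behave on your hardware, the Ollama CLI can report a model's quantization and, once it is loaded, how it is split between GPU and CPU (the exact output format varies by Ollama version):

```bash
# Show parameter count and quantization level for an installed model
ollama show llama3.2

# While a model is loaded: list running models and their GPU/CPU split
# ("100% GPU" means the model fits entirely in VRAM)
ollama ps
```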
## Configuration
When creating a bot profile, select Ollama as the provider and choose from your locally available models. Ollama uses the OpenAI-compatible Chat Completions API with SSE streaming, so it behaves identically to cloud providers from the chat interface perspective.
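As an illustration of that compatibility, the streaming request the app sends can be reproduced with curl (llama3.2 is only an example; use any model you have pulled):

```bash
# Streamed chat completion against Ollama's OpenAI-compatible endpoint;
# each SSE "data:" line carries a JSON chunk with a content delta
curl -N http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "stream": true,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```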
## Limitations
- Ollama must be running and reachable from the browser
- Model quality and speed depend entirely on your local hardware
- Vision and tool-calling support varies by model -- not all Ollama models support these features
- First response after model load may be slow (the model is loaded into memory on first use; a preload sketch follows this list)
- No thinking/reasoning UI integration for local reasoning models (reasoning output appears inline)
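If the first-response delay matters, you can preload a model before chatting: sending a generate request with no prompt loads it into memory, and the keep_alive field controls how long it stays resident:

```bash
# Preload a model so the first chat doesn't pay the load cost;
# a request with no prompt just loads the model into memory
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "keep_alive": "30m"}'
```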
## Troubleshooting
| Problem | Solution |
|---|---|
| "Failed to fetch" or CORS error | Set OLLAMA_ORIGINS=* and restart Ollama |
| No models in dropdown | Ensure ollama serve is running and you have pulled at least one model |
| Very slow responses | Model may not fit in GPU VRAM; try a smaller model or quantized version |
| Connection refused | Check that Ollama is running on the expected port (default: 11434) |
| Custom endpoint not working | Ensure the URL includes the protocol (http://) and no trailing slash |