
Ollama (Local)

Ollama lets you run open-source LLMs locally on your own machine. AI Supreme Council connects to your local Ollama instance directly from the browser -- no cloud API key required, and your data never leaves your device.

Why Run Locally?

  • Complete privacy -- your conversations never leave your machine
  • No API costs -- free to use, no per-token billing
  • No rate limits -- limited only by your hardware
  • Works offline -- no internet connection needed after model download
  • No API key -- nothing to manage or rotate

Installing Ollama

  1. Download and install Ollama from ollama.com
  2. Pull at least one model:
# Popular general-purpose models
ollama pull llama3.3 # Meta Llama 3.3 (70B)
ollama pull llama3.2 # Meta Llama 3.2 (3B, lightweight)
ollama pull mistral # Mistral 7B
ollama pull gemma2 # Google Gemma 2

# Code-focused models
ollama pull codellama # Meta Code Llama
ollama pull deepseek-coder # DeepSeek Coder
ollama pull qwen2.5-coder # Alibaba Qwen 2.5 Coder

# Reasoning models
ollama pull deepseek-r1 # DeepSeek R1 (various sizes)

# Vision models
ollama pull llava # LLaVA (vision + language)
ollama pull llama3.2-vision # Llama 3.2 Vision
  3. Start the Ollama server (it runs automatically after install on most systems):
ollama serve

The server runs on http://localhost:11434 by default.
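To confirm the server is up before configuring anything in the app, you can query it directly (a quick check using the standard Ollama CLI and version endpoint; adjust the port if you changed the default):

# Check that the Ollama server is responding on the default port
curl http://localhost:11434/api/version

# List the models you have pulled so far
ollama list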

CORS Configuration

Required Step

Browsers enforce cross-origin restrictions, so the browser will only let AI Supreme Council talk to Ollama if the app's origin is allowed. Set the OLLAMA_ORIGINS environment variable before starting Ollama:

macOS / Linux:

OLLAMA_ORIGINS=* ollama serve

To make it permanent (macOS/Linux), add to your shell profile (~/.bashrc, ~/.zshrc):

export OLLAMA_ORIGINS=*

Windows (PowerShell):

$env:OLLAMA_ORIGINS="*"
ollama serve

Windows (permanently): Set OLLAMA_ORIGINS as a system environment variable via System Properties > Environment Variables.

Without this setting, the browser will block all requests to the Ollama API with a CORS error.
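To verify the setting took effect, one rough check (assuming Ollama's usual CORS behavior of returning an Access-Control-Allow-Origin header for allowed origins) is to send a request with an Origin header and inspect the response headers:

# Dump only the response headers for a browser-like request
curl -s -o /dev/null -D - -H "Origin: https://example.com" http://localhost:11434/api/tags

# With OLLAMA_ORIGINS=* the headers should include a line similar to:
# Access-Control-Allow-Origin: *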

No API Key Needed

Ollama does not require an API key. AI Supreme Council uses an internal placeholder value (ollama) for the key field. You do not need to enter anything in the API key settings.

Auto Model Detection

On page load, AI Supreme Council queries GET /api/tags on the Ollama endpoint to discover all locally installed models. These models appear automatically in the model selector when you choose Ollama as the provider.

No models are hardcoded -- whatever you have pulled locally will be available. If you pull new models while the app is open, reload the page to detect them.
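You can run the same query yourself to see what the app will detect (the response is abbreviated here and fields may vary between Ollama versions):

# List locally installed models, as the app does on page load
curl -s http://localhost:11434/api/tags

# Abbreviated response shape:
# {"models":[{"name":"llama3.2:latest", ...}, {"name":"mistral:latest", ...}]}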

Custom Endpoint

If Ollama is running on a non-default address (e.g., a different port, a remote machine, or behind a reverse proxy), you can configure the endpoint:

  1. Open Settings > AI Model
  2. Find the Ollama section
  3. Enter your custom endpoint URL (e.g., http://192.168.1.100:11434)

The custom endpoint is persisted to localStorage under the key ais-ollama-endpoint.
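Before saving a custom endpoint, it helps to confirm it is reachable from the machine running your browser (reusing the example address above; include the protocol and omit the trailing slash, as noted under Troubleshooting):

# Confirm the custom endpoint answers before entering it in Settings
curl -s http://192.168.1.100:11434/api/tags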

Remote Access

If running Ollama on a remote machine, ensure:

  1. The Ollama server binds to 0.0.0.0 (not just localhost): OLLAMA_HOST=0.0.0.0 ollama serve
  2. OLLAMA_ORIGINS=* is set on the remote machine
  3. The port (default 11434) is accessible from your browser's network
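Putting these together, a typical remote setup looks like this (a sketch only; substitute your own addresses and open the port in any firewall between the machines):

# On the remote machine: listen on all interfaces and allow browser origins
OLLAMA_HOST=0.0.0.0 OLLAMA_ORIGINS=* ollama serve

# From the machine running your browser: confirm the port is reachable
curl -s http://<remote-ip>:11434/api/version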

Supported Models

Any model available in the Ollama model library can be used. Popular choices include:

| Category | Models | Description |
| --- | --- | --- |
| General | Llama 3.3, Mistral, Gemma 2, Phi-3 | All-purpose chat and reasoning |
| Code | CodeLlama, DeepSeek Coder, Qwen 2.5 Coder, StarCoder | Code generation and analysis |
| Reasoning | DeepSeek R1, Qwen2.5 | Chain-of-thought reasoning |
| Vision | LLaVA, Llama 3.2 Vision | Image understanding |
| Small | Phi-3 Mini, Gemma 2B, TinyLlama | Low-resource devices |

Hardware Requirements

Ollama performance depends entirely on your local hardware:

| Model Size | RAM Required | GPU Recommended | Example Models |
| --- | --- | --- | --- |
| 1-3B | 4 GB | Optional | TinyLlama, Phi-3 Mini |
| 7-8B | 8 GB | 6+ GB VRAM | Mistral 7B, Llama 3.1 8B |
| 13B | 16 GB | 8+ GB VRAM | CodeLlama 13B |
| 70B | 64 GB | 40+ GB VRAM | Llama 3.3 70B |
Tip

For the best experience, use a model that fits in your GPU's VRAM. CPU-only inference works but is significantly slower. Models quantized to 4-bit (Q4) require roughly half the RAM of full-precision versions.
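To see whether a loaded model actually fits in VRAM, check how Ollama placed it; most library models also publish quantized tags (the tag below is illustrative only, so check the model's Tags page on ollama.com for the exact name):

# Show loaded models and whether they run on GPU, CPU, or a mix
ollama ps

# Pull a 4-bit quantized variant (example tag; actual tags vary by model)
ollama pull llama3.1:8b-instruct-q4_K_M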

Configuration

When creating a bot profile, select Ollama as the provider and choose from your locally available models. Ollama uses the OpenAI-compatible Chat Completions API with SSE streaming, so it behaves identically to cloud providers from the chat interface perspective.
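For reference, this is roughly the request shape the app sends; the placeholder key mentioned earlier works because Ollama ignores the Authorization header (the model name and prompt here are just examples):

# OpenAI-compatible chat completion with SSE streaming
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{
    "model": "llama3.2",
    "stream": true,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'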

Limitations

  • Ollama must be running and reachable from the browser
  • Model quality and speed depend entirely on your local hardware
  • Vision and tool-calling support varies by model -- not all Ollama models support these features
  • The first response after selecting a model may be slow (the model is loaded into memory on first use)
  • No thinking/reasoning UI integration for local reasoning models (reasoning output appears inline)

Troubleshooting

| Problem | Solution |
| --- | --- |
| "Failed to fetch" or CORS error | Set OLLAMA_ORIGINS=* and restart Ollama |
| No models in dropdown | Ensure ollama serve is running and you have pulled at least one model |
| Very slow responses | Model may not fit in GPU VRAM; try a smaller model or quantized version |
| Connection refused | Check that Ollama is running on the expected port (default: 11434) |
| Custom endpoint not working | Ensure the URL includes the protocol (http://) and no trailing slash |
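As a starting point for the first two rows, these commands (using the endpoints described earlier) confirm that the server is up and that at least one model is installed:

# Is the server running and reachable?
curl -s http://localhost:11434/api/version

# Is at least one model installed?
ollama list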