
Ollama (Local)

Ollama lets you run open-source LLMs locally on your own machine. AI Supreme Council connects to your local Ollama instance directly from the browser -- no cloud API key required, and your data never leaves your device.

Why Run Locally?

  • Complete privacy -- your conversations never leave your machine
  • No API costs -- free to use, no per-token billing
  • No rate limits -- limited only by your hardware
  • Works offline -- no internet connection needed after model download
  • No API key -- nothing to manage or rotate

Installing Ollama

  1. Download and install Ollama from ollama.com
  2. Pull at least one model:
# Popular general-purpose models
ollama pull llama3.3 # Meta Llama 3.3 (70B)
ollama pull llama3.2 # Meta Llama 3.2 (3B, lightweight)
ollama pull mistral # Mistral 7B
ollama pull gemma2 # Google Gemma 2

# Code-focused models
ollama pull codellama # Meta Code Llama
ollama pull deepseek-coder # DeepSeek Coder
ollama pull qwen2.5-coder # Alibaba Qwen 2.5 Coder

# Reasoning models
ollama pull deepseek-r1 # DeepSeek R1 (various sizes)

# Vision models
ollama pull llava # LLaVA (vision + language)
ollama pull llama3.2-vision # Llama 3.2 Vision
  3. Start the Ollama server (it runs automatically after install on most systems):
ollama serve

The server runs on http://localhost:11434 by default.
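To confirm the server is up before configuring anything in the app, you can query it directly (a quick check using the standard Ollama CLI and version endpoint; adjust the port if you changed the default):

# Check that the Ollama server is responding on the default port
curl http://localhost:11434/api/version

# List the models you have pulled so far
ollama list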

CORS Configuration

Required Step

Browsers enforce cross-origin restrictions, so the browser will only let AI Supreme Council talk to Ollama if the app's origin is allowed. Set the OLLAMA_ORIGINS environment variable before starting Ollama:

macOS / Linux:

OLLAMA_ORIGINS=* ollama serve

To make it permanent (macOS/Linux), add to your shell profile (~/.bashrc, ~/.zshrc):

export OLLAMA_ORIGINS=*

Windows (PowerShell):

$env:OLLAMA_ORIGINS="*"
ollama serve

Windows (permanently): Set OLLAMA_ORIGINS as a system environment variable via System Properties > Environment Variables.

Without this setting, the browser will block all requests to the Ollama API with a CORS error.
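To verify the setting took effect, one rough check (assuming Ollama's usual CORS behavior of returning an Access-Control-Allow-Origin header for allowed origins) is to send a request with an Origin header and inspect the response headers:

# Dump only the response headers for a browser-like request
curl -s -o /dev/null -D - -H "Origin: https://example.com" http://localhost:11434/api/tags

# With OLLAMA_ORIGINS=* the headers should include a line similar to:
# Access-Control-Allow-Origin: *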

No API Key Needed

Ollama does not require an API key. AI Supreme Council uses an internal placeholder value (ollama) for the key field. You do not need to enter anything in the API key settings.

Auto Model Detection

On page load, AI Supreme Council queries GET /api/tags on the Ollama endpoint to discover all locally installed models. These models appear automatically in the model selector when you choose Ollama as the provider.

No models are hardcoded -- whatever you have pulled locally will be available. If you pull new models while the app is open, reload the page to detect them.
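You can run the same query yourself to see what the app will detect (the response is abbreviated here and fields may vary between Ollama versions):

# List locally installed models, as the app does on page load
curl -s http://localhost:11434/api/tags

# Abbreviated response shape:
# {"models":[{"name":"llama3.2:latest", ...}, {"name":"mistral:latest", ...}]}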

Custom Endpoint

If Ollama is running on a non-default address (e.g., a different port, a remote machine, or behind a reverse proxy), you can configure the endpoint:

  1. Open Settings > AI Model
  2. Find the Ollama section
  3. Enter your custom endpoint URL (e.g., http://192.168.1.100:11434)

The custom endpoint is persisted to localStorage under the key ais-ollama-endpoint.
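Before saving a custom endpoint, it helps to confirm it is reachable from the machine running your browser (reusing the example address above; include the protocol and omit the trailing slash, as noted under Troubleshooting):

# Confirm the custom endpoint answers before entering it in Settings
curl -s http://192.168.1.100:11434/api/tags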

Remote Access

If running Ollama on a remote machine, ensure:

  1. The Ollama server binds to 0.0.0.0 (not just localhost): OLLAMA_HOST=0.0.0.0 ollama serve
  2. OLLAMA_ORIGINS=* is set on the remote machine
  3. The port (default 11434) is accessible from your browser's network
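Putting these together, a typical remote setup looks like this (a sketch only; substitute your own addresses and open the port in any firewall between the machines):

# On the remote machine: listen on all interfaces and allow browser origins
OLLAMA_HOST=0.0.0.0 OLLAMA_ORIGINS=* ollama serve

# From the machine running your browser: confirm the port is reachable
curl -s http://<remote-ip>:11434/api/version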

Supported Models

Any model available in the Ollama model library can be used. Popular choices include:

| Category | Models | Description |
| --- | --- | --- |
| General | Llama 3.3, Mistral, Gemma 2, Phi-3 | All-purpose chat and reasoning |
| Code | CodeLlama, DeepSeek Coder, Qwen 2.5 Coder, StarCoder | Code generation and analysis |
| Reasoning | DeepSeek R1, Qwen2.5 | Chain-of-thought reasoning |
| Vision | LLaVA, Llama 3.2 Vision | Image understanding |
| Small | Phi-3 Mini, Gemma 2B, TinyLlama | Low-resource devices |

Hardware Requirements

Ollama performance depends entirely on your local hardware:

| Model Size | RAM Required | GPU Recommended | Example Models |
| --- | --- | --- | --- |
| 1-3B | 4 GB | Optional | TinyLlama, Phi-3 Mini |
| 7-8B | 8 GB | 6+ GB VRAM | Mistral 7B, Llama 3.1 8B |
| 13B | 16 GB | 8+ GB VRAM | CodeLlama 13B |
| 70B | 64 GB | 40+ GB VRAM | Llama 3.3 70B |
Tip

For the best experience, use a model that fits in your GPU's VRAM. CPU-only inference works but is significantly slower. Models quantized to 4-bit (Q4) require roughly half the RAM of full-precision versions.
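To see whether a loaded model actually fits in VRAM, check how Ollama placed it; most library models also publish quantized tags (the tag below is illustrative only, so check the model's Tags page on ollama.com for the exact name):

# Show loaded models and whether they run on GPU, CPU, or a mix
ollama ps

# Pull a 4-bit quantized variant (example tag; actual tags vary by model)
ollama pull llama3.1:8b-instruct-q4_K_M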

Configuration

When creating a bot profile, select Ollama as the provider and choose from your locally available models. Ollama uses the OpenAI-compatible Chat Completions API with SSE streaming, so it behaves identically to cloud providers from the chat interface perspective.
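For reference, this is roughly the request shape the app sends; the placeholder key mentioned earlier works because Ollama ignores the Authorization header (the model name and prompt here are just examples):

# OpenAI-compatible chat completion with SSE streaming
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{
    "model": "llama3.2",
    "stream": true,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'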

Limitations

  • Ollama must be running and reachable from the browser
  • Model quality and speed depend entirely on your local hardware
  • Vision and tool-calling support varies by model -- not all Ollama models support these features
  • The first response after selecting a model may be slow (the model is loaded into memory on first use)
  • No thinking/reasoning UI integration for local reasoning models (reasoning output appears inline)

Troubleshooting

| Problem | Solution |
| --- | --- |
| "Failed to fetch" or CORS error | Set OLLAMA_ORIGINS=* and restart Ollama |
| No models in dropdown | Ensure ollama serve is running and you have pulled at least one model |
| Very slow responses | Model may not fit in GPU VRAM; try a smaller model or quantized version |
| Connection refused | Check that Ollama is running on the expected port (default: 11434) |
| Custom endpoint not working | Ensure the URL includes the protocol (http://) and no trailing slash |
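As a starting point for the first two rows, these commands (using the endpoints described earlier) confirm that the server is up and that at least one model is installed:

# Is the server running and reachable?
curl -s http://localhost:11434/api/version

# Is at least one model installed?
ollama list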