The Full Setup Guide — Everything Connected
This post brings together everything covered in the series into a single, working local AI setup. By the end, you'll have a Fedora workstation running local models, accessible remotely via Tailscale, with editor integration and agent tooling ready to use.
What you'll need
- A Fedora Workstation installation (Fedora 39 or later recommended)
- A GPU with at least 8 GB VRAM (12–16 GB recommended), or Apple Silicon with 32 GB+ unified memory
- A Tailscale account (free tier is sufficient)
- A Hugging Face account (free)
Step 1: Prepare your GPU environment
For NVIDIA:
Enable RPM Fusion and install the proprietary driver:
sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
Reboot after installation (the akmod builds the kernel module for your running kernel, which can take a few minutes on first boot). Verify with nvidia-smi.
For AMD:
The amdgpu driver ships in the Fedora kernel. To enable ROCm for compute workloads:
sudo dnf install rocm-opencl rocm-hip
Verify ROCm is available with rocminfo.
Optional — TTM configuration for VRAM overflow:
To allow GPU workloads to spill from VRAM into system RAM (GTT) when VRAM is full, raise the TTM page limit in your kernel parameters:
sudo grubby --update-kernel=ALL --args="ttm.pages_limit=6291456"
The value is in 4 KiB pages, not gigabytes: 6291456 pages is roughly 24 GiB. Adjust it based on how much system RAM you want to make available as overflow. Reboot for this to take effect.
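Because the parameter counts pages rather than bytes, it's easy to miscalculate. A quick sketch of the conversion (the 24 GiB figure is just an example; pick your own budget):

```shell
# ttm.pages_limit counts 4 KiB pages. Convert the amount of system RAM
# you want to allow as GPU overflow into pages:
gib=24                                        # example: allow 24 GiB of overflow
pages=$(( gib * 1024 * 1024 * 1024 / 4096 ))
echo "ttm.pages_limit=$pages"                 # prints ttm.pages_limit=6291456
```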
Step 2: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Pull your first model:
ollama pull llama3.2
ollama pull nomic-embed-text # useful for embeddings later
Test it's working:
ollama run llama3.2 "Hello, are you running locally?"
For a Hermes model with strong tool-use capability:
ollama pull hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF
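Everything the CLI does is also available over Ollama's HTTP API on port 11434, which is what the editor integrations later in this guide talk to. A minimal non-streaming request, assuming the llama3.2 pull above succeeded and the service is running:

```shell
# Ask the local Ollama API for a one-off, non-streaming completion.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, are you running locally?",
  "stream": false
}'
```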
Step 3: Configure Ollama as a persistent service
Ollama listens only on localhost by default. Create a systemd override so it listens on the network:
sudo systemctl edit ollama
Add the following:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then enable and start:
sudo systemctl enable ollama
sudo systemctl start ollama
We'll tighten the network binding to Tailscale only in the next step.
Step 4: Set up Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
Authenticate via the URL printed in the terminal. Get your machine's Tailscale IP by running tailscale ip -4 on the workstation, or from the Machines page of the Tailscale admin console.
Now update the Ollama service to bind only to your Tailscale IP rather than all interfaces:
sudo systemctl edit ollama
[Service]
Environment="OLLAMA_HOST=100.x.x.x:11434"
Replace 100.x.x.x with your actual Tailscale IP. Restart Ollama:
sudo systemctl restart ollama
From any other device on your Tailscale network, you can now reach Ollama at http://100.x.x.x:11434.
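If a remote device can't reach the endpoint, it helps to separate "Tailscale is down" from "Ollama isn't listening". A small check you can run from any device on the tailnet (substitute your own Tailscale IP; the 100.x.x.x placeholder won't resolve as written):

```shell
OLLAMA_URL="http://100.x.x.x:11434"   # replace with your workstation's Tailscale IP

if ! tailscale status >/dev/null 2>&1; then
  echo "Tailscale is not running on this device"
elif curl -fsS --max-time 5 "$OLLAMA_URL/api/tags" >/dev/null; then
  echo "Ollama reachable at $OLLAMA_URL"
else
  echo "Tailscale is up, but Ollama did not respond at $OLLAMA_URL"
fi
```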
Step 5: Install LM Studio (optional, for GUI access)
Download the AppImage from lmstudio.ai, make it executable, and run it:
chmod +x LM_Studio-*.AppImage
./LM_Studio-*.AppImage
In LM Studio's settings, enable the local server and bind it to the same Tailscale IP. This gives you a second OpenAI-compatible endpoint, managed from a GUI, as an alternative to Ollama.
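Because LM Studio's server speaks the OpenAI chat-completions API, you can sanity-check it like any OpenAI-compatible endpoint. A sketch, assuming the server runs on LM Studio's default port 1234 and you've loaded a model in the GUI ("your-loaded-model" is a placeholder, not a real model name):

```shell
# List the models LM Studio is currently serving:
curl -s http://100.x.x.x:1234/v1/models

# Send a chat completion, using a model name as reported by /v1/models:
curl -s http://100.x.x.x:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-loaded-model",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```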
Step 6: Install Kilocode for VS Code
In VS Code, search for "Kilocode" in the Extensions panel and install it. In its settings, configure the API provider as "Ollama" and set the base URL to your Tailscale IP:
http://100.x.x.x:11434
Select your preferred model from the dropdown. Kilocode will now use your local model for all in-editor AI assistance.
Step 7: Install OpenCode for terminal workflows
npm install -g opencode-ai
Configure it to use your local Ollama endpoint:
opencode config set api_base http://100.x.x.x:11434/v1
opencode config set model llama3.2
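Since OpenCode is pointed at Ollama's OpenAI-compatible /v1 routes above, it's worth confirming those respond independently of the tool. A sketch (replace the placeholder IP):

```shell
# Hit Ollama's OpenAI-compatible chat endpoint directly:
curl -s http://100.x.x.x:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Reply with the word ok."}]
  }'
```

If this returns a completion but OpenCode fails, the problem is in the tool's configuration rather than the server.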
Step 8: Verify the full stack
From your workstation:
- ollama list — confirms models are loaded
- curl http://100.x.x.x:11434/api/tags — confirms Ollama is accessible over Tailscale
From a remote device on your Tailscale network:
- Open a terminal and run curl http://100.x.x.x:11434/api/tags — it should return the same model list
- Open VS Code with Kilocode — AI assistance should work via your remote workstation
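These checks can be rolled into one short script to run from either machine (replace the placeholder IP; /api/version is Ollama's version endpoint and /api/tags lists pulled models):

```shell
OLLAMA="http://100.x.x.x:11434"   # replace with your Tailscale IP

# Probe each endpoint and report pass/fail without aborting on the first error.
for path in /api/version /api/tags; do
  if curl -fsS --max-time 5 "$OLLAMA$path" >/dev/null; then
    echo "ok   $path"
  else
    echo "FAIL $path"
  fi
done
```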
What you've built
At this point you have: a Fedora workstation running local AI models via Ollama, accessible over a private Tailscale network, integrated into your editor via Kilocode, and available from the terminal via OpenCode. Your data stays on your hardware. There are no per-token costs. It works offline and across every device you own.
The next step is exploring agent workflows — using the Hermes model for tool-use tasks, experimenting with OpenClaw for multi-step pipelines, and building out whatever your actual use cases demand.
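As a first taste of that direction, Ollama's /api/chat endpoint accepts a tools array in the OpenAI function-calling style. A minimal sketch against the Hermes model pulled earlier; the get_weather tool here is a made-up example, and whether the model actually emits a tool call depends on its chat template supporting tool use:

```shell
# Offer the model a single (hypothetical) tool and see if it calls it.
curl -s http://localhost:11434/api/chat -d '{
  "model": "hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF",
  "stream": false,
  "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'
```

A tool-capable model responds with a tool_calls entry naming get_weather and its arguments instead of plain text; your calling code then runs the real function and feeds the result back as a follow-up message.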