The Full Setup Guide — Everything Connected
This post brings together everything covered in the series into a single, working local AI setup. By the end, you'll have a Fedora workstation running local models, accessible remotely via Tailscale, with editor integration and agent tooling ready to use.
What you'll need
- A Fedora Workstation installation (Fedora 39 or later recommended)
- A GPU with at least 8 GB VRAM (12–16 GB recommended), or Apple Silicon with 32 GB+ unified memory
- A Tailscale account (free tier is sufficient)
- A Hugging Face account (free)
Step 1: Prepare your GPU environment
For NVIDIA:
Enable RPM Fusion and install the proprietary driver:
sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
Reboot after installation (the akmod builds the kernel module for your running kernel, which can take a few minutes on first boot). Verify with nvidia-smi.
For AMD:
The amdgpu driver ships in the Fedora kernel. To enable ROCm for compute workloads:
sudo dnf install rocm-opencl rocm-hip
Verify ROCm is available with rocminfo.
Optional — TTM configuration for VRAM overflow:
To allow GPU workloads to spill from VRAM into system RAM (GTT) when VRAM is full, raise the TTM page limit in your kernel parameters:
sudo grubby --update-kernel=ALL --args="ttm.pages_limit=6291456"
The value is in 4 KiB pages, not gigabytes: 6291456 pages is roughly 24 GiB. Adjust it based on how much system RAM you want to make available as overflow. Reboot for this to take effect.
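Because the parameter counts pages rather than bytes, it's easy to miscalculate. A quick sketch of the conversion (the 24 GiB figure is just an example; pick your own budget):

```shell
# ttm.pages_limit counts 4 KiB pages. Convert the amount of system RAM
# you want to allow as GPU overflow into pages:
gib=24                                        # example: allow 24 GiB of overflow
pages=$(( gib * 1024 * 1024 * 1024 / 4096 ))
echo "ttm.pages_limit=$pages"                 # prints ttm.pages_limit=6291456
```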
Step 2: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Pull your first model:
ollama pull llama3.2
ollama pull nomic-embed-text # useful for embeddings later
Test it's working:
ollama run llama3.2 "Hello, are you running locally?"
For a Hermes model with strong tool-use capability:
ollama pull hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF
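Everything the CLI does is also available over Ollama's HTTP API on port 11434, which is what the editor integrations later in this guide talk to. A minimal non-streaming request, assuming the llama3.2 pull above succeeded and the service is running:

```shell
# Ask the local Ollama API for a one-off, non-streaming completion.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, are you running locally?",
  "stream": false
}'
```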
Step 3: Configure Ollama as a persistent service
Ollama listens only on localhost by default. Create a systemd override so it listens on the network:
sudo systemctl edit ollama
Add the following:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then enable and start:
sudo systemctl enable ollama
sudo systemctl start ollama
We'll tighten the network binding to Tailscale only in the next step.
Step 4: Set up Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
Authenticate via the URL printed in the terminal. Get your machine's Tailscale IP by running tailscale ip -4 on the workstation, or from the Machines page of the Tailscale admin console.
Now update the Ollama service to bind only to your Tailscale IP rather than all interfaces:
sudo systemctl edit ollama
[Service]
Environment="OLLAMA_HOST=100.x.x.x:11434"
Replace 100.x.x.x with your actual Tailscale IP. Restart Ollama:
sudo systemctl restart ollama
From any other device on your Tailscale network, you can now reach Ollama at http://100.x.x.x:11434.
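If a remote device can't reach the endpoint, it helps to separate "Tailscale is down" from "Ollama isn't listening". A small check you can run from any device on the tailnet (substitute your own Tailscale IP; the 100.x.x.x placeholder won't resolve as written):

```shell
OLLAMA_URL="http://100.x.x.x:11434"   # replace with your workstation's Tailscale IP

if ! tailscale status >/dev/null 2>&1; then
  echo "Tailscale is not running on this device"
elif curl -fsS --max-time 5 "$OLLAMA_URL/api/tags" >/dev/null; then
  echo "Ollama reachable at $OLLAMA_URL"
else
  echo "Tailscale is up, but Ollama did not respond at $OLLAMA_URL"
fi
```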
Step 5: Install LM Studio (optional, for GUI access)
Download the AppImage from lmstudio.ai, make it executable, and run it:
chmod +x LM_Studio-*.AppImage
./LM_Studio-*.AppImage
In LM Studio's settings, enable the local server and bind it to the same Tailscale IP. This gives you a second OpenAI-compatible endpoint, managed from a GUI, as an alternative to Ollama.
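Because LM Studio's server speaks the OpenAI chat-completions API, you can sanity-check it like any OpenAI-compatible endpoint. A sketch, assuming the server runs on LM Studio's default port 1234 and you've loaded a model in the GUI ("your-loaded-model" is a placeholder, not a real model name):

```shell
# List the models LM Studio is currently serving:
curl -s http://100.x.x.x:1234/v1/models

# Send a chat completion, using a model name as reported by /v1/models:
curl -s http://100.x.x.x:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-loaded-model",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```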
Step 6: Install Kilocode for VS Code
In VS Code, search for "Kilocode" in the Extensions panel and install it. In its settings, configure the API provider as "Ollama" and set the base URL to your Tailscale IP:
http://100.x.x.x:11434
Select your preferred model from the dropdown. Kilocode will now use your local model for all in-editor AI assistance.
Step 7: Install OpenCode for terminal workflows
npm install -g opencode-ai
Configure it to use your local Ollama endpoint:
opencode config set api_base http://100.x.x.x:11434/v1
opencode config set model llama3.2
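Since OpenCode is pointed at Ollama's OpenAI-compatible /v1 routes above, it's worth confirming those respond independently of the tool. A sketch (replace the placeholder IP):

```shell
# Hit Ollama's OpenAI-compatible chat endpoint directly:
curl -s http://100.x.x.x:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Reply with the word ok."}]
  }'
```

If this returns a completion but OpenCode fails, the problem is in the tool's configuration rather than the server.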
Step 8: Verify the full stack
From your workstation:
- ollama list — confirms models are loaded
- curl http://100.x.x.x:11434/api/tags — confirms Ollama is accessible over Tailscale
From a remote device on your Tailscale network:
- Open a terminal and run curl http://100.x.x.x:11434/api/tags — it should return the same model list
- Open VS Code with Kilocode — AI assistance should work via your remote workstation
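These checks can be rolled into one short script to run from either machine (replace the placeholder IP; /api/version is Ollama's version endpoint and /api/tags lists pulled models):

```shell
OLLAMA="http://100.x.x.x:11434"   # replace with your Tailscale IP

# Probe each endpoint and report pass/fail without aborting on the first error.
for path in /api/version /api/tags; do
  if curl -fsS --max-time 5 "$OLLAMA$path" >/dev/null; then
    echo "ok   $path"
  else
    echo "FAIL $path"
  fi
done
```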
What you've built
At this point you have: a Fedora workstation running local AI models via Ollama, accessible over a private Tailscale network, integrated into your editor via Kilocode, and available from the terminal via OpenCode. Your data stays on your hardware. There are no per-token costs. It works offline and across every device you own.
The next step is exploring agent workflows — using the Hermes model for tool-use tasks, experimenting with OpenClaw for multi-step pipelines, and building out whatever your actual use cases demand.
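As a first taste of that direction, Ollama's /api/chat endpoint accepts a tools array in the OpenAI function-calling style. A minimal sketch against the Hermes model pulled earlier; the get_weather tool here is a made-up example, and whether the model actually emits a tool call depends on its chat template supporting tool use:

```shell
# Offer the model a single (hypothetical) tool and see if it calls it.
curl -s http://localhost:11434/api/chat -d '{
  "model": "hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF",
  "stream": false,
  "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'
```

A tool-capable model responds with a tool_calls entry naming get_weather and its arguments instead of plain text; your calling code then runs the real function and feeds the result back as a follow-up message.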