Part 1 of 8 · hardware · ~7 min

Hardware Requirements for an AI Lab

If you want to run AI models locally, the first question isn't which model to use — it's whether your hardware can handle it. This post breaks down what actually matters, so you can make an informed decision before spending money or time on a setup that won't work.

The one thing that matters most: memory

When people talk about running LLMs locally, memory is the central constraint. Not storage, not CPU speed — memory. Specifically, the memory your GPU can access, because inference (the process of generating a response from a model) is almost entirely GPU-bound.

A 7B parameter model in 4-bit quantization needs roughly 4–5 GB of GPU memory just to load. A 13B model needs around 8–9 GB. A 70B model needs upwards of 40 GB. These numbers don't include the overhead for context window size or concurrent requests. The math is unforgiving.
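You can sanity-check those figures yourself: each parameter takes bits ÷ 8 bytes, so the raw weight footprint is just model size times bytes per parameter. A minimal sketch (the article's load figures sit a bit above these numbers because loading adds buffers and runtime overhead on top of the raw weights):

```python
def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate memory needed just to hold the model weights.

    params_billion: model size in billions of parameters.
    bits: quantization width (16 = fp16, 8, 4, ...).
    """
    bytes_per_param = bits / 8
    # 1e9 params * bytes_per_param bytes, expressed in GB
    return params_billion * bytes_per_param

for size in (7, 13, 70):
    print(f"{size}B @ 4-bit: ~{weight_memory_gb(size, 4):.1f} GB of weights")
```

For a 7B model at 4-bit this gives about 3.5 GB of weights, which lines up with the 4–5 GB load figure once overhead is added.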

Dedicated VRAM vs unified memory

On a traditional setup — a desktop or laptop with a discrete GPU — you have dedicated VRAM. This is memory physically on the GPU card, fast and exclusive to GPU tasks. The limitation is that it's fixed. If your GPU has 8 GB of VRAM and your model needs 10 GB, you're stuck unless you start offloading layers to system RAM, which tanks performance significantly.

Apple Silicon takes a different approach with unified memory. The CPU and GPU share a single memory pool. A MacBook Pro with 64 GB of unified memory can make most of that available to GPU workloads. This is genuinely useful for AI work — a 64 GB M3 Max can run a 70B model in 4-bit quantization without breaking a sweat. The trade-off is cost: Apple Silicon machines with large memory configurations are expensive.

What about Fedora with a dedicated GPU?

On a Fedora-based system, you're typically working with an AMD or NVIDIA GPU with fixed VRAM. But there's a useful trick worth knowing: TTM (Translation Table Maps) is the Linux kernel's memory manager for GPU buffers. By tuning TTM and the GTT (Graphics Translation Table) aperture — the region of system RAM the GPU is allowed to address — you can let allocations spill over into system RAM when VRAM is full. The effect is somewhat similar to unified memory on macOS, though not as seamless.

This won't give you the same bandwidth as true unified memory, but it can make the difference between a model loading or refusing to run. It's particularly useful when a model is just slightly over your VRAM budget.

For AMD GPUs on Fedora, this is enabled through the amdgpu kernel module. NVIDIA users can look into CUDA's unified memory API, though the practical benefit varies.
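As a rough sketch of what this looks like on an AMD system, the relevant knobs are module parameters set via a modprobe config file. The parameter names and values below are illustrative and kernel-version dependent — verify them on your own machine with `modinfo amdgpu` and `modinfo ttm` before relying on them:

```
# /etc/modprobe.d/amdgpu-gtt.conf (illustrative; check modinfo output first)
# Let the GPU address up to ~16 GiB of system RAM via GTT (value in MiB).
options amdgpu gttsize=16384
# Raise TTM's page limit so large spill-over allocations succeed
# (value in 4 KiB pages; 4194304 pages = 16 GiB).
options ttm pages_limit=4194304
```

After editing, regenerate the initramfs (on Fedora, `sudo dracut -f`) and reboot for the parameters to take effect.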

How to decide how much VRAM you need

Start with the models you actually want to run. If you're targeting 7B–13B models for coding assistance or local chat, 12–16 GB of VRAM is comfortable. If you want to run 30B+ models or do fine-tuning, you're looking at 24 GB or more.

A practical rule of thumb: take the model size in billions of parameters, multiply by 0.6 for 4-bit quantization, and add 2 GB for overhead. That's your minimum VRAM in gigabytes.
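That rule of thumb is trivial to script. A minimal sketch — the 0.6 factor and 2 GB overhead are the heuristic from this post, not exact figures, so treat the output as a floor rather than a guarantee:

```python
def min_vram_gb(params_billion: float) -> float:
    """Rule-of-thumb minimum VRAM for a 4-bit quantized model:
    params (in billions) * 0.6 + 2 GB overhead."""
    return params_billion * 0.6 + 2.0

for size in (7, 13, 30, 70):
    print(f"{size}B model: ~{min_vram_gb(size):.1f} GB VRAM minimum")
```

The outputs track the guidance above: a 13B model lands just under 10 GB (comfortable on a 12–16 GB card), and a 70B model lands at roughly 44 GB.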

Don't over-invest upfront. Start with what you have, test it with the models you care about, and upgrade based on real bottlenecks rather than theoretical maximums.

Strong alternatives without a discrete GPU

Large-memory machines with integrated graphics (and sometimes an NPU) follow the same rule as Apple Silicon: memory bandwidth and capacity matter more than whether the accelerator sits on a PCIe card. The AMD Ryzen™ AI Max+ 395 is a credible Mac Studio alternative for many local AI workloads. This series focuses on Fedora with a discrete GPU, but the same principle applies on these machines: size the memory first, then validate your inference stack.

Recommended starting points

For a Fedora-based setup, the NVIDIA RTX 3090 (24 GB VRAM) and the AMD RX 7900 XTX (24 GB VRAM) are strong choices with good price-to-performance for local AI work. Apple Silicon (Mac mini, Mac Studio, MacBook Pro) with plenty of unified memory fills the same niche. For a Ryzen™ AI Max+ / NPU-focused mini PC without a discrete GPU, the Beelink GTR9 Pro (AI Max+ 395, 128 GB RAM) is a strong prebuilt option — pick by budget, OS, and toolchain. The rest of this series walks through the Fedora + GPU path in detail.

All posts in this series