LLM VRAM Requirement Calculator

Estimated VRAM required: 16.8 GB

Raw model weights	14 GB
Overhead (KV cache, activations, etc.)	2.8 GB

What is the LLM VRAM Requirement Calculator?

This tool estimates how much GPU video memory (VRAM) you need to load and run a large language model (LLM) for inference. The amount of memory is driven primarily by the number of model parameters and the numeric precision used to store each weight. A safety/overhead factor accounts for the KV cache, activations, and CUDA context that consume memory beyond the raw weights.

How to use it

Enter the model size in billions of parameters (e.g. 7 for a 7B model, 70 for Llama-3 70B). Choose the precision: FP32 uses 4 bytes per weight, FP16/BF16 uses 2 bytes, INT8 uses 1 byte, and INT4 quantization uses 0.5 bytes. Finally, set the overhead factor. A value of 1.2 (a 20% buffer) is a reasonable default for short-context inference; raise it for long contexts or batching.
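A minimal Python sketch of the precision choice, assuming the calculator simply maps each precision label to the bytes-per-weight values listed above. The dictionary, the function names, and the rule that the overhead factor must be at least 1.0 are illustrative assumptions, not the calculator's actual code.

```python
# Bytes used to store one weight at each precision (values from the list above).
BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "BF16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}


def bytes_per_param(precision: str) -> float:
    """Return bytes per weight for a precision label such as 'fp16' (case-insensitive)."""
    key = precision.strip().upper()
    if key not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision {precision!r}; expected one of {sorted(BYTES_PER_PARAM)}")
    return BYTES_PER_PARAM[key]


def check_inputs(params_billion: float, overhead: float) -> None:
    """Reject inputs that cannot give a meaningful estimate (assumed validation rules)."""
    if params_billion <= 0:
        raise ValueError("model size must be a positive number of billions of parameters")
    if overhead < 1.0:
        raise ValueError("overhead factor must be at least 1.0 (1.0 means no buffer)")


if __name__ == "__main__":
    check_inputs(70, 1.2)
    for label in ("fp32", "BF16", "int8", "Int4"):
        print(f"{label.upper()}: {bytes_per_param(label)} bytes per weight")
```

Normalizing the label with `strip().upper()` lets "fp16", "FP16" and " Fp16 " all resolve to the same entry.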

The formula explained

$$\text{VRAM (GB)} = \text{Params (B)} \times \text{Bytes/Param} \times \text{Overhead}$$

Because 1 billion bytes is exactly 1 GB in decimal units (about 0.93 GiB), multiplying the parameter count in billions by bytes per parameter gives gigabytes directly. The overhead factor then scales that figure up to cover runtime memory.
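The formula translates directly into code. This sketch (function names are hypothetical) also converts the result to GiB, since tools such as nvidia-smi report memory in binary units (MiB) and a decimal GB is about 7% smaller than a GiB.

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """VRAM (GB) = Params (B) x Bytes/Param x Overhead.

    Billions of parameters times bytes per parameter is billions of bytes, which is
    decimal gigabytes (1 GB = 10**9 bytes), so no extra conversion is needed.
    """
    return params_billion * bytes_per_param * overhead


def gb_to_gib(gb: float) -> float:
    """Convert decimal gigabytes to binary gibibytes (1 GiB = 2**30 bytes)."""
    return gb * 1e9 / 2**30


if __name__ == "__main__":
    total = estimate_vram_gb(7, 2.0)  # 7B parameters in FP16, default 1.2 overhead
    print(f"{total:.1f} GB = {gb_to_gib(total):.2f} GiB")
```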

Figure: Bytes per parameter for FP32, FP16, INT8 and INT4. Lower-precision formats use fewer bytes per parameter, reducing VRAM.

Figure: Parameter count multiplied by bytes per parameter and by the overhead factor gives the total GPU VRAM.

Worked example

For a 7B model in FP16 with a 1.2 overhead factor:

$$7 \times 2 \times 1.2 = 16.8 \text{ GB}$$

That comfortably fits on a 24 GB card. The same model in INT4 needs:

$$7 \times 0.5 \times 1.2 = 4.2 \text{ GB}$$

That runs easily on an 8 GB GPU.
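The worked example can be reproduced with a short sketch that also splits the total into its raw-weight and overhead parts. The `fits_on_card` check is a simplifying assumption: it treats the card's full capacity as usable, whereas the CUDA context and any other processes on the GPU take some of it.

```python
def vram_breakdown(params_billion: float, bytes_per_param: float, overhead: float = 1.2):
    """Return (raw weights GB, overhead GB, total GB) for the VRAM formula."""
    weights = params_billion * bytes_per_param   # raw model weights
    extra = weights * (overhead - 1.0)            # KV cache, activations, CUDA context
    return weights, extra, weights + extra


def fits_on_card(total_gb: float, card_gb: float) -> bool:
    """Simplified check: assumes the card's entire capacity is available."""
    return total_gb <= card_gb


if __name__ == "__main__":
    # (label, params in billions, bytes per param, card size in GB)
    cases = [("7B FP16", 7, 2.0, 24), ("7B INT4", 7, 0.5, 8)]
    for label, params, bpp, card in cases:
        weights, extra, total = vram_breakdown(params, bpp)
        print(f"{label}: {weights:.1f} GB weights + {extra:.1f} GB overhead "
              f"= {total:.1f} GB; fits a {card} GB card: {fits_on_card(total, card)}")
```

For the FP16 case this gives 14 GB of weights plus 2.8 GB of overhead, the same 16.8 GB total as the formula.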

FAQ

Why is the actual usage higher than the raw weights? The KV cache grows with context length and batch size, and the framework reserves memory for activations and buffers — that is what the overhead factor approximates.
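To see why the KV cache dominates at long contexts, this sketch uses the standard size estimate: one key and one value vector per layer, per KV head, per token, per sequence. The model dimensions (32 layers, 32 KV heads, head dimension 128, FP16 cache) are assumed values typical of a 7B model such as Llama 2 7B; read the real values from your model's configuration. Models with grouped-query attention have fewer KV heads and a proportionally smaller cache.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: float = 2.0) -> float:
    """Size of the KV cache in bytes.

    The leading 2 counts one key tensor and one value tensor per layer;
    bytes_per_elem=2.0 assumes the cache is stored in FP16/BF16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    # Assumed 7B-class dimensions (Llama-2-7B style, no grouped-query attention).
    layers, kv_heads, head_dim = 32, 32, 128
    weights_gb = 7 * 2  # 7B parameters in FP16
    for seq_len, batch in [(4096, 1), (4096, 8)]:
        kv_gb = kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch) / 1e9
        # Only the KV cache is counted here; activations and the CUDA context add more.
        implied = (weights_gb + kv_gb) / weights_gb
        print(f"context {seq_len}, batch {batch}: KV cache {kv_gb:.2f} GB, "
              f"implied overhead factor >= {implied:.2f}")
```

Under these assumptions a single 4,096-token sequence adds about 2.1 GB (a factor of roughly 1.15 before activations), while eight such sequences add about 17 GB, pushing the factor above 2.0.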

Does this include training? No. Training needs roughly 3–4× more memory for optimizer states and gradients; this estimate targets inference.

What overhead should I use? Use ~1.2 for short prompts, and 1.5–2.0+ for long contexts or heavy batching.
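To compare overhead choices side by side, this sketch prints the estimate for one model size at the factors suggested above (1.2, 1.5 and 2.0). The 70B model size and the selection of precisions are example inputs chosen for illustration.

```python
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}
OVERHEAD_FACTORS = (1.2, 1.5, 2.0)


def vram_gb(params_billion: float, bytes_per_param: float, overhead: float) -> float:
    """VRAM (GB) = Params (B) x Bytes/Param x Overhead."""
    return params_billion * bytes_per_param * overhead


def overhead_table(params_billion: float) -> list[str]:
    """Return text rows: one per precision, one column per overhead factor."""
    header = f"{'precision':<10}" + "".join(f"{'x' + format(f, '.1f'):>12}" for f in OVERHEAD_FACTORS)
    rows = [header]
    for name, bpp in BYTES_PER_PARAM.items():
        cells = "".join(f"{vram_gb(params_billion, bpp, f):>9.1f} GB" for f in OVERHEAD_FACTORS)
        rows.append(f"{name:<10}{cells}")
    return rows


if __name__ == "__main__":
    for row in overhead_table(70):
        print(row)
```

For a 70B model this gives 168 GB at FP16 with the default 1.2 factor but 42 GB at INT4; moving to a factor of 2.0 raises every figure by two thirds.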

Last updated: June 19, 2026
