
6/3/2026
NVFP4 Inference on Blackwell SM120 GPUs: vLLM, FlashInfer & What Worked
Field notes from serving a large ModelOpt NVFP4 model on Blackwell SM120 GPUs with vLLM, FlashInfer, FP8 KV cache, speculative decoding, and production-shaped benchmarks. The notes cover the target/drafter boundary that made the deployment stable, and why the early peak did not hold under reproduction.







