6/3/2026
NVFP4 Inference on Blackwell SM120 GPUs: vLLM, FlashInfer & What Worked
Field notes from serving a large ModelOpt NVFP4 model on Blackwell SM120 GPUs with vLLM, FlashInfer, FP8 KV cache, speculative decoding, and production-shaped benchmarks...
