You can now predict LLM inference efficiency on GPUs you've never tested by combining public model specs with GPU specifications. No profiling is needed, and the predictions are up to 4x more accurate than physics-based estimates.
WattGPU predicts GPU power consumption and inference latency for large language models without requiring hardware profiling. Using only public LLM metadata and GPU specs, it generalizes to unseen hardware combinations, achieving 3-4x better accuracy than traditional baselines and helping operators choose efficient GPU-LLM pairings.
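To show how power and latency predictions turn into a pairing decision, here is a minimal sketch. The text does not describe WattGPU's own model, so the sketch uses a stand-in predictor: a roofline-style latency bound plus a naive "GPU draws its full TDP" power estimate. We assume this resembles the physics-based baseline mentioned above, but that is an assumption. The two estimates are combined into energy per generated token (watts × seconds per decode step ÷ batch size), and GPU-LLM pairings are ranked by that value. All names (`GPUSpec`, `LLMSpec`, `roofline_decode_latency_s`, `rank_pairings`) are hypothetical. The spec numbers in the usage block are illustrative placeholders, not measured hardware data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPUSpec:
    """Hypothetical container for public GPU specifications."""
    name: str
    mem_bandwidth_gbps: float  # memory bandwidth in GB/s
    fp16_tflops: float         # dense FP16 throughput in TFLOP/s
    tdp_watts: float           # thermal design power in W


@dataclass(frozen=True)
class LLMSpec:
    """Hypothetical container for public LLM metadata."""
    name: str
    params_billion: float
    bytes_per_param: float = 2.0  # assumes FP16/BF16 weights


def roofline_decode_latency_s(llm: LLMSpec, gpu: GPUSpec, batch_size: int = 1) -> float:
    """Roofline-style lower bound on one decode step: the slower of streaming
    all weights from memory once and doing ~2 FLOPs per parameter per sequence."""
    weight_bytes = llm.params_billion * 1e9 * llm.bytes_per_param
    memory_time = weight_bytes / (gpu.mem_bandwidth_gbps * 1e9)
    flops = 2.0 * llm.params_billion * 1e9 * batch_size
    compute_time = flops / (gpu.fp16_tflops * 1e12)
    return max(memory_time, compute_time)


def energy_per_token_j(power_w: float, step_latency_s: float, batch_size: int = 1) -> float:
    """Energy per generated token: one decode step yields batch_size tokens."""
    return power_w * step_latency_s / batch_size


def rank_pairings(llms, gpus, batch_size: int = 1):
    """Return (llm, gpu, step latency s, J/token) tuples, most efficient first.
    A learned predictor would replace both estimates below."""
    rows = []
    for llm in llms:
        for gpu in gpus:
            latency = roofline_decode_latency_s(llm, gpu, batch_size)
            # Naive power assumption: the GPU draws its full TDP while decoding.
            energy = energy_per_token_j(gpu.tdp_watts, latency, batch_size)
            rows.append((llm.name, gpu.name, latency, energy))
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    # Illustrative placeholder specs, not real products.
    llms = [LLMSpec("llm-7b", 7.0), LLMSpec("llm-13b", 13.0)]
    gpus = [GPUSpec("gpu-a", 1000.0, 100.0, 300.0), GPUSpec("gpu-b", 2000.0, 200.0, 500.0)]
    for llm, gpu, lat, energy in rank_pairings(llms, gpus):
        print(f"{llm:8s} on {gpu}: {lat * 1e3:.1f} ms/step, {energy:.2f} J/token")
```

A learned model earns its keep in exactly the two places where this sketch is crude. Real decode power is usually well below TDP, and real latency is rarely at the roofline bound. The ranking step itself stays the same whichever predictor supplies the numbers.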