NVIDIA logo

NVIDIA

Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.

Input / 1M tokens
$0.040
Output / 1M tokens
$0.160
Context window
131K tokens
Provider
NVIDIA
Knowledge cutoff
2025-03-31

Performance

Median streaming throughput and first-token latency measured by Artificial Analysis.

Output tokens / sec
Time to first token