IBM has launched Granite 4.0 – a new family of open weights language models ranging in size from 3B to 32B. Artificial Analysis was provided pre-release access, and our benchmarking shows Granite 4.0 H Small (32B/9B total/active parameters) scoring an Intelligence Index of 23, with a particular strength in token efficiency
Today IBM released four new models: Granite 4.0 H Small (32B/9B total/active parameters), Granite 4.0 H Tiny (7B/1B), Granite 4.0 H Micro (3B/3B) and Granite 4.0 Micro (3B/3B). We evaluated Granite 4.0 Small (in non-reasoning mode) and Granite 4.0 Micro using the Artificial Analysis Intelligence Index. Granite 4.0 model architecture combines a small amount of standard transformer-style attention layers with a majority of Mamba layers which claims to reduce memory requirements without impacting performance
Key benchmarking takeaways:
Granite 4.0 H Small Intelligence: In non-reasoning, Granite 4.0 H Small scores 23 on the Artificial Analysis Intelligence index – a jump of +8 points on the Index compared to IBM Granite 3.3 8B (Non Reasoning). Granite 4.0 H Small places ahead of Gemma 3 27B (22) but behind Mistral Small 3.2 (29), EXAONE 4.0 32B (Non-Reasoning, 30) and Qwen3 30B A3B 2507 (Non-Reasoning, 37) in intelligence
Granite 4.0 Micro Intelligence: On the Artificial Analysis Intelligence Index, Granite 4.0 Micro scores 16. It places ahead of Gemma 3 4B (15) and LFM 2 2.6B (12).
Token efficiency: Granite 4.0 H Small and Micro demonstrate impressive token efficiency – Granite 4.0 Small uses 5.2M, while Granite 4.0 Micro uses 6.7M tokens to run the Artificial Analysis Intelligence Index. Both models fewer tokens than Granite 3.3 8B (Non-Reasoning) and most other open weights non-reasoning models smaller than 40B total parameters (except Qwen3 0.6B which uses 1.9M output tokens)
Availability: All four models are available on Hugging Face. Granite 4.0 H Small is available on Replicate and is priced at $0.06/$0.25 per 1M input/output tokens
Context Window: 128K tokens
Licensing: The Granite 4.0 models are available under the Apache 2.0 license
Granite 4.0 H Small’s (Non Reasoning) output token efficiency and per token pricing offers a compelling tradeoff between intelligence and Cost to Run Artificial Analysis Intelligence Index
In the category of Open Weights Non-Reasoning models smaller than 40B total parameters, Granite 4.0 H Small is on the frontier tradeoff between intelligence and Output Tokens Used in Artificial Analysis Intelligence Index
In the category of Open Weights Non-Reasoning models smaller than 4B total parameters, Granite 4.0 Micro is on the frontier of tradeoff between intelligence and Output Tokens Used in Artificial Analysis Intelligence Index