Google has expanded its Gemma AI family with the launch of Gemma 3 270M, a lightweight 270-million-parameter model designed for developers who need fast, task-specific fine-tuning without heavy hardware demands.
The announcement follows a string of recent releases in the Gemma 3 line, including Gemma 3 QAT and the mobile-first Gemma 3n. According to Google, downloads across the “Gemmaverse” have now surpassed 200 million.
The most power-efficient Gemma model to date
Gemma 3 270M has a distinctive architecture: 170 million embedding parameters cover a massive 256,000-token vocabulary, while the remaining 100 million parameters power its transformer blocks.
Google says the model’s design makes it “a strong base model to be further fine-tuned in specific domains and languages.” Internal tests showed that an INT4-quantized version consumed just 0.75% of a Pixel 9 Pro’s battery for 25 conversations, making it the company’s most power-efficient Gemma model to date.
Available in both pre-trained and instruction-tuned variants, Gemma 3 270M follows structured instructions effectively right out of the box. It is intended for targeted tasks, such as text classification, entity extraction, query routing, structured text generation, and compliance checks, rather than lengthy, open-ended chats.
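To give a sense of how such a task-focused workflow looks in practice, here is a minimal sketch of prompting the instruction-tuned variant for entity extraction via the Hugging Face transformers library. The model id "google/gemma-3-270m-it" is an assumption based on Gemma naming conventions, and the prompt is illustrative only.

```python
# Minimal sketch: entity extraction with the instruction-tuned variant.
# "google/gemma-3-270m-it" is an assumed Hugging Face model id.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-270m-it")

messages = [
    {"role": "user",
     "content": "List every company name mentioned in this sentence: "
                "'Adaptive ML worked with SK Telecom on content moderation.'"}
]

# The chat-style pipeline returns the full conversation; the last message
# holds the model's reply.
result = generator(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```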
Google frames Gemma 3 270M as an example of choosing efficiency over brute force.
“You wouldn’t use a sledgehammer to hang a picture frame,” Google wrote in its Gemma 3 270M introductory post, explaining that smaller, specialized models can outperform larger general-purpose systems when tuned for well-defined tasks.
This strategy is already proving effective. Google highlighted the work of Adaptive ML with SK Telecom, where a fine-tuned Gemma 3 4B model for multilingual content moderation outperformed much larger proprietary systems. The 270M version aims to enable similar results on an even smaller scale, opening the door for fleets of specialized models.
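For developers who want to try the same approach at this smaller scale, a supervised fine-tuning run can be set up in a few lines with TRL's SFTTrainer. This is a hedged sketch, not Google's or Adaptive ML's actual recipe: the dataset name is a hypothetical placeholder, and the model id is assumed.

```python
# Minimal supervised fine-tuning sketch with TRL.
# "your-org/moderation-examples" is a hypothetical dataset placeholder;
# "google/gemma-3-270m-it" is an assumed Hugging Face model id.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("your-org/moderation-examples", split="train")

trainer = SFTTrainer(
    model="google/gemma-3-270m-it",  # loaded by id; swap in the base variant for domain pre-tuning
    train_dataset=dataset,
    args=SFTConfig(output_dir="gemma-270m-moderation"),
)
trainer.train()
```

With only 270 million parameters, runs like this fit comfortably on a single consumer GPU, which is what makes "fleets" of task-specific checkpoints plausible.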
Performance and benchmarks
On the IFEval benchmark, which tests instruction-following ability, the instruction-tuned Gemma 3 270M scored 51.2%, higher than other small models like SmolLM2 135M Instruct and Qwen 2.5 0.5B Instruct, and close to the range of some billion-parameter systems.
Because it can run entirely on-device, Gemma 3 270M allows developers to build AI tools that process sensitive data without sending information to the cloud. Quantization-Aware Training (QAT) checkpoints make INT4 deployment possible with minimal performance loss, enabling smooth operation on resource-constrained hardware.
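As one illustration of low-memory deployment, the sketch below loads the model in 4-bit precision using bitsandbytes. Note this is a stand-in for illustration: Google's QAT checkpoints ship separately (for example as GGUF files for llama.cpp), and the model id here is an assumption.

```python
# Minimal sketch: 4-bit loading via bitsandbytes as one low-memory route.
# This is not Google's QAT pipeline; "google/gemma-3-270m-it" is an
# assumed model id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-3-270m-it"
quant = BitsAndBytesConfig(load_in_4bit=True,
                           bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=quant)

inputs = tokenizer("Classify the sentiment: 'Great battery life!'",
                   return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```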
Beyond enterprise use, developers are experimenting with creative applications. One demo, a Bedtime Story Generator built with Transformers.js, runs entirely in the browser: users set a main character, setting, plot twist, theme, and story length, and the model generates a coherent, imaginative tale without any cloud processing.
Google has made the model available through Hugging Face, Ollama, Kaggle, LM Studio, and Docker, with trial options on Vertex AI and compatibility with tools like llama.cpp, Gemma.cpp, LiteRT, Keras, and MLX.
For more on how Gemma 3 fits into Google’s broader AI strategy, including details on the larger 4B model and its real-world performance, check out our in-depth coverage of Google’s Gemma 3 AI model.