On August 27, Smart Things reported that today, Google launched the Gemini 2.5 Flash Image, the company’s most advanced image generation and editing model.
The core highlight of this model is its image editing capabilities. Google claims that this model can blend multiple images into a single image, maintaining high character consistency, and can perform targeted modifications using natural language, fully leveraging Gemini’s global knowledge.
Nobel Prize winner and CEO of Google DeepMind, Demis Hassabis, promoted the new model using his own photo, demonstrating the character consistency of Gemini 2.5 Flash Image. He modified the background of his photo to a classical style, while his appearance remained unchanged.
This capability has unlocked many interesting use cases, such as designing “star player cards” based on specific visual templates, allowing ordinary people to experience the treatment usually reserved for top athletes with just one click.
This model pairs perfectly with Google Veo 3 and other video generation models, creating rich video effects when used together. The overseas AI creative platform Kera AI has already used a similar model to produce a major advertisement.
This model actually appeared in the large model arena last week under the codename “nano-banana” and received over two million votes from users. Now officially revealed, Gemini 2.5 Flash Image has achieved global first placein both text-to-image and image editing scenarios, scoring an impressive 1362 on the image editing leaderboard, leading the second place by nearly 15%.
In Google’s published benchmark tests, Gemini 2.5 Flash Image outperformed GPT-4o image generation, Flux.1 Kontext (max), Qwen Image Edit, and other models in user preference, character, creativity, infographic, object, and environment generation, though it still lags behind GPT-4o image generation in stylization capabilities.
Gemini 2.5 Flash Image is primarily aimed at developers and is currently available through the Gemini API, Google AI Studio, and the enterprise-focused Vertex AI.
The pricing for this model is $30 per 1 million output tokens, with each image costing 1290 output tokens, approximately $0.039 per image (equivalent to 0.28 RMB).All other input and output modalities follow the Gemini 2.5 Flash pricing.
To make it easier to create AI applications using Gemini 2.5 Flash Image, Google has also made significant updates to the AI Studio’s “Build Mode”. Developers can use AI to create applications and quickly test the features of new models like Gemini 2.5 Flash Image.
When ready to deploy an application, developers can deploy directly from Google AI Studio or save the code to GitHub. Google has also showcased several cases on their blog:
Superb Character Consistency Helps Altman Time Travel with One Click
Maintaining the consistency of character and object appearance in multi-turn dialogue and editing is a significant challenge in image generation and editing. Google’s Gemini 2.5 Flash Image allows users to place the same character in different environments, showcasing a single product from multiple angles in a new environment, or generating consistent brand assets while retaining the subject.
In the example application below, users only need to upload a selfie to generate six portraits from the 1950s to the 2000s, each with the style of the respective era, with no significant deviation in the user’s appearance.
Smart Things also uploaded a photo of OpenAI co-founder and CEO Sam Altman, and Google’s new model allowed Altman to time travel back to the past with one click, achieving a super realistic image quality, accurately restoring the clothing styles of each era.
This consistency can also be applied in professional design scenarios. For example, users can provide the model with a specific texture and request a replacement. The model can complete the texture replacement without altering the shape and details.
Experience link:
https://aistudio.google.com/apps/bundled/past_forward?showPreview=true&showAssistant=true
Precise Image Editing with One Sentence, Customizable Light and Color
Gemini 2.5 Flash Image supports image transformation and editing using natural language. For example, the model can blur the background of an image, remove stains from a T-shirt, delete entire people from photos, change the pose of the photographed subject, and add color to black and white photos.
To showcase the practical applications of these features, Google built a photo editing template application in AI Studio. This photo editing application supports selecting and modifying specific areas or making broad adjustments and filter processing.
Smart Things uploaded a photo of Zuckerberg and asked the model to fine-tune it to make his teeth look whiter.
The final generated result is as follows, showing that Zuckerberg’s other facial features did not undergo significant changes after the modification.
Users can also customize the light, background, and more through preset prompts. In the image below, the lighting of the portrait has been adjusted to be warmer.
Experience link:
https://aistudio.google.com/apps/bundled/pixshop
Rich World Knowledge and Ability to Understand Hand-Drawn Illustrations
In the past, many image generation models could create beautiful visuals but lacked a deep semantic understanding of the real world. Google claims that Gemini 2.5 Flash Image possesses Gemini’s world knowledge. To demonstrate this, they created a template application that turns a simple canvas into an interactive educational mentor.
In the demonstration, Gemini 2.5 Flash Image can understand various hand-drawn images and answer a wide range of questions posed by users.
This world knowledge also enables the model to predict future changes in images and possesses a certain level of image reasoning ability. For example, when seeing a balloon flying next to a cactus, the model can generate an image of the balloon bursting based on the user’s command to “predict the next possible scene.”
Experience link:
https://aistudio.google.com/apps/bundled/codrawing?showAssistant=true&showPreview=true
Outstanding Multi-Image Fusion Capabilities for Precise Product Display
Gemini 2.5 Flash Image can understand and merge multiple input images, which holds significant practical value in scenarios like e-commerce. For instance, merchants can use AI to generate promotional photos of different products in the same scene or provide customers with images of furniture and other products placed in real settings.
Below is a case provided by Google, where users only need to drag the lamp from the left into the scene on the right, and after a short wait, they can see the placement effect. The model not only adds the lamp element to the scene but also turns on the light. However, the generation process is noticeably accelerated.
The multi-image fusion capability can also be used to generate creative images. For example, merging photos of a whale and a mountain creates a visually striking effect.
Experience link:
https://aistudio.google.com/apps/bundled/home_canvas?showPreview=true&showAssistant=true
Since the launch of Gemini 2.5 Flash Image, overseas users have already started experimenting with it. One user created a mooncake advertisement using it, claiming that the same prompts would require ten times the adjustments and fine-tuning in Midjourney to achieve similar results.
Another user shared a video they created using Gemini 2.5 Flash Image in conjunction with Veo 3. During this process, Gemini 2.5 Flash Image generated many different angles of shots, while Veo 3 turned them into a video. The final effect was stunning.
However, some users have complained about the strict censorship of this model, for example, it cannot generate images of people holding knives and axes.
Conclusion: Image Editing Evolves Again, May Become an Important Productivity Tool
In a sense, accurate image editing capabilities are one of the most critical abilities for image generation to enter real production scenarios. In e-commerce and other settings, this ability meets the demands of enterprise users for precise control; while in entertainment scenarios, it can provide users with rich experiences and gameplay.
Currently, several domestic and international large model manufacturers have launched image editing models, and the latest developments in this field are worth continuous attention.返回搜狐,查看更多