
Build character consistent storyboards using Amazon Nova in Amazon Bedrock – Part 2

By Advanced AI Editor | September 4, 2025


Although careful prompt crafting can yield good results, achieving professional-grade visual consistency often requires adapting the underlying model itself. Building on the prompt engineering and character development approach covered in Part 1 of this two-part series, we now push the consistency level for specific characters by fine-tuning an Amazon Nova Canvas foundation model (FM). Through fine-tuning techniques, creators can instruct the model to maintain precise control over character appearances, expressions, and stylistic elements across multiple scenes.

In this post, we take an animated short film, Picchu, produced by FuzzyPixel from Amazon Web Services (AWS), prepare training data by extracting key character frames, and fine-tune a character-consistent model for the main character Mayu and her mother, so we can quickly generate storyboard concepts for new sequels.

Solution overview

To implement an automated workflow, we propose the following comprehensive solution architecture that uses AWS services for an end-to-end implementation.

The workflow consists of the following steps:

1. The user uploads a video asset to an Amazon Simple Storage Service (Amazon S3) bucket.
2. Amazon Elastic Container Service (Amazon ECS) is triggered to process the video asset.
3. Amazon ECS downsamples the frames, selects those containing the character, and then center-crops them to produce the final character images.
4. Amazon ECS invokes an Amazon Nova model (Amazon Nova Pro) from Amazon Bedrock to create captions from the images.
5. Amazon ECS writes the image captions and metadata to the S3 bucket.
6. The user uses a notebook environment in Amazon SageMaker AI to invoke the model training job.
7. The user fine-tunes a custom Amazon Nova Canvas model by invoking the Amazon Bedrock create_model_customization_job and create_provisioned_model_throughput API calls to create a custom model available for inference.

This workflow is structured in two distinct phases. The initial phase, in Steps 1–5, focuses on preparing the training data. In this post, we walk through an automated pipeline to extract images from an input video and then generate labeled training data. The second phase, in Steps 6–7, focuses on fine-tuning the Amazon Nova Canvas model and performing test inference using the custom-trained model. For these latter steps, we provide the preprocessed image data and comprehensive example code in the following GitHub repository to guide you through the process.

Prepare the training data

Let’s begin with the first phase of our workflow. In our example, we build an automated video object/character extraction pipeline to extract high-resolution images with accurate caption labels using the following steps.

Creative character extraction

We recommend first sampling video frames at fixed intervals (for example, 1 frame per second). Then, apply Amazon Rekognition label detection and face collection search to identify frames and characters of interest. Label detection can identify over 2,000 unique labels and locate their positions within frames, making it ideal for initial detection of general character categories or non-human characters. To distinguish between different characters, we then use the Amazon Rekognition feature to search faces in a collection. This feature identifies and tracks characters by matching their faces against a pre-populated face collection. If these two approaches aren’t precise enough, we can use Amazon Rekognition Custom Labels to train a custom model for detecting specific characters.
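The following is a minimal sketch of this two-pass detection step, assuming a face collection (named picchu-characters here for illustration) has already been created and indexed with reference faces for each main character; the function name and thresholds are illustrative, not tuned values:

import boto3
from botocore.exceptions import ClientError

rekognition = boto3.client("rekognition")

def detect_characters(frame_bytes, collection_id="picchu-characters"):
    """Return (character, similarity) pairs for one sampled video frame."""
    # Broad pass: label detection flags frames that contain any character
    labels = rekognition.detect_labels(
        Image={"Bytes": frame_bytes}, MaxLabels=10, MinConfidence=80
    )
    if not any(l["Name"] in ("Person", "Cartoon") for l in labels["Labels"]):
        return []

    # Precise pass: match the frame against the pre-populated face collection
    try:
        matches = rekognition.search_faces_by_image(
            CollectionId=collection_id,
            Image={"Bytes": frame_bytes},
            FaceMatchThreshold=90,
            MaxFaces=3,
        )
    except ClientError:
        return []  # no detectable face in this frame
    return [(m["Face"]["ExternalImageId"], m["Similarity"])
            for m in matches["FaceMatches"]]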

After detection, we center-crop each character with appropriate pixel padding and then run a deduplication algorithm using the Amazon Titan Multimodal Embeddings model to remove semantically similar images above a threshold value. Doing so helps us build a diverse dataset because redundant or nearly identical frames could lead to model overfitting (when a model learns the training data too precisely, including its noise and fluctuations, making it perform poorly on new, unseen data). We can calibrate the similarity threshold to fine-tune what we consider to be identical images, so we can better control the balance between dataset diversity and redundancy elimination.
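As an illustration of the deduplication step, the following sketch embeds each cropped image with the Amazon Titan Multimodal Embeddings model and keeps an image only if its cosine similarity to every image already kept stays below the threshold; the 0.92 threshold is an illustrative starting point to calibrate, not a tuned value:

import base64
import json

import boto3
import numpy as np

bedrock_runtime = boto3.client("bedrock-runtime")

def titan_image_embedding(image_path):
    """Embed one cropped character image with Titan Multimodal Embeddings."""
    with open(image_path, "rb") as f:
        body = json.dumps({"inputImage": base64.b64encode(f.read()).decode("utf-8")})
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-image-v1", body=body
    )
    return np.array(json.loads(response["body"].read())["embedding"])

def deduplicate(image_paths, threshold=0.92):
    """Drop images that are semantically near-identical to ones already kept."""
    kept, kept_vecs = [], []
    for path in image_paths:
        vec = titan_image_embedding(path)
        vec = vec / np.linalg.norm(vec)  # unit-normalize for cosine similarity
        if all(float(np.dot(vec, k)) < threshold for k in kept_vecs):
            kept.append(path)
            kept_vecs.append(vec)
    return kept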

Data labeling

We generate captions for each image using Amazon Nova Pro in Amazon Bedrock and then upload the image and label manifest file to an Amazon S3 location. This process focuses on two critical aspects of prompt engineering: character description to help the FM identify and name the characters based on their unique attributes, and varied description generation that avoids repetitive patterns in the caption (for example, “an animated character”). The following is an example prompt template used during our data labeling process:

system_prompt = """
You are an expert image description specialist who creates concise, natural alt
text that makes visual content accessible while maintaining clarity and focus.
Your task is to analyze the provided image and provide a creative description
(20-30 words) that emphasizes the three main characters, capturing the essential
elements of their interaction while avoiding unnecessary details.
"""

prompt = """
1. Identify the main characters in the image: Character 1, Character 2, and
Character 3. At least one will be in the picture, so provide at minimum a
description with at least one character name.
- "Character 1": describe the first character, key traits, background, attributes.
- "Character 2": describe the second character, key traits, background, attributes.
- "Character 3": describe the third character, key traits, background, attributes.
2. Just state their name WITHOUT adding any standard characteristics.
3. Only capture visual elements outside the standard characteristics.
4. Capture the core interaction between them.
5. Include only contextual details that are crucial for understanding the scene.
6. Create a natural, flowing description using everyday language.

Here are some examples:

...

[Identify the main characters]
[Assessment of their primary interaction]
[Selection of crucial contextual elements]
[Crafting of concise, natural description]

{
"alt_text": "[Concise, natural description focusing on the main characters]"
}

Note: Provide only the JSON object as the final response.
"""

The data labeling output is formatted as a JSONL file, where each line pairs an image reference (an Amazon S3 path) with a caption generated by Amazon Nova Pro. This JSONL file is then uploaded to Amazon S3 for training. The following is an example of the file:

{"image_ref": "s3://media-ip-dataset/characters/blue_character_01.jpg", "alt_text": "This
animated character features a round face with large expressive eyes. The character
has a distinctive blue color scheme with a small tuft of hair on top. The design is
stylized with clean lines and a minimalist approach typical of modern animation."}
{"image_ref": "s3://media-ip-dataset/props/iconic_prop_series1.jpg", "alt_text": "This
object appears to be an iconic prop from the franchise. It has a metallic appearance
with distinctive engravings and a unique shape that fans would immediately recognize.
The lighting highlights its dimensional qualities and fine details that make it
instantly identifiable."}
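As a sketch of the captioning call itself, the following uses the Amazon Bedrock Converse API with the system prompt and prompt template shown earlier; the model ID is an assumption (some Regions require an inference profile ID such as us.amazon.nova-pro-v1:0), and the JSON parsing relies on the prompt instructing the model to return only a JSON object:

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def caption_image(image_path, system_prompt, prompt,
                  model_id="amazon.nova-pro-v1:0"):  # assumed model ID
    """Generate an alt-text caption for one character image with Amazon Nova Pro."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    response = bedrock_runtime.converse(
        modelId=model_id,
        system=[{"text": system_prompt}],
        messages=[{
            "role": "user",
            "content": [
                {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
                {"text": prompt},
            ],
        }],
    )
    # The prompt asks for a bare JSON object with an "alt_text" key
    raw = response["output"]["message"]["content"][0]["text"]
    return json.loads(raw)["alt_text"]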

Human verification

For enterprise use cases, we recommend incorporating a human-in-the-loop process to verify labeled data before proceeding with model training. This verification can be implemented using Amazon Augmented AI (Amazon A2I), a service that helps annotators verify both image and caption quality. For more details, refer to Get Started with Amazon Augmented AI.

Fine-tune Amazon Nova Canvas

Now that we have the training data, we can fine-tune the Amazon Nova Canvas model in Amazon Bedrock. Amazon Bedrock requires an AWS Identity and Access Management (IAM) service role to access the S3 bucket where you stored your model customization training data. For more details, see Model customization access and security. You can perform the fine-tuning task directly on the Amazon Bedrock console or use the Boto3 API. We explain both approaches in this post, and you can find the end-to-end code sample in picchu-finetuning.ipynb.

Create a fine-tuning job on the Amazon Bedrock console

Let’s start by creating an Amazon Nova Canvas fine-tuning job on the Amazon Bedrock console:

1. On the Amazon Bedrock console, in the navigation pane, choose Custom models under Foundation models.
2. Choose Customize model, and then choose Create Fine-tuning job.
3. On the Create Fine-tuning job details page, choose the model you want to customize and enter a name for the fine-tuned model.
4. In the Job configuration section, enter a name for the job and optionally add tags to associate with it.
5. In the Input data section, enter the Amazon S3 location of the training dataset file.
6. In the Hyperparameters section, enter values for the hyperparameters (we used the values shown in the SDK example in the next section).
7. In the Output data section, enter the Amazon S3 location where Amazon Bedrock should save the output of the job.
8. Choose Fine-tune model job to begin the fine-tuning process.

This hyperparameter combination yielded good results during our experimentation. In general, increasing the learning rate makes the model train more aggressively, which often presents an interesting trade-off: we might achieve character consistency more quickly, but it might impact overall image quality. We recommend a systematic approach to adjusting hyperparameters. Start with the suggested batch size and learning rate, and try increasing or decreasing the number of training steps first. If the model struggles to learn your dataset even after 20,000 steps (the maximum allowed in Amazon Bedrock), then we suggest either increasing the batch size or adjusting the learning rate upward. These adjustments, though subtle, can make a significant difference in your model’s performance. For more details about the hyperparameters, refer to Hyperparameters for Creative Content Generation models.

Create a fine-tuning job using the Python SDK

The following Python code snippet creates the same fine-tuning job using the create_model_customization_job API:

import boto3

bedrock = boto3.client("bedrock")
jobName = "picchu-canvas-v0"

# Set hyperparameters
hyperParameters = {
    "stepCount": "14000",
    "batchSize": "64",
    "learningRate": "0.000001",
}

# Create the fine-tuning job (roleArn, training_path, bucket, and prefix are
# defined earlier in the notebook)
response_ft = bedrock.create_model_customization_job(
    jobName=jobName,
    customModelName=jobName,
    roleArn=roleArn,
    baseModelIdentifier="amazon.nova-canvas-v1:0",
    hyperParameters=hyperParameters,
    trainingDataConfig={"s3Uri": training_path},
    outputDataConfig={"s3Uri": f"s3://{bucket}/{prefix}"},
)

jobArn = response_ft.get("jobArn")
print(jobArn)
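Because the fine-tuning job runs for hours, it’s convenient to poll its status before moving on. The following is a small sketch using the get_model_customization_job API:

import time

# Poll until the customization job finishes (this can take many hours)
while True:
    status = bedrock.get_model_customization_job(jobIdentifier=jobArn)["status"]
    print(f"Job status: {status}")
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(600)  # check every 10 minutes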

When the job is complete, you can retrieve the new customModelARN using the following code:

custom_model_arn = bedrock.list_model_customization_jobs(
    nameContains=jobName
)["modelCustomizationJobSummaries"][0]["customModelArn"]

Deploy the fine-tuned model

With the preceding hyperparameter configuration, this fine-tuning job might take up to 12 hours to complete. When it’s complete, you should see a new model in the custom models list. You can then create provisioned throughput to host the model. For more details on provisioned throughput and different commitment plans, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock.

Deploy the model on the Amazon Bedrock console

To deploy the model from the Amazon Bedrock console, complete the following steps:

1. On the Amazon Bedrock console, choose Custom models under Foundation models in the navigation pane.
2. Select the new custom model and choose Purchase provisioned throughput.
3. In the Provisioned Throughput details section, enter a name for the provisioned throughput.
4. Under Select model, choose the custom model you just created.
5. Specify the commitment term and model units.

After you purchase provisioned throughput, a new model Amazon Resource Name (ARN) is created. You can invoke this ARN when the provisioned throughput is in service.

Deploy the model using the Python SDK

The following Python code snippet creates provisioned throughput using the create_provisioned_model_throughput API:

custom_model_name = "picchu-canvas-v0"

# Create the provisioned throughput and retrieve the provisioned model ARN
provisioned_model_id = bedrock.create_provisioned_model_throughput(
    modelUnits=1,
    # Name for your provisioned throughput model
    provisionedModelName=custom_model_name,
    modelId=custom_model_arn,
)["provisionedModelArn"]
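Provisioning itself takes several minutes. A short wait loop like the following sketch, using the get_provisioned_model_throughput API, confirms the endpoint is in service before you invoke it:

import time

# Wait until the provisioned throughput endpoint is ready to serve requests
while True:
    status = bedrock.get_provisioned_model_throughput(
        provisionedModelId=provisioned_model_id
    )["status"]
    print(f"Provisioned throughput status: {status}")
    if status != "Creating":
        break  # InService means ready; Failed warrants investigation
    time.sleep(60)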

Test the fine-tuned model

When the provisioned throughput is live, we can use the following code snippet to test the custom model and experiment with generating some new images for a sequel to Picchu:

import base64
import io
import json
import random

import boto3
from PIL import Image

bedrock_runtime = boto3.client("bedrock-runtime")

def decode_base64_image(img_b64):
    return Image.open(io.BytesIO(base64.b64decode(img_b64)))

def generate_image(prompt,
                   negative_prompt="ugly, blurry, distorted, low quality, "
                                   "pixelated, watermark, text, deformed",
                   num_of_images=3,
                   seed=1):
    """
    Generate images using the fine-tuned Amazon Nova Canvas model.
    """
    image_gen_config = {
        "numberOfImages": num_of_images,
        "quality": "premium",
        "width": 1024,   # maximum resolution 2048 x 2048
        "height": 1024,  # 1:1 aspect ratio
        "cfgScale": 8.0,
        "seed": seed,
    }

    # Prepare the request body
    request_body = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": prompt,
            "negativeText": negative_prompt,  # list things to avoid
        },
        "imageGenerationConfig": image_gen_config,
    }

    response = bedrock_runtime.invoke_model(
        modelId=provisioned_model_id,
        body=json.dumps(request_body),
    )

    # Parse the response
    response_body = json.loads(response["body"].read())

    if "images" in response_body:
        # Decode each base64-encoded image into a PIL Image
        return [decode_base64_image(img) for img in response_body["images"]]
    return []

seed = random.randint(1, 858993459)
print(f"seed: {seed}")

# prompt holds one of the scene descriptions listed below
images = generate_image(prompt=prompt, seed=seed)



The following are example scene prompts for a Picchu sequel:

  • Mayu’s face shows a mix of nervousness and determination. Mommy kneels beside her, gently holding her. A landscape is visible in the background.
  • A steep cliff face with a long wooden ladder extending downwards. Halfway down the ladder is Mayu with a determined expression on her face. Mayu’s small hands grip the sides of the ladder tightly as she carefully places her feet on each rung. The surrounding environment shows a rugged, mountainous landscape.
  • Mayu standing proudly at the entrance of a simple school building. Her face beams with a wide smile, expressing pride and accomplishment.
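To render a full storyboard pass, you can loop over these scene prompts and save each candidate image; the following is a minimal sketch (prompt strings truncated here, and output filenames are illustrative):

import random

storyboard_prompts = [
    "Mayu's face shows a mix of nervousness and determination. ...",  # truncated
    "A steep cliff face with a long wooden ladder extending downwards. ...",
    "Mayu standing proudly at the entrance of a simple school building. ...",
]

for scene_idx, scene_prompt in enumerate(storyboard_prompts):
    seed = random.randint(1, 858993459)
    # Each call returns up to num_of_images candidate frames for the scene
    for img_idx, img in enumerate(generate_image(prompt=scene_prompt, seed=seed)):
        img.save(f"storyboard_scene{scene_idx}_{img_idx}.png")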

Clean up

To avoid incurring AWS charges after you are done testing, complete the cleanup steps in picchu-finetuning.ipynb and delete the following resources:

  • Amazon SageMaker Studio domain
  • Fine-tuned Amazon Nova model and provisioned throughput endpoint

Conclusion

In this post, we demonstrated how to elevate the character and style consistency achieved with the prompt engineering approach of Part 1 by fine-tuning Amazon Nova Canvas in Amazon Bedrock. Our workflow combines automated video processing, intelligent character extraction using Amazon Rekognition, and precise model customization using Amazon Bedrock to create a solution that maintains visual fidelity and dramatically accelerates the storyboarding process. By fine-tuning the Amazon Nova Canvas model on specific characters and styles, we’ve achieved a level of consistency that surpasses standard prompt engineering, so creative teams can produce high-quality storyboards in hours rather than weeks. Start experimenting with Amazon Nova Canvas fine-tuning today to elevate your storytelling with better character and style consistency.

About the authors

Dr. Achin Jain is a Senior Applied Scientist at Amazon AGI, where he works on building multimodal foundation models. He brings more than 10 years of combined industry and academic research experience. He has led the development of several modules for Amazon Nova Canvas and Amazon Titan Image Generator, including supervised fine-tuning (SFT), model customization, instant customization, and guidance with color palette.

James Wu is a Senior AI/ML Specialist Solutions Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.

Randy Ridgley is a Principal Solutions Architect focused on real-time analytics and AI. With expertise in designing data lakes and pipelines, Randy helps organizations transform diverse data streams into actionable insights. He specializes in IoT solutions, analytics, and infrastructure-as-code implementations. As an open-source contributor and technical leader, Randy provides deep technical knowledge to deliver scalable data solutions across enterprise environments.


