Tailoring foundation models for your business needs: A comprehensive guide to RAG, fine-tuning, and hybrid approaches

By Advanced AI Bot · May 29, 2025 · 11 min read


Foundation models (FMs) have revolutionized AI capabilities, but adapting them to specific business needs can be challenging. Organizations often struggle to balance model performance, cost-efficiency, and the need for domain-specific knowledge. This post explores three powerful techniques for tailoring FMs to your unique requirements: Retrieval Augmented Generation (RAG), fine-tuning, and a hybrid approach combining both methods. We dive into the advantages, limitations, and ideal use cases for each strategy.

AWS provides a suite of services and features to simplify the implementation of these techniques. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities for building generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Knowledge Bases provides native support for RAG, streamlining the process of enhancing model outputs with domain-specific information. Amazon Bedrock also offers native features for model customization through continued pre-training and fine-tuning. In addition, you can use Amazon Bedrock Custom Model Import to bring your customized models and use them alongside existing FMs through a single serverless, unified API, and Amazon Bedrock Model Distillation to create smaller, faster, more cost-effective models that deliver use-case-specific accuracy comparable to the most advanced models in Amazon Bedrock.

For this post, we used Amazon SageMaker AI for the fine-tuning and hybrid approaches to retain more control over the fine-tuning script and to try different fine-tuning methods, and Amazon Bedrock Knowledge Bases for the RAG approach, as shown in Figure 1.

To help you make informed decisions, we provide ready-to-use code in our GitHub repo that uses these AWS services to experiment with RAG, fine-tuning, and hybrid approaches. You can evaluate their performance on your specific use case and dataset, and adopt the approach that best fits to effectively customize FMs for your business needs.

Figure 1: Architecture diagram for the RAG, fine-tuning, and hybrid approaches

Retrieval Augmented Generation

RAG is a cost-effective way to enhance AI capabilities by connecting existing models to external knowledge sources. For example, an AI-powered customer service chatbot using RAG can answer questions about current product features by first checking the product documentation knowledge base. When a customer asks a question, the system retrieves the specific details from the product knowledge base before composing its response, helping to make sure that the information is accurate and up to date.

A RAG approach gives AI models access to external knowledge sources for better responses and has two main steps: retrieval, which finds the relevant information in the connected data sources, and generation, in which an FM composes an answer based on the retrieved information.
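
Here is a minimal sketch of these two steps against Amazon Bedrock using boto3, assuming a knowledge base already exists; the knowledge base ID is a placeholder, and the model ID and response shapes should be checked against the current Bedrock documentation:

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")

question = "What are the safety instructions for the HX-200?"

# Step 1 - retrieval: search the knowledge base for relevant passages
retrieval = agent_runtime.retrieve(
    knowledgeBaseId="YOUR_KB_ID",  # placeholder
    retrievalQuery={"text": question},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
)
context = "\n".join(r["content"]["text"] for r in retrieval["retrievalResults"])

# Step 2 - generation: send the question plus retrieved context to an FM
response = bedrock_runtime.converse(
    modelId="meta.llama3-1-8b-instruct-v1:0",  # assumption: verify the model ID
    messages=[{
        "role": "user",
        "content": [{"text": f"Context:\n{context}\n\nQuestion: {question}"}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])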

Fine-tuning

Fine-tuning is a powerful way to customize FMs for specific tasks or domains using additional training data. In fine-tuning, you adjust the model’s parameters using a smaller, labeled dataset relevant to the target domain.

For example, to build an AI-powered customer service chatbot, you can fine-tune an existing FM on your own dataset to handle questions about a company’s product features. By training on historical customer interactions and product specifications, the fine-tuned model learns the context and the company’s messaging tone and provides more accurate responses.

If the company launches a new product, the model should be fine-tuned again with new data to update its knowledge and maintain relevance. Fine-tuning helps make sure that the model can deliver precise, context-aware responses. However, it requires more computational resources and time compared to RAG, because the model itself needs to be retrained with the new data.

Hybrid approach

The hybrid approach combines the strengths of RAG and fine-tuning to deliver highly accurate, context-aware responses. Consider an example: a company frequently updates the features of its products. They want to customize their FM using internal data, but keeping the model current with changes in the product catalog is challenging. Because product features change monthly, keeping the model up to date would be costly and time-consuming.

By adopting a hybrid approach, the company can reduce costs and improve efficiency. They can fine-tune the model every couple of months to keep it aligned with the company’s overall tone. Meanwhile, RAG can retrieve the latest product information from the company’s knowledge base, helping to make sure that responses are up-to-date. Fine-tuning the model also enhances RAG’s performance during the generation phase, leading to more coherent and contextually relevant responses. If you want to further improve the retrieval phase, you can customize the embedding model, use a different search algorithm, or explore other retrieval optimization techniques.
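
A minimal sketch of this hybrid flow, assuming the fine-tuned model is hosted on a SageMaker real-time endpoint; the endpoint name, knowledge base ID, and payload shape are illustrative assumptions:

import json
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
sm_runtime = boto3.client("sagemaker-runtime")

question = "Which configuration steps does the latest product release require?"

# Retrieve the freshest product details from the knowledge base
retrieval = agent_runtime.retrieve(
    knowledgeBaseId="YOUR_KB_ID",  # placeholder
    retrievalQuery={"text": question},
)
context = "\n".join(r["content"]["text"] for r in retrieval["retrievalResults"])

# Generate with the fine-tuned model, which already carries the company tone
payload = {
    "inputs": f"Context:\n{context}\n\nQuestion: {question}",
    "parameters": {"max_new_tokens": 256},  # payload shape is an assumption
}
response = sm_runtime.invoke_endpoint(
    EndpointName="llama-3-1-8b-finetuned",  # placeholder
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))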

The following sections provide the background for dataset creation and the implementation of the three approaches.

Prerequisites

To deploy the solution, you need:

Dataset description

For the proof of concept, we created two synthetic datasets using Anthropic’s Claude 3 Sonnet on Amazon Bedrock.

Product catalog dataset

This dataset is your primary knowledge source in Amazon Bedrock. We created a product catalog consisting of 15 fictitious manufacturing products by prompting Anthropic’s Claude 3 Sonnet with example product catalogs. You should create your dataset in .txt format. The format in the example for this post has the following fields (an illustrative entry follows the list):

Product names
Product descriptions
Safety instructions
Configuration manuals
Operation instructions
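
For illustration, a single (fictitious) catalog entry in the .txt file might look like this:

Product name: HX-200 Hydraulic Press
Product description: A 20-ton benchtop hydraulic press for sheet-metal forming.
Safety instructions: Wear eye protection and keep hands clear of the ram at all times.
Configuration manual: Set the relief valve to 180 bar before first use.
Operation instructions: Lower the ram slowly and release pressure fully before removing parts.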

Training and test datasets

We use the same product catalog we created for the RAG approach as training data to run domain adaptation fine-tuning.

The test dataset consists of question-and-answer pairs about the product catalog dataset created earlier. We used the code in the Question-Answer Dataset section of the Jupyter notebook to generate the test dataset.
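
The exact format depends on the notebook, but each test pair might look like the following (illustrative values):

{"question": "What is the maximum operating pressure of the HX-200?", "answer": "The HX-200 has a maximum operating pressure of 180 bar."}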

Implementation

We implemented three different approaches: RAG, fine-tuning, and hybrid. See the Readme file for instructions to deploy the whole solution.

RAG

The RAG approach uses Amazon Bedrock Knowledge Bases and consists of two main parts.

To set up the infrastructure:

Update the config file with your required data (details in the Readme)
Run the following commands in the infrastructure folder:

cd infrastructure
./prepare.sh
cdk bootstrap aws://<>/<>
cdk synth
cdk deploy --all

Context retrieval and response generation:

The system finds relevant information by searching the knowledge base with the user’s question
It then sends both the user’s question and the retrieved information to the Meta Llama 3.1 8B model on Amazon Bedrock
The LLM then generates a response based on the user’s question and the retrieved information (a minimal sketch follows)
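
Amazon Bedrock Knowledge Bases can also run both parts in a single managed call through the RetrieveAndGenerate API; here is a minimal sketch (the knowledge base ID is a placeholder, and the repo’s implementation may differ):

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve_and_generate(
    input={"text": "How do I configure the HX-200?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/meta.llama3-1-8b-instruct-v1:0",
        },
    },
)
print(response["output"]["text"])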

Fine-tuning

We used Amazon SageMaker AI JumpStart to fine-tune the Meta Llama 3.1 8B Instruct model with the domain adaptation method for five epochs; a minimal sketch of such a job follows the parameter list below. You can adjust the following parameters in the config.py file:

Fine-tuning method: You can change the fine-tuning method in the config file; the default is domain_adaptation.
Number of epochs: Adjust the number of epochs in the config file according to your data size.
Fine-tuning template: Change the template based on your use case. The current one prompts the LLM to answer a customer question.
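
A minimal sketch of such a JumpStart domain-adaptation job, assuming the training text is already in Amazon S3; the model ID, hyperparameter names, and S3 path are assumptions to check against config.py and the JumpStart documentation:

from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-1-8b-instruct",  # assumption
    instance_type="ml.g5.12xlarge",
    environment={"accept_eula": "true"},  # Meta models require EULA acceptance
)
# Domain adaptation (not instruction tuning) for 5 epochs
estimator.set_hyperparameters(instruction_tuned="False", epoch="5")
estimator.fit({"training": "s3://your-bucket/product-catalog/"})  # placeholder path

# Deploy the fine-tuned model to a real-time endpoint for inference
predictor = estimator.deploy()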

Hybrid

The hybrid approach combines RAG and fine-tuning, and uses the following high-level steps:

Retrieve the most relevant context for the user’s question from the knowledge base
Generate an answer with the fine-tuned model using the retrieved context

You can customize the prompt template in the config.py file.
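
For example, a template in config.py might look like the following (illustrative, not the repo’s exact text); {context} is filled with the retrieved passages and {question} with the user input:

PROMPT_TEMPLATE = (
    "You are a customer service assistant for our product catalog.\n"
    "Context:\n{context}\n\n"
    "Customer question: {question}\n"
    "Answer:"
)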

Evaluation

For this example, we use three evaluation metrics to measure performance. You can modify src/evaluation.py to implement your own metrics.

Each metric helps you understand different aspects of how well each of the approaches works:

BERTScore: BERTScore measures how similar the generated answers are to the reference answers using cosine similarity between contextual token embeddings. It calculates precision, recall, and F1 measure; we used the F1 measure as the evaluation score.
LLM evaluator score: We use different language models from Amazon Bedrock to score the responses from the RAG, fine-tuning, and hybrid approaches. Each evaluator receives both the correct answer and the generated answer and gives a score between 0 and 1 (closer to 1 indicates higher similarity) for each generated answer. We then calculate the final score by averaging all the evaluation scores. The process is shown in Figure 2, and a minimal sketch of both metrics follows the figure.

Figure 2: LLM evaluator method
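
Here is a minimal sketch of both metrics, using the open-source bert-score package and the Bedrock Converse API as the LLM evaluator; the evaluator prompt and model ID are illustrative assumptions:

import re
import boto3
from bert_score import score  # pip install bert-score

references = ["The HX-200's maximum operating pressure is 180 bar."]
candidates = ["The HX-200 operates at a maximum of 180 bar."]

# BERTScore: we report the F1 measure
_, _, f1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {f1.mean().item():.4f}")

# LLM evaluator: ask a Bedrock model for a 0-1 similarity score
bedrock_runtime = boto3.client("bedrock-runtime")
prompt = (
    f"Reference answer: {references[0]}\n"
    f"Generated answer: {candidates[0]}\n"
    "Rate their similarity from 0 to 1. Reply with the number only."
)
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumption
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
text = response["output"]["message"]["content"][0]["text"]
llm_score = float(re.search(r"[01](?:\.\d+)?", text).group())
print(f"LLM evaluator score: {llm_score:.2f}")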

Inference latency: Response times are important in applications like chatbots, so depending on your use case, this metric might weigh heavily in your decision. For each approach, we averaged the time it took to receive a full response for each sample.
Cost analysis: To do a full cost analysis, we made the following assumptions:

We used one OpenSearch compute unit (OCU) for indexing and another for search in the RAG knowledge base. See OpenSearch Serverless pricing for more details.
We assume an application with 1,000 users, each making 10 requests per day with an average of 2,000 input tokens and 1,000 output tokens. See Amazon Bedrock pricing for more details.
We used an ml.g5.12xlarge instance for fine-tuning and for hosting the fine-tuned model. The fine-tuning job took 15 minutes to complete. See SageMaker AI pricing for more details.
For fine-tuning and the hybrid approach, we assume that the model instance is up 24/7, which might vary according to your use case.
The cost calculation is done for one month.

Based on those assumptions, the cost associated with each of the three approaches is calculated as follows (a worked example in code follows the formulas):

For RAG: 

OpenSearch Serverless monthly costs = cost of 1 OCU per hour * 2 OCUs * 24 hours * 30 days
Meta Llama 3.1 8B invocation costs = 1,000 users * 10 requests per day * (price per input token * 2,000 + price per output token * 1,000) * 30 days

For fine-tuning:

(Number of minutes used for the fine-tuning job / 60) * Hourly cost of an ml.g5.12xlarge instance
Hourly cost of an ml.g5.12xlarge hosting instance * 24 hours * 30 days

For hybrid:

OpenSearch Serverless monthly costs = cost of 1 OCU per hour * 2 OCUs * 24 hours * 30 days
(Number of minutes used for the fine-tuning job / 60) * hourly cost of an ml.g5.12xlarge instance
Hourly cost of an ml.g5.12xlarge hosting instance * 24 hours * 30 days
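
As a worked example of these formulas, the following sketch plugs in illustrative prices; the OCU, instance, and per-token rates are assumptions, so always check the current AWS pricing pages:

# Illustrative prices (assumptions, not current quotes)
OCU_PER_HOUR = 0.24            # OpenSearch Serverless, per OCU-hour
G5_12XLARGE_PER_HOUR = 7.09    # ml.g5.12xlarge, per hour
PRICE_PER_1K_INPUT = 0.00022   # Meta Llama 3.1 8B on Bedrock, per 1,000 tokens
PRICE_PER_1K_OUTPUT = 0.00022

requests = 1_000 * 10 * 30                     # users * requests per day * days
opensearch = OCU_PER_HOUR * 2 * 24 * 30        # ~ $346/month for 2 OCUs
tokens = requests * (2_000 * PRICE_PER_1K_INPUT + 1_000 * PRICE_PER_1K_OUTPUT) / 1_000  # ~ $198
tuning_job = (15 / 60) * G5_12XLARGE_PER_HOUR  # ~ $1.77 for a 15-minute job
hosting = G5_12XLARGE_PER_HOUR * 24 * 30       # ~ $5,105 for 24/7 hosting

# The post rounds these to roughly $548, $5,107, and $5,457 per month
print(f"RAG:         ~${opensearch + tokens:,.0f}")
print(f"Fine-tuning: ~${tuning_job + hosting:,.0f}")
print(f"Hybrid:      ~${opensearch + tuning_job + hosting:,.0f}")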

Results

You can find detailed evaluation results in two places in the code repository. The individual scores for each sample are in the JSON files under data/output, and a summary of the results is in summary_results.csv in the same folder.

The results in the following table show:

How each approach (RAG, fine-tuning, and hybrid) performs
Their scores from both BERTScore and LLM evaluators
The cost analysis for each method calculated for the US East region

Approach        Average BERTScore   Average LLM evaluator score   Average inference time (seconds)   Cost per month (US East region)
RAG             0.8999              0.8200                        8.336                               ≈ $350 + $198 ≈ $548
Fine-tuning     0.8660              0.5556                        4.159                               ≈ $1.77 + $5,105 ≈ $5,107
Hybrid          0.8908              0.8556                        17.700                              ≈ $350 + $1.77 + $5,105 ≈ $5,457

Note that the costs for both the fine-tuning and hybrid approaches can decrease significantly depending on the traffic pattern if you set the SageMaker real-time inference endpoint to scale down to zero instances when not in use.

Clean up

Follow the cleanup section in the Readme file to avoid paying for unused resources.

Conclusion

In this post, we showed you how to implement and evaluate three powerful techniques for tailoring FMs to your business needs: RAG, fine-tuning, and a hybrid approach combining both methods. We provided ready-to-use code to help you experiment with these approaches and make informed decisions based on your specific use case and dataset.

The results in this example were specific to the dataset that we used. For that dataset, RAG outperformed fine-tuning and achieved comparable results to the hybrid approach with a lower cost, but fine-tuning led to the lowest latency. Your results will vary depending on your dataset.

We encourage you to test these approaches using our code as a starting point:

Add your own datasets in the data folder
Fill out the config.py file
Follow the rest of the Readme instructions to run the full evaluation

About the Authors

Idil Yuksel is a Working Student Solutions Architect at AWS, pursuing her MSc in Informatics with a focus on machine learning at the Technical University of Munich. She is passionate about exploring application areas of machine learning and natural language processing. Outside of work and studies, she enjoys spending time in nature and practicing yoga.

Karim Akhnoukh is a Senior Solutions Architect at AWS working with customers in the financial services and insurance industries in Germany. He is passionate about applying machine learning and generative AI to solve customers’ business challenges. Besides work, he enjoys playing sports, aimless walks, and good quality coffee.



Source link
