
Optimizing your knowledge base for retrieval-augmented generation

Last updated: Jul 07, 2025

You can adapt the content in your knowledge base to be more accessible to generative AI models in a retrieval-augmented generation (RAG) pattern. By adapting your content for generative AI, you can improve the quality of AI responses that are generated from your content. Depending on your content and your RAG solution, you might compensate for tooling limitations or eliminate the need for some types of processing.

You can adapt your knowledge base content by testing your content in your RAG solution and developing guidelines.

You can create proactive guidelines to prepare your content for AI and reactive guidelines to repair your content in response to inadequate answers from AI. The following table summarizes the differences in creating and implementing guidelines to prepare or repair your content.

Comparison of methods for adapting content for AI

| Purpose of guidelines | Method of creating guidelines | Scope of content updates | Timing of content updates |
| --- | --- | --- | --- |
| Preparing your content for AI | Test your content in your RAG solution. | All existing content or critical content; all new content | Before you put your RAG solution into production; during new content authoring |
| Repairing your content for AI | Gather user feedback on your RAG solution. | Specific topics or passages; similar topics or passages | After you receive negative user feedback on AI answers |
Important: Foundation models are constantly improving. Guidelines that you create today might not be necessary in the future. Retest your guidelines after updates to your RAG solution.

Creating guidelines for preparing your content for AI

By creating and applying guidelines that prepare your content for AI, you can improve the quality of your content for both humans and AI.

To develop guidelines, test your content with generative AI. You obtain the most accurate results by testing with the RAG solution that consumes your knowledge base. If you test your content with a different system than your final RAG solution, your results might vary and you might need to retest and adjust your guidelines.

To create guidelines for preparing your content for AI:

  1. Collect representative questions and the topics that answer those questions. For the best results, collect questions that are asked by your users instead of guessing what your users might ask. Customer questions help you target the content that is most likely to be retrieved by your RAG solution. You don’t need to test every piece of content, every type of content, or every content format.
  2. Test whether the model can generate adequate answers to the questions based on your content. For the best results, test directly in your RAG solution. Alternatively, you can enter a question and the applicable text from your content into a prompt and check the generated answer.
  3. When AI answers are inadequate, try adapting your content until you obtain adequate answers. See Adaptation techniques.
  4. Create guidelines based on trends. See Example guidelines.
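If you test outside your RAG solution, step 2 can be approximated by pasting a question and the applicable passage into a simple prompt, as in the following sketch. The prompt wording and the function name are illustrative, not part of any specific product:

```python
def build_test_prompt(question: str, passage: str) -> str:
    """Assemble a simple RAG-style test prompt from one question and one passage.

    The instructions and layout here are only one reasonable wording;
    adjust them to match how your RAG solution prompts its model.
    """
    return (
        "Answer the question using only the provided content.\n"
        "If the content does not contain the answer, say so.\n\n"
        f"Content:\n{passage}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Example: build a prompt for one representative user question.
prompt = build_test_prompt(
    question="How do I reset my password?",
    passage="To reset your password, click Forgot password on the sign-in page.",
)
```

You can then submit the assembled prompt to the model you are testing with and judge whether the generated answer is adequate.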

Adaptation techniques for preparing your content for AI

When you find that the RAG solution generates an inadequate answer to a question that is answered in your knowledge base, try adapting that content. For example, you can try these techniques to improve answers:

  • Change the formatting of the content or rearrange the content. For example, you might replace long paragraphs with bulleted lists or simplify complex tables.
  • Clarify concepts or improve definitions of key terms.
  • Add context to clarify the subject or scope of content. For example, you might add section headings.
  • Add a summary of long content.
  • Replace ambiguous pronouns with specific nouns. For example, make sure that each sentence with a pronoun contains the noun that it references.
Tip: Be cautious about plans to reformat your content. For example, if your LLM does not handle tables well, do not plan to remove all tables. Tables are very useful to human readers, and by the time you finish replacing your tables, your RAG solution might be able to handle them. Make extensive changes only when they improve the overall quality of your content.
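The pronoun technique in the list above lends itself to a rough automated check. The following sketch flags sentences that open with a pronoun so that a writer can review them for a missing referent; the pronoun list and the sentence splitting are simplistic assumptions, not a complete grammar check:

```python
import re

# Pronouns that often lack a clear referent when they open a sentence.
# This set is an illustrative assumption; extend it for your style guide.
AMBIGUOUS_OPENERS = {"it", "this", "that", "these", "those", "they"}

def flag_ambiguous_sentences(text: str) -> list[str]:
    """Return sentences whose first word is a potentially ambiguous pronoun."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        words = sentence.split()
        first_word = words[0].lower().strip(",;:") if words else ""
        if first_word in AMBIGUOUS_OPENERS:
            flagged.append(sentence)
    return flagged
```

Flagged sentences still need human judgment: a pronoun is fine when the noun it references appears in the same sentence.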

Example guidelines for preparing content for AI

The following example guidelines for preparing content for AI might be applicable to your content:

Explain conceptual graphics in text
By clearly explaining conceptual graphics in text, you can clarify ambiguities in the graphics and avoid the expense of an image-to-text model. Use graphics to illustrate text, but not to replace it. Graphics can oversimplify concepts because they omit information or do not clearly designate which items are optional. By explaining a process or concept in text as if you didn't have the graphic, you can prevent confusion for your readers and for the LLM.
Include the names of icons in text
By including the names of icons and other UI elements in the text, instead of displaying only their images, you provide complete sentences for AI. For example, the sentence "To edit an asset, click the image of the edit icon" is not complete without the image of the icon. However, the sentence "To edit an asset, click the Edit icon" is comprehensible without the image of the icon.
Summarize long procedures and tutorials
If you have a long procedure or tutorial, the LLM might not be able to fit the entire content in the answer. Adding a summary of the steps helps the LLM answer questions. The summary also sets expectations for users.
Add clear lead-in sentences for lists
LLMs can have trouble identifying the subject of a list without a lead-in sentence.
Eliminate very short topics
Very short topics might not provide enough information for an LLM to generate an adequate answer to a question. For example, a very short parent topic might serve only to organize child topics in the table of contents and contain little valuable content. Very short topics can result in inadequate answers from AI and disappoint users who land on them. You can either remove very short topics or add valuable content to them.

Creating guidelines for repairing your content for AI

The best way to determine how to fix content that results in inadequate answers from AI is to implement a feedback mechanism. Your human users can indicate when an answer is bad. Store the feedback, the question, the answer, and the topics that were retrieved.
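A stored feedback record can be as simple as the following sketch, which assumes a JSON Lines file as the store; the field names are hypothetical, not from any specific product:

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AnswerFeedback:
    """One user-feedback record for a generated answer (illustrative schema)."""
    question: str               # the user's original question
    answer: str                 # the answer that the LLM generated
    retrieved_topics: list      # titles or IDs of the topics that were retrieved
    rating: str                 # for example, "negative" or "positive"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def append_feedback(record: AnswerFeedback, path: str) -> None:
    """Append one feedback record to a JSON Lines file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

Capturing the retrieved topics alongside the rating is what later lets you trace a bad answer back to the specific content that produced it.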

To create guidelines:

  1. Collect negative feedback on AI answers from your users.
  2. Determine the cause of the inadequate answer.
  3. If appropriate, update the target topic until it produces better answers from AI. Test your changes with several variations of the original question. See Adaptation techniques.
  4. When you see trends across topics that you update, create a guideline. See Example guidelines.
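The trend spotting in step 4 can start with a simple count of which topics recur in negative feedback. This sketch assumes feedback records are dictionaries with illustrative field names (`rating`, `retrieved_topics`):

```python
from collections import Counter

def topics_with_most_negative_feedback(records: list[dict], top_n: int = 3):
    """Return the topics that appear most often in negative-feedback records.

    Records are assumed to be dicts with a "rating" string and a
    "retrieved_topics" list; these names are illustrative.
    """
    counts = Counter()
    for rec in records:
        if rec.get("rating") == "negative":
            counts.update(rec.get("retrieved_topics", []))
    return counts.most_common(top_n)
```

Topics that surface repeatedly in this count are the candidates for repair and, across multiple topics, for a new guideline.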

Adaptation techniques for repairing your content for AI

To repair your content, you can change it or add to it.

The following table describes some techniques for repairing content that results in inadequate answers from AI.

Solutions to inadequate answers

| Problem | Solution |
| --- | --- |
| The content doesn't exist. | If appropriate, add content to document the subject. Don't add content that doesn't belong in your knowledge base. |
| The content exists, but the LLM didn't find it. | Try updating the topic titles, section headings, and terminology in the topic. |
| The LLM didn't understand the content. | Update the information in the topic to clarify the content, or update the formatting of the information. |
| The LLM provides a partial answer. | Try reformatting the information or providing a summary of long content. |

When a user provides negative feedback on an AI answer, you can’t always address it by altering your content. You might find other causes for inadequate answers that are not easy or possible to solve. The questions that users ask might not be clear, complete, or well-formed enough for the LLM to understand. For example, user questions can have these types of problems:

  • Misspelled words
  • Vague questions without enough information
  • Incorrect grammar
  • Incorrect terminology
  • Subjects that are irrelevant to your knowledge base
Tip: Be cautious about changing your content. Don't try to address every inadequate answer or every piece of negative feedback. Create guidelines based on trends.

Example guidelines for repairing content for AI

The following example guidelines for repairing content for AI might be applicable to your content:

Clarify confusing content
You can clarify content that is vague, has too much detail, or lacks context.
Add content for missing information
You can add content to close a gap in your docs or to mention alternatives to missing functionality. For example, suppose customers often ask a plant nursery about buying seeds for vegetables, which the nursery does not sell. The nursery chatbot answers questions about seeds with "I don't know" or "No". The nursery staff can add a sentence like this one to its knowledge base: "We don't sell seeds, but we do have a large selection of vegetable seedlings." The LLM can then provide a useful answer.
Add or change terminology
Your users might use different terms than you use in your docs. If you see a trend, you can mention the alternative term so that the LLM can find it. For example, say something like “an incorrect response from an LLM is sometimes known as a hallucination”.

Parent topic: Retrieval-augmented generation