Optimizing your knowledge base for retrieval-augmented generation
You can adapt the content in your knowledge base to be more accessible to generative AI models in a retrieval-augmented generation (RAG) pattern. By adapting your content for generative AI, you can improve the quality of AI responses that are generated from your content. Depending on your content and your RAG solution, you might compensate for tooling limitations or eliminate the need for some types of processing.
You can adapt your knowledge base content by testing your content in your RAG solution and developing guidelines.
You can create proactive guidelines to prepare your content for AI and reactive guidelines to repair your content in response to inadequate answers from AI. The following table summarizes the differences in creating and implementing guidelines to prepare or repair your content.
Purpose of guidelines | Method of creating guidelines | Scope of content updates | Timing of content updates |
---|---|---|---|
Preparing your content for AI | Test your content in your RAG solution. | • All existing content or critical content • All new content | • Before you put your RAG solution into production • During new content authoring |
Repairing your content for AI | Gather user feedback on your RAG solution. | • Specific topics or passages • Similar topics or passages | After you receive negative user feedback on AI answers |
Creating guidelines for preparing your content for AI
By creating and applying guidelines that prepare your content for AI, you can improve the quality of your content for both humans and AI.
To develop guidelines, test your content with generative AI. You obtain the most accurate results by testing with the RAG solution that consumes your knowledge base. If you test your content with a different system than your final RAG solution, your results might vary and you might need to retest and adjust your guidelines.
To create guidelines for preparing your content for AI:
- Collect representative questions and the topics that answer those questions. For the best results, collect questions that are asked by your users instead of guessing what your users might ask. Customer questions help you target the content that is most likely to be retrieved by your RAG solution. You don’t need to test every piece of content, every type of content, or every content format.
- Test whether the model can generate adequate answers to the questions based on your content. For the best results, test directly in your RAG solution. Alternatively, you can enter a question and the applicable text from your content into a prompt and check the generated answer.
- When AI answers are inadequate, try adapting your content until you obtain adequate answers. See Adaptation techniques.
- Create guidelines based on trends. See Example guidelines.
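If you test outside your RAG solution, the second step above amounts to pasting a question and the relevant passage into a single prompt. A minimal sketch of building such a test prompt follows; the function name and prompt wording are illustrative, and the call to your model client is deliberately left out because it depends on your RAG solution.

```python
# Sketch of a lightweight prompt test: combine a user question with the
# passages that should answer it, then send the result to whatever model
# client your RAG solution uses. The instructions text is an assumption;
# adapt it to match your solution's actual prompt template.

def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Combine a user question with candidate passages into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the passages below. "
        "If the passages do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Checking the generated answer against the passage you supplied tells you whether the content itself, rather than retrieval, is the weak point.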
Adaptation techniques for preparing your content for AI
When you find that the RAG solution generates an inadequate answer to a question that is answered in your knowledge base, try adapting that content. For example, you can try these techniques to improve answers:
- Change the formatting of the content or rearrange the content. For example, you might try using bulleted lists instead of long paragraphs or simplifying complex tables.
- Clarify concepts or improve definitions of key terms.
- Add context to clarify the subject or scope of content. For example, you might add section headings.
- Add a summary of long content.
- Replace ambiguous pronouns with specific nouns. For example, make sure that each sentence with a pronoun contains the noun that it references.
Example guidelines for preparing content for AI
The following example guidelines for preparing content for AI might be applicable to your content:
- Explain conceptual graphics in text
- By clearly explaining conceptual graphics in text, you can clarify ambiguities in the graphics and avoid the expense of an image-to-text model. Use graphics to illustrate text, but not to replace text. Graphics can oversimplify concepts because they omit information or do not clearly designate which items are optional. By explaining a process or concept in text as if you don’t have the graphic, you can prevent confusion for your readers and for the LLM.
- Include the names of icons in text
- By including the names of icons and other UI elements in the text, instead of displaying only their images, you provide complete sentences for AI. For example, the sentence "To edit an asset, click" followed by only the image of the icon is not complete without that image. However, the sentence "To edit an asset, click the Edit icon" is comprehensible without the image of the icon.
- Summarize long procedures and tutorials
- If you have a long procedure or tutorial, the LLM might not be able to fit the entire content in the answer. Adding a summary of the steps helps the LLM answer questions. The summary also sets expectations for users.
- Add clear lead-in sentences for lists
- LLMs can have trouble identifying the subject of a list without a lead-in sentence.
- Eliminate very short topics
- Very short topics might not provide enough information for an LLM to generate an adequate answer to a question. For example, a very short parent topic can serve to organize child topics in the table of contents and contain very little valuable content. Very short topics can result in inadequate answers from AI and disappoint your users who land on them. You can either remove very short topics or add valuable content to them.
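Very short topics are one of the few problems you can detect mechanically before any testing. The following sketch flags candidates by word count; the 40-word threshold is an assumption, so tune it for your knowledge base.

```python
# Sketch: flag topics that are probably too short to support a useful AI
# answer. The min_words threshold is an assumption; tune it for your content.

def flag_short_topics(topics: dict[str, str], min_words: int = 40) -> list[str]:
    """Return the titles of topics whose body falls below min_words words."""
    return [
        title
        for title, body in topics.items()
        if len(body.split()) < min_words
    ]
```

Review each flagged topic by hand before acting: some short topics, such as glossary entries, can still be valuable as written.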
Creating guidelines for repairing your content for AI
The best way to determine how to fix content that results in inadequate answers from AI is to implement a feedback mechanism. Your human users can indicate when an answer is bad. Store the feedback, the question, the answer, and the topics that were retrieved.
To create guidelines:
- Collect negative feedback on AI answers from your users.
- Determine the cause of the inadequate answer.
- If appropriate, update the target topic until it produces better answers from AI. Test your changes with several variations of the original question. See Adaptation techniques.
- When you see trends across topics that you update, create a guideline. See Example guidelines.
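A feedback store does not need to be elaborate; it only needs to capture the fields listed above. The following sketch uses an SQLite table with illustrative column names; the schema is an assumption, not a prescribed design.

```python
# Sketch of a minimal feedback store, assuming an SQLite database.
# The columns mirror the fields suggested above: question, answer,
# retrieved topics, and the user's rating. Column names are illustrative.

import sqlite3


def init_feedback_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the feedback table if it does not already exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feedback (
               id INTEGER PRIMARY KEY,
               question TEXT,
               answer TEXT,
               retrieved_topics TEXT,  -- comma-separated topic titles
               rating TEXT             -- e.g. 'negative' or 'positive'
           )"""
    )
    return conn


def record_feedback(conn, question, answer, topics, rating):
    """Store one feedback event for later trend analysis."""
    conn.execute(
        "INSERT INTO feedback (question, answer, retrieved_topics, rating) "
        "VALUES (?, ?, ?, ?)",
        (question, answer, ",".join(topics), rating),
    )
    conn.commit()
```

Querying this table for topics that repeatedly appear in negatively rated answers is what surfaces the trends that become guidelines.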
Adaptation techniques for repairing your content for AI
To repair your content, you can change it or add to it.
The following table describes some techniques for repairing content that results in inadequate answers from AI.
Problem | Solution |
---|---|
The content doesn't exist. | If appropriate, add content to document the subject. Don't add content that doesn't belong in your knowledge base. |
The content exists but the LLM didn't find it. | Try updating the topic titles, section headings, and terminology in the topic. |
The LLM didn't understand the content. | Update the information in the topic to clarify the content. Update the formatting of the information. |
The LLM provides a partial answer. | Try reformatting the information or providing a summary of long content. |
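To distinguish the second problem in the table (the content exists but was not found) from the others, you can check whether the expected topic is even retrieved for the failing question. The sketch below stands in for a real retriever with simple keyword overlap; swap in your RAG solution's actual retrieval call, because the scoring function here is only a placeholder.

```python
# Sketch of a retrieval diagnostic for the "content exists but the LLM
# didn't find it" case. Keyword overlap is a stand-in for your real
# retriever; the function names are illustrative.

def keyword_score(question: str, text: str) -> int:
    """Count question words that also appear in the text (case-insensitive)."""
    return len(set(question.lower().split()) & set(text.lower().split()))


def topic_is_retrieved(question: str, topics: dict[str, str],
                       expected_title: str, k: int = 3) -> bool:
    """Check whether the expected topic ranks in the top k for the question."""
    ranked = sorted(topics,
                    key=lambda t: keyword_score(question, topics[t]),
                    reverse=True)
    return expected_title in ranked[:k]
```

If the expected topic is not retrieved, the repair is likely in titles, headings, and terminology; if it is retrieved but the answer is still inadequate, the repair is likely in the content itself.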
When a user provides negative feedback on an AI answer, you can’t always address it by altering your content. You might find other causes for inadequate answers that are not easy or possible to solve. The questions that users ask might not be clear, complete, or well-formed enough for the LLM to understand. For example, user questions can have these types of problems:
- Misspelled words
- Vague questions without enough information
- Incorrect grammar
- Incorrect terminology
- Subjects that are irrelevant to your knowledge base
Example guidelines for repairing content for AI
The following example guidelines for repairing content for AI might be applicable to your content:
- Clarify confusing content
- You can clarify content that is vague, has too much detail, or lacks context.
- Add content for missing information
- You can add content to close a gap in your docs or to mention alternatives to missing functionality. For example, suppose customers often ask a plant nursery about buying seeds for vegetables, which the nursery does not sell. The nursery chatbot answers questions about seeds with "I don't know" or "No". The nursery staff can add a sentence like this one to its knowledge base: "We don't sell seeds, but we do have a large selection of vegetable seedlings." The LLM can then provide a useful answer.
- Add or change terminology
- Your users might use different terms than you use in your docs. If you see a trend, you can mention the alternative term so that the LLM can find it. For example, say something like “an incorrect response from an LLM is sometimes known as a hallucination”.
Parent topic: Retrieval-augmented generation