First, not all RAG systems are of the same caliber. The accuracy of the content in the custom database is crucial to solid output, but it’s not the only variable. “It’s not just the quality of the content,” says Joel Hron, global head of AI at Thomson Reuters. “It’s the quality of the search and the retrieval of the right content based on the query.” Mastering each step in the process matters, because a single misstep can throw the model completely off.
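To make that dependency concrete, here is a minimal, hypothetical sketch of the retrieve-then-generate flow Hron is describing; the keyword-overlap retriever, the toy document list, and the `generate_answer` stand-in are illustrative assumptions, not any vendor’s actual pipeline. The structural point is that whatever retrieval surfaces is all the generation step has to work with.

```python
# Toy sketch of a RAG pipeline: retrieval quality bounds answer quality,
# because generation only sees what retrieval returns.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query (a weak retriever)."""
    query_terms = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(query_terms & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:top_k]


def generate_answer(query: str, passages: list[str]) -> str:
    """Stand-in for the generation step, conditioned only on retrieved passages."""
    context = " | ".join(passages)
    return f"Answer to {query!r}, grounded in: {context}"


documents = [
    "The 2021 filing deadline was extended to May 17.",
    "Quarterly estimated payments are due in April, June, September, and January.",
    "Office picnic scheduled for Friday.",  # irrelevant; a weak retriever may still surface it
]

query = "When was the 2021 filing deadline?"
passages = retrieve(query, documents)
print(generate_answer(query, passages))
```

If the retriever pulls the picnic memo instead of the filing notice, the generation step has no chance of producing a correct, well-cited answer, which is why each stage has to be tuned on its own.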
“Any lawyer who has ever tried to use natural language search in a research engine will notice that there are often instances where semantic similarity leads you to completely irrelevant materials,” says Daniel Ho, a Stanford professor and senior fellow at the university’s Institute for Human-Centered AI. Ho’s research into AI legal tools that rely on RAG found a higher rate of errors in their output than the companies building the models had reported.
Which brings us to the most intriguing question in the discussion: how do you define a hallucination within a RAG implementation? Is it only when the chatbot generates output without any citations and fabricates information? Or is it also when the tool overlooks relevant data or misreads aspects of a citation?
According to Lewis, hallucinations in a RAG system come down to whether the output is consistent with what the model found during data retrieval. The Stanford research on AI tools for lawyers broadens this definition a bit, checking whether the output is grounded in the provided data as well as whether it is factually correct, a higher standard for legal professionals, who often parse complicated cases and navigate the intricate hierarchy of precedent.
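For illustration only, here is a hypothetical sketch of the two standards; the substring-based grounding check and the reference-fact set are toy stand-ins, not the method used by Lewis or the Stanford team. It shows how the same answer can pass the narrower test (consistent with what was retrieved) while failing the broader one (grounded and factually correct).

```python
# Toy contrast between two definitions of "hallucination" in a RAG system.

def is_grounded(claim: str, retrieved_passages: list[str]) -> bool:
    """Does the retrieved text support the claim? (Crude containment check.)"""
    return any(claim.lower() in passage.lower() for passage in retrieved_passages)


def is_hallucination_narrow(claim: str, retrieved_passages: list[str]) -> bool:
    # Narrow definition: hallucination only if the output contradicts or
    # goes beyond what retrieval actually returned.
    return not is_grounded(claim, retrieved_passages)


def is_hallucination_broad(claim: str, retrieved_passages: list[str],
                           reference_facts: set[str]) -> bool:
    # Broader standard: the output must be grounded AND factually correct.
    grounded = is_grounded(claim, retrieved_passages)
    factually_correct = claim.lower() in {fact.lower() for fact in reference_facts}
    return not (grounded and factually_correct)


passages = ["the statute of limitations is three years"]
facts = {"the statute of limitations is two years"}  # the retrieved passage is outdated

claim = "the statute of limitations is three years"
print(is_hallucination_narrow(claim, passages))        # False: consistent with retrieval
print(is_hallucination_broad(claim, passages, facts))  # True: grounded but not accurate
```

Under the narrower definition the answer above is fine, because it faithfully reflects the retrieved passage; under the broader standard it still counts as a failure, because the underlying source was wrong.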
While a RAG system steeped in legal matters is clearly better at answering questions on case law than OpenAI’s ChatGPT or Google’s Gemini, it can still overlook fine details and make random mistakes. All the AI experts I spoke with emphasized the continued need for thoughtful human interaction throughout the process, to double-check citations and verify the overall accuracy of the results.
Law is one area where there’s a lot of activity around RAG-based AI tools, but the approach is hardly limited to a single white-collar profession. “Take any profession or any business. You have to get answers that are based on actual documents,” Arredondo says. “So, I think RAG is going to be a core tool that will be used in basically every professional application, at least in the near to mid-term.” Risk-averse executives are excited by the prospect of using AI tools to better understand their proprietary data without having to upload sensitive information to a standard, public chatbot.
Still, it’s important for users to understand the limitations of these tools, and for AI-focused companies to resist overpromising the accuracy of their answers. Anyone using an AI tool should still avoid relying solely on the output and should view its answers with a healthy dose of skepticism, even when the answer has been improved through RAG.
“Hallucinations are here to stay,” Ho says. “We don’t yet have a ready remedy to completely eliminate them.” While RAG reduces the prevalence of errors, human judgment still remains paramount. And that’s no lie.