Startup Contextual AI Enhances Retrieval-Augmented Generation for Businesses


Before OpenAI transformed the tech landscape with its introduction of ChatGPT in late 2022, Douwe Kiela had already recognized that large language models (LLMs) alone could only provide incomplete solutions for essential enterprise challenges.

The young Dutch CEO of Contextual AI was significantly influenced by two groundbreaking research papers from Google and OpenAI, which laid out a framework for building fast, efficient transformer-based generative AI models and LLMs.

After the publication of these papers in 2017 and 2018, Kiela, along with his team of AI researchers at Facebook, where he was employed at the time, came to understand that LLMs would encounter significant data freshness challenges.

They recognized that training foundation models like LLMs on vast datasets not only equipped the models with a metaphorical “brain” for “reasoning” across that data, but also meant the training data itself constituted the entire knowledge base from which the models could draw to generate responses to user inquiries.

Kiela’s team concluded that unless an LLM could efficiently and cost-effectively access pertinent real-time data, even the most advanced model would be limited in usefulness for various enterprise applications.

In 2020, Kiela and his team published a foundational paper of their own, which introduced retrieval-augmented generation (RAG) to the world. RAG enables continual and cost-effective updates of foundation models with current, relevant information sourced from user files and the internet. With RAG, an LLM’s knowledge isn’t restricted to its training data, enhancing models’ accuracy, relevance, and impact for enterprise users.

Today, Kiela and Amanpreet Singh, his former colleague at Facebook, serve as CEO and CTO of Contextual AI, a startup based in Silicon Valley that has recently secured an $80 million Series A funding round, which includes an investment from NVIDIA’s NVentures. Contextual AI is also part of NVIDIA Inception, a program aimed at supporting emerging startups. With a workforce of about 50, the company intends to double its size by the year’s end.

Contextual AI’s platform, referred to as RAG 2.0, represents a refined, product-focused evolution of the RAG methodology initially outlined by Kiela and Singh in 2020.

According to Kiela, RAG 2.0 delivers roughly ten times better accuracy and performance per parameter than competing solutions.

This means, for instance, that a model with 70 billion parameters, which would ordinarily demand substantial computing resources, could now operate on significantly smaller infrastructure designed for only 7 billion parameters without compromising accuracy. Such optimization facilitates edge applications on smaller systems that can deliver unexpectedly high performance.

“When ChatGPT emerged, we observed widespread frustration—everyone recognized the promise of LLMs but also understood that the technology was not yet fully developed,” Kiela explained. “We realized that RAG could resolve many of these issues, and we knew we could surpass what we described in our initial RAG paper in 2020.”

Enhanced Performance Through Integrated Retrievers and Language Models

The cornerstone of Contextual AI’s offerings lies in the seamless integration of its retrieval architecture, represented by the “R” in RAG, with the LLM’s framework, which constitutes the generator or “G.” The RAG process involves a retriever interpreting a user’s query, searching multiple sources for relevant documents or data, and returning this information to an LLM, which then uses it to formulate a response.
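To make that flow concrete, here is a minimal, self-contained Python sketch of a RAG loop. The bag-of-words retriever and the prompt format are toy stand-ins for the learned dense retrievers and real LLM call a production system would use, and nothing here reflects Contextual AI’s actual code:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a learned dense encoder.
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity over sparse word counts.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    # The "R" in RAG: score every document against the query and keep the top k.
    q = embed(query)
    return sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # The retrieved passages ground the LLM (the "G"), whose call is omitted here.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "Contextual AI was founded by Douwe Kiela and Amanpreet Singh.",
    "RAG pairs a retriever with a generator LLM.",
    "The company raised an $80 million Series A round.",
]
query = "Who founded Contextual AI?"
print(build_prompt(query, retrieve(query, docs, k=2)))
```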

Since around 2020, RAG has emerged as the primary strategy for businesses deploying LLM-driven chatbots, resulting in the establishment of a flourishing ecosystem of RAG-focused startups.

Contextual AI distinguishes itself from competitors by refining and enhancing its retrievers through backpropagation—the algorithm used to adjust the weights and biases of a neural network during training.

Rather than training and modifying two separate neural networks—the retriever and the LLM—Contextual AI provides a unified platform that aligns both components and tunes them jointly through backpropagation.

Coordinating and adjusting weights and biases across distinct neural networks can be complex, but Kiela asserts that this approach yields significant improvements in precision, response quality, and optimization. Moreover, because the retriever and generator are tightly integrated, the responses generated are based on shared data, drastically reducing the likelihood of “hallucinated” data—a phenomenon where a model fabricates information when it lacks a definite answer.
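The original 2020 RAG paper made this kind of joint tuning possible by treating retrieval scores as probabilities and marginalizing the generator’s output over the retrieved passages, so a single loss sends gradients into both networks. The toy PyTorch sketch below illustrates only that general idea; the module sizes and names are invented, and it is not Contextual AI’s RAG 2.0 training recipe:

```python
import torch
import torch.nn.functional as F

# Toy modules standing in for a dense retriever and an LLM head; sizes and
# names are invented for illustration, not Contextual AI's architecture.
query_encoder = torch.nn.Linear(128, 64)      # retriever: embeds queries
generator = torch.nn.Linear(64 + 128, 1000)   # generator: toy next-token head

doc_embeddings = torch.randn(500, 64)         # precomputed passage index
doc_features = torch.randn(500, 128)          # passage content features

def rag_loss(query_feats: torch.Tensor, target_tokens: torch.Tensor) -> torch.Tensor:
    q = query_encoder(query_feats)                       # (B, 64)
    scores = q @ doc_embeddings.T                        # (B, 500) retrieval scores
    topk = scores.topk(4, dim=-1)                        # select 4 passages per query
    p_doc = F.softmax(topk.values, dim=-1)               # differentiable doc weights
    ctx = doc_features[topk.indices]                     # (B, 4, 128)
    inp = torch.cat([q.unsqueeze(1).expand(-1, 4, -1), ctx], dim=-1)
    log_probs = F.log_softmax(generator(inp), dim=-1)    # (B, 4, vocab)
    idx = target_tokens.view(-1, 1, 1).expand(-1, 4, -1)
    token_lp = log_probs.gather(-1, idx).squeeze(-1)     # (B, 4) per-doc likelihoods
    # Marginalize over retrieved docs: the retriever is rewarded for fetching
    # passages that help the generator, so one loss trains both networks.
    return -torch.logsumexp(token_lp + p_doc.log(), dim=-1).mean()

loss = rag_loss(torch.randn(8, 128), torch.randint(0, 1000, (8,)))
loss.backward()  # gradients flow into generator AND query_encoder jointly
```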

“Our methodology presents formidable technical challenges, but it fosters a much stronger connection between the retriever and the generator, enhancing the accuracy and efficiency of our system,” Kiela pointed out.

Addressing Complex Use Cases with Cutting-Edge Innovations

RAG 2.0 is designed to be LLM-agnostic, working with various open-source language models such as Mistral or Llama and accommodating clients’ model preferences. The startup’s retrievers were developed using NVIDIA Megatron-LM on a mix of NVIDIA H100 and A100 Tensor Core GPUs hosted in Google Cloud.
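In practice, “LLM-agnostic” typically means the pipeline depends only on a narrow generation interface, so backends can be swapped freely. The sketch below illustrates that design pattern with hypothetical class names; it is not Contextual AI’s API:

```python
from typing import Protocol

class Generator(Protocol):
    # The pipeline only needs a single completion method from any backend.
    def complete(self, prompt: str) -> str: ...

class LlamaBackend:
    def complete(self, prompt: str) -> str:
        return f"[llama] answer to: {prompt[:40]}"  # placeholder for a local Llama call

class MistralBackend:
    def complete(self, prompt: str) -> str:
        return f"[mistral] answer to: {prompt[:40]}"  # placeholder for a Mistral call

def rag_answer(query: str, context: list[str], llm: Generator) -> str:
    # The retrieval side stays unchanged no matter which generator is plugged in.
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
    return llm.complete(prompt)

print(rag_answer("What is RAG?", ["RAG pairs a retriever with a generator."], MistralBackend()))
```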

One major challenge facing any RAG solution is determining how to locate the most relevant data to fulfill a user’s query when such data is stored in various formats, including text, video, or PDF.

Contextual AI tackles this challenge using a “mixture of retrievers” approach, aligning different retriever specialties with the various formats in which data exists.

The company employs a combination of RAG types and a neural reranking algorithm to efficiently identify information stored across these diverse formats, ensuring optimal responsiveness to user inquiries.

For instance, if relevant information related to a query is saved in a video format, one of the RAGs deployed may be a Graph RAG, adept at understanding temporal relationships in unstructured data like video. Conversely, if other data is stored as text or in a PDF format, a vector-based RAG would also be utilized.

The neural reranker subsequently organizes the retrieved data, and the prioritized information is then passed to the LLM to formulate a response to the initial query.
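Putting those pieces together, a mixture-of-retrievers dispatch might look like the following sketch. The retriever functions are stubs, and the word-overlap scorer is a trivial stand-in for a trained neural reranker:

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    source: str  # which retriever produced it

def graph_rag(query: str) -> list[Passage]:
    # Stub: retrieval over temporal/graph-structured data such as video.
    return [Passage("scene where the technician replaces the valve", "graph_rag")]

def vector_rag(query: str) -> list[Passage]:
    # Stub: dense vector retrieval over text and PDF chunks.
    return [Passage("maintenance manual, valve replacement steps", "vector_rag")]

def neural_rerank(query: str, passages: list[Passage]) -> list[Passage]:
    # Stand-in for a learned reranker: score each (query, passage) pair and sort.
    def overlap(p: Passage) -> float:
        return float(len(set(query.lower().split()) & set(p.text.lower().split())))
    return sorted(passages, key=overlap, reverse=True)

def retrieve_for(query: str, k: int = 3) -> list[Passage]:
    # Fan the query out to every retriever, pool the candidates, rerank, and
    # hand the top k to the LLM (generation omitted here).
    candidates = graph_rag(query) + vector_rag(query)
    return neural_rerank(query, candidates)[:k]

print(retrieve_for("how do I replace the valve"))
```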

“To maximize performance, we almost never use a single retrieval method—it’s typically a hybrid approach since they exhibit distinct and complementary strengths,” Kiela noted. “The optimal mixture depends on the specific use case, the underlying data, and the user’s inquiry.”

By effectively merging the retriever and LLM architectures, alongside providing multiple avenues for obtaining pertinent information, Contextual AI delivers customers a marked enhancement in performance. In addition to greater accuracy, its solution reduces latency by minimizing API calls between the retriever’s and the LLM’s neural networks.

Thanks to its highly optimized framework and reduced computing requirements, RAG 2.0 can be deployed in the cloud, on-premises, or even in fully disconnected environments. This adaptability renders it applicable across a broad range of industries, from fintech and manufacturing to medical devices and robotics.

“The use cases we are concentrating on are the particularly challenging ones,” Kiela remarked. “We aim to address high-value, knowledge-intensive roles that can either significantly reduce costs or greatly enhance productivity for companies.”

