
Benchmarking R2R

Owen Colegrove
Jul 15, 2024
4 min read

In the rapidly evolving space of Retrieval-Augmented Generation (RAG), developers face a critical decision: which tool will best serve their project's needs? As more options become available, one factor stands paramount: scalability. This article presents a comparative analysis of R2R against four leading alternatives: LlamaIndex, Haystack, LangChain, and RagFlow, with a focus on ingestion performance and throughput.

Our goal is to provide developers with concrete data to inform their choice of RAG tool. By rigorously testing each system's ingestion capabilities, we hope to shed light on their potential to handle real-world, data-intensive applications.

Figure 1: Tokens Ingested vs. Time. The steeper the slope, the faster the ingestion rate.

We employed a two-pronged approach to stress-test these systems:

  • Bulk Ingestion: We pushed the limits by firehosing approximately ten million tokens of data in batches of 256 files, simulating a high-volume data environment.
  • Individual File Processing: To mimic everyday use cases, we measured the ingestion times for single txt and PDF documents, both individually and in multi-file batches. This provides insight into the user experience when adding files to the system.
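
In outline, the bulk pass can be reproduced with a harness along these lines. This is a minimal sketch: `ingest_batch` is a hypothetical stand-in for whichever framework's bulk-ingestion entry point is under test, not an API from any of the libraries above.

```python
import time

def benchmark_bulk(ingest_batch, files, batch_size=256):
    """Time bulk ingestion of `files` in fixed-size batches.

    `ingest_batch` is a hypothetical stand-in for a framework's
    bulk-ingestion call (e.g. a wrapper around R2R or LlamaIndex).
    """
    start = time.perf_counter()
    for i in range(0, len(files), batch_size):
        ingest_batch(files[i:i + batch_size])
    return time.perf_counter() - start
```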

The results of our benchmarking reveal significant differences in performance across the tested solutions.

Scalable Ingestion

Our scalable ingestion test aimed to simulate a high-volume data environment, pushing each solution to its limits. This test provides crucial insights into how these tools might perform in data-intensive, real-world scenarios.

We leveraged the HuggingFace Legacy Datasets' Wikipedia corpus for this benchmark. This dataset offers a diverse range of articles, making it an ideal candidate for testing RAG systems. Each framework was tasked with ingesting articles as per their recommended practices, allowing us to measure their performance under optimal conditions.
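
For reference, here is a minimal sketch of pulling articles from that corpus with the `datasets` library. The exact dataset identifier and the "20220301.en" config are assumptions for illustration, not necessarily what our harness used.

```python
from datasets import load_dataset

# Stream the legacy Wikipedia dump so the corpus never has to fit in
# memory. The "20220301.en" config is an assumption for illustration.
wiki = load_dataset("wikipedia", "20220301.en", split="train", streaming=True)

# Gather roughly ten million tokens of articles, using whitespace
# splitting as a cheap proxy for the benchmark's token count.
articles, token_count = [], 0
for record in wiki:
    articles.append(record["text"])
    token_count += len(record["text"].split())
    if token_count >= 10_000_000:
        break
```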

Note: While R2R supports asynchronous ingestion out of the box, LlamaIndex requires additional configuration for this feature. To ensure a fair comparison, we included results for both the default and asynchronous configurations of LlamaIndex.
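
For LlamaIndex, one way to toggle between the two configurations around the time of writing was the `use_async` flag on `from_documents`; treat the exact flag name as an assumption about the then-current API.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data/").load_data()

# Default configuration: embeddings are computed synchronously.
index = VectorStoreIndex.from_documents(documents)

# Asynchronous configuration: embedding requests are awaited
# concurrently; this corresponds to the "(Async)" row in Table 1.
index_async = VectorStoreIndex.from_documents(documents, use_async=True)
```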

The results, summarized in Table 1 and Figure 1, reveal significant variation in ingestion rates across the tested frameworks. R2R demonstrated the highest throughput, processing nearly 159,000 tokens per second, while the slowest framework, LangChain, averaged under 20,000 tokens per second.

Table 1: Ingestion Time over 10,008,026 Tokens

Solution             Time Elapsed (s)
R2R                  62.97
LlamaIndex (Async)   81.54
LlamaIndex           171.93
Haystack             276.27
LangChain            510.04
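
Dividing the fixed token count by each elapsed time yields the per-framework throughput; for example, R2R works out to roughly 159,000 tokens per second:

```python
TOTAL_TOKENS = 10_008_026  # token count from the bulk benchmark

elapsed_seconds = {
    "R2R": 62.97,
    "LlamaIndex (Async)": 81.54,
    "LlamaIndex": 171.93,
    "Haystack": 276.27,
    "LangChain": 510.04,
}

for solution, seconds in elapsed_seconds.items():
    # e.g. R2R: 10_008_026 / 62.97 ≈ 158,933 tokens/s
    print(f"{solution:<20} {TOTAL_TOKENS / seconds:>10,.0f} tokens/s")
```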

Individual File Ingestion

To complement our bulk ingestion test, we also examined how each solution handles individual file processing. This test simulates typical user interactions, such as uploading single documents or small batches of files.

Our test suite included two text files of similar size but different content, and two PDF files of identical size but varying complexity. We measured three key metrics for each file:

  • Ingestion Time
  • Tokens processed per second (for text files)
  • Megabytes processed per second

Additionally, we tested combined ingestion of identical file types to assess each solution's performance with multi-file uploads.
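
A per-file measurement can be sketched as follows; the `ingest` callable is a hypothetical stand-in for whichever framework's single-file ingestion call is under test.

```python
import os
import time

def measure_file_ingestion(ingest, path):
    """Time a single-file ingestion and derive the three metrics.

    `ingest` is a hypothetical stand-in for the framework under test
    (e.g. a wrapper around R2R's or Haystack's ingestion call).
    """
    size_mb = os.path.getsize(path) / 1_000_000
    start = time.perf_counter()
    ingest(path)
    elapsed = time.perf_counter() - start

    metrics = {"seconds": elapsed, "mb_per_s": size_mb / elapsed}
    if path.endswith(".txt"):
        with open(path, encoding="utf-8") as f:
            # Whitespace splitting is a rough proxy for token count.
            metrics["tokens_per_s"] = len(f.read().split()) / elapsed
    return metrics
```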

Table 2: Time Taken to Ingest Files (seconds)

Solution     Shakespeare  Churchill  Combined txt  University Physics  Introductory Statistics  Combined PDF
R2R          7.04         9.07       9.58          8.57                14.6                     20.7
LlamaIndex   67.0         57.7       189           7.92                18.0                     25.3
Haystack     18.8         16.8       37.8          8.17                17.0                     24.9
LangChain    65.2         64.7       134           9.31                24.7                     35.2
RagFlow      1630         3800       n/a           n/a                 n/a                      n/a

Table 3: Megabytes-per-Second Throughput (MB/s)

Solution     Shakespeare  Churchill  Combined txt  University Physics  Introductory Statistics  Combined PDF
R2R          0.767        0.628      1.159         3.127               1.833                    2.593
LlamaIndex   0.081        0.099      0.056         3.384               1.490                    2.121
Haystack     0.288        0.338      1.416         3.280               1.573                    2.156
LangChain    0.083        0.088      0.082         2.879               1.086                    1.523
RagFlow      0.003        0.001      n/a           n/a                 n/a                      n/a

We encourage developers to conduct their own tests using our publicly available benchmarking code.
