RAG

RAG Is Evolving into GraphRAG

Apr 18, 2026 · 13 min read

Scenario

A legal-tech company has thousands of contracts, policies, and case notes. Classic vector RAG retrieves similar text chunks, but answers still miss cross-document relationships such as parties, obligations, jurisdiction links, and timeline dependencies.

Why Classic RAG Breaks at Scale

Classic RAG is strong for local semantic similarity. It is weaker when the question depends on explicit relationships.

Typical failure patterns:

  • entity ambiguity ("Acme Holdings" vs "Acme Holdings LLC")
  • long-range reasoning across many documents
  • missing temporal/causal chains
  • weak explainability for why a given answer was produced

For legal and compliance contexts, these gaps directly affect trust and auditability.
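Entity ambiguity in particular can be reduced with deterministic normalization before anything reaches the graph. A minimal sketch (the suffix list and function name are illustrative, not part of any specific pipeline):

```python
import re

# Corporate suffixes that frequently cause duplicate entity nodes.
_SUFFIXES = re.compile(r"\b(LLC|L\.L\.C\.|Inc\.?|Ltd\.?|Corp\.?)$", re.IGNORECASE)

def normalize_entity(name: str) -> str:
    """Return a canonical key: trimmed, suffix-stripped, lowercased."""
    key = name.strip().rstrip(".,")
    key = _SUFFIXES.sub("", key).strip()
    return " ".join(key.split()).lower()

print(normalize_entity("Acme Holdings LLC"))  # acme holdings
print(normalize_entity("Acme  Holdings"))     # acme holdings
```

With a canonical key like this, "Acme Holdings" and "Acme Holdings LLC" resolve to the same graph node while the original surface forms stay stored as properties.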

GraphRAG Concept

GraphRAG combines:

  • vector retrieval for semantic recall
  • graph retrieval for relationship-aware context

The graph layer stores entities and edges (for example, COMPANY -> HAS_OBLIGATION -> CLAUSE). Retrieval becomes a two-stage process: semantic candidate generation + graph neighborhood expansion.
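The graph-neighborhood-expansion stage can be sketched with a toy in-memory adjacency list standing in for Neptune; entity and edge names here are illustrative:

```python
from collections import deque

# Toy adjacency list standing in for the Neptune graph
# (COMPANY -> HAS_OBLIGATION -> CLAUSE style edges).
GRAPH = {
    "Acme Holdings LLC": [("HAS_OBLIGATION", "clause_45"), ("GOVERNED_BY", "section_9")],
    "clause_45": [("REFERENCES", "policy_7")],
    "section_9": [],
    "policy_7": [],
}

def expand(seeds: list[str], max_depth: int = 2) -> set[str]:
    """BFS neighborhood expansion with a bounded hop count."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop budget
        for _rel, nbr in GRAPH.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

print(sorted(expand(["Acme Holdings LLC"], max_depth=1)))
```

The semantic stage supplies the seed entities; the bounded BFS is what keeps expansion cost predictable at query time.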

AWS Architecture Options

Option A: Vector-only RAG (lowest complexity)

  • S3 + chunking + vector index
  • Lower effort, lower relationship fidelity

Option B: GraphRAG with Neptune + vector store (recommended for this scenario)

  • S3 for raw docs
  • extraction pipeline (Lambda/Step Functions)
  • graph in Neptune
  • vector index in OpenSearch Serverless vector collection
  • answer orchestration in FastAPI

Option C: Hybrid with Aurora pgvector + Neptune

  • useful when SQL joins are already central
  • slightly more operational tuning

Architecture diagram (Mermaid):

graph TD
  DOCS[Legal Documents in S3] --> ETL[Extraction Pipeline]
  ETL --> ENT[Entity + Relation Extractor]
  ENT --> G[(Amazon Neptune Graph)]
  ETL --> CHUNK[Text Chunker + Embeddings]
  CHUNK --> V[(OpenSearch Serverless Vector Collection)]
  API[FastAPI Retrieval API] --> V
  API --> G
  API --> LLM[LLM Inference Layer]
  API --> CW[CloudWatch Logs/Metrics]

Storage Design Choices

Graph store: Neptune

  • best for traversals, paths, and relationship-heavy retrieval
  • supports Gremlin/openCypher/SPARQL query styles

Vector store: OpenSearch Serverless vector collection

  • simple managed option for high-scale semantic lookup

Raw content: S3

  • source of truth for documents and extracted artifacts

Step-by-Step Tutorial

1) Create S3 bucket and prefixes

Bash:

export AWS_REGION=us-east-1
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export PROJECT=legal-graphrag
export DOC_BUCKET=${PROJECT}-${ACCOUNT_ID}-${AWS_REGION}

aws s3api create-bucket --bucket "$DOC_BUCKET" --region "$AWS_REGION"
aws s3api put-object --bucket "$DOC_BUCKET" --key raw/
aws s3api put-object --bucket "$DOC_BUCKET" --key extracted/
aws s3api put-object --bucket "$DOC_BUCKET" --key chunks/

PowerShell equivalent:

$env:AWS_REGION = "us-east-1"
$env:ACCOUNT_ID = (aws sts get-caller-identity --query Account --output text)
$env:PROJECT = "legal-graphrag"
$env:DOC_BUCKET = "$($env:PROJECT)-$($env:ACCOUNT_ID)-$($env:AWS_REGION)"

aws s3api create-bucket --bucket $env:DOC_BUCKET --region $env:AWS_REGION
aws s3api put-object --bucket $env:DOC_BUCKET --key raw/
aws s3api put-object --bucket $env:DOC_BUCKET --key extracted/
aws s3api put-object --bucket $env:DOC_BUCKET --key chunks/

The zero-byte prefix objects are optional placeholders; S3 creates prefixes implicitly when keys are written under them.

2) Create Neptune cluster (graph layer)

aws neptune create-db-subnet-group \
  --db-subnet-group-name legal-graphrag-subnets \
  --db-subnet-group-description "Subnet group for GraphRAG" \
  --subnet-ids subnet-aaaa1111 subnet-bbbb2222

aws neptune create-db-cluster \
  --db-cluster-identifier legal-graphrag-cluster \
  --engine neptune \
  --db-subnet-group-name legal-graphrag-subnets \
  --vpc-security-group-ids sg-0123456789abcdef0 \
  --backup-retention-period 7

aws neptune create-db-instance \
  --db-instance-identifier legal-graphrag-instance-1 \
  --db-instance-class db.r6g.large \
  --engine neptune \
  --db-cluster-identifier legal-graphrag-cluster

3) Create OpenSearch Serverless vector collection

aws opensearchserverless create-collection \
  --name legal-graphrag-vectors \
  --type VECTORSEARCH \
  --description "Vector collection for legal GraphRAG"

Then configure encryption, network, and data access policies for least privilege before indexing documents.
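As a sketch of what those policies can look like, the snippet below builds the policy documents as plain dicts; the boto3 calls are commented out so it runs offline, and the role ARN, account ID, and policy names are placeholders:

```python
import json

COLLECTION = "legal-graphrag-vectors"

# Encryption policy: AWS-owned key scoped to this collection.
encryption_policy = {
    "Rules": [{"ResourceType": "collection", "Resource": [f"collection/{COLLECTION}"]}],
    "AWSOwnedKey": True,
}

# Data access policy: grant index read/write to a single pipeline role.
access_policy = [{
    "Rules": [{
        "ResourceType": "index",
        "Resource": [f"index/{COLLECTION}/*"],
        "Permission": ["aoss:ReadDocument", "aoss:WriteDocument"],
    }],
    "Principal": ["arn:aws:iam::123456789012:role/legal-graphrag-pipeline"],
}]

# import boto3
# aoss = boto3.client("opensearchserverless")
# aoss.create_security_policy(name=f"{COLLECTION}-enc", type="encryption",
#                             policy=json.dumps(encryption_policy))
# aoss.create_access_policy(name=f"{COLLECTION}-access", type="data",
#                           policy=json.dumps(access_policy))

print(json.dumps(encryption_policy, indent=2))
```

Scoping the data access policy to one pipeline role (rather than a wildcard principal) is what makes the least-privilege claim auditable later.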

4) Entity and relationship extraction script

extract_entities.py

import json
import re
from dataclasses import dataclass

@dataclass
class Triple:
    source: str
    relation: str
    target: str


def naive_extract(text: str) -> list[Triple]:
    triples: list[Triple] = []
    orgs = re.findall(r"\b[A-Z][A-Za-z0-9& ]+(?:LLC|Inc|Ltd|Corp|Bank)\b", text)
    for i in range(len(orgs) - 1):
        triples.append(Triple(orgs[i].strip(), "RELATED_TO", orgs[i + 1].strip()))
    return triples


def main(in_path: str, out_path: str) -> None:
    with open(in_path, "r", encoding="utf-8") as f:
        docs = json.load(f)

    output = []
    for doc in docs:
        triples = naive_extract(doc["text"])
        output.append({
            "doc_id": doc["doc_id"],
            "triples": [t.__dict__ for t in triples]
        })

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(output, f, indent=2)


if __name__ == "__main__":
    main("sample_docs.json", "graph_triples.json")

In production, replace naive extraction with an LLM-assisted extractor plus deterministic validation rules.
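One way to pair an LLM extractor with deterministic validation is a closed relation vocabulary plus structural checks. A sketch (the relation names are illustrative):

```python
from dataclasses import dataclass

# Closed relation vocabulary agreed with domain experts (illustrative).
ALLOWED_RELATIONS = {"HAS_OBLIGATION", "GOVERNED_BY", "PARTY_TO", "AMENDS"}

@dataclass(frozen=True)
class Triple:
    source: str
    relation: str
    target: str

def validate(triples: list[Triple]) -> list[Triple]:
    """Keep only well-formed, deduplicated triples with a known relation."""
    out: list[Triple] = []
    seen: set[Triple] = set()
    for t in triples:
        if t.relation not in ALLOWED_RELATIONS:
            continue  # reject hallucinated relation names
        if not t.source.strip() or not t.target.strip():
            continue  # reject empty endpoints
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

raw = [
    Triple("Acme Holdings LLC", "HAS_OBLIGATION", "clause_45"),
    Triple("Acme Holdings LLC", "OWNS_SPACESHIP", "clause_45"),  # rejected
    Triple("Acme Holdings LLC", "HAS_OBLIGATION", "clause_45"),  # duplicate
]
print(len(validate(raw)))  # 1
```

The closed vocabulary is the deterministic backstop: whatever the LLM emits, only relations the domain model recognizes ever reach Neptune.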

5) Load graph triples into Neptune (example)

load_to_neptune.py

import json
from gremlin_python.driver import client

NEPTUNE_ENDPOINT = "wss://your-neptune-endpoint:8182/gremlin"

# "g" here is the remote traversal source name expected by Neptune.
gremlin_client = client.Client(NEPTUNE_ENDPOINT, "g")

with open("graph_triples.json", "r", encoding="utf-8") as f:
    items = json.load(f)

for item in items:
    for t in item["triples"]:
        # Naive quote-stripping to avoid breaking the interpolated query;
        # prefer parameterized traversals in production.
        src = t["source"].replace("'", "")
        rel = t["relation"].replace("'", "")
        tgt = t["target"].replace("'", "")

        # Upsert both vertices, then upsert the edge between them.
        q = f"""
        g.V().has('Entity','name','{src}').fold().coalesce(unfold(), addV('Entity').property('name','{src}')).as('a')
         .V().has('Entity','name','{tgt}').fold().coalesce(unfold(), addV('Entity').property('name','{tgt}')).as('b')
         .coalesce(__.select('a').outE('{rel}').where(inV().as('b')), __.addE('{rel}').from('a').to('b'))
        """
        gremlin_client.submit(q).all().result()

gremlin_client.close()
print("Loaded triples into Neptune")

6) Build retrieval orchestrator (FastAPI)

retrieval_api.py

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Legal GraphRAG API")

class Ask(BaseModel):
    question: str


def vector_search(question: str) -> list[str]:
    # Replace with OpenSearch vector query.
    return ["chunk_12", "chunk_87", "chunk_203"]


def graph_expand(seed_entities: list[str]) -> list[str]:
    # Replace with Neptune traversal query.
    return ["obligation_clause_45", "governing_law_section_9"]


@app.post("/ask")
def ask(req: Ask):
    chunk_ids = vector_search(req.question)
    # Replace with entity extraction over the question and top chunks.
    seed_entities = ["Acme Holdings LLC"]
    related_nodes = graph_expand(seed_entities)

    context = {
        "vector_chunks": chunk_ids,
        "graph_context": related_nodes,
    }
    # Pass combined context to your model layer.
    return {"context_used": context, "answer": "Generated answer placeholder"}

7) Retrieval flow in production

  1. Normalize user question and run policy checks.
  2. Run vector search for semantic candidates.
  3. Extract seed entities from question and top chunks.
  4. Expand graph neighborhood with bounded depth.
  5. Merge, deduplicate, and rerank context.
  6. Generate answer with citations to chunks and graph facts.
  7. Log retrieval path for auditability.
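
Step 5 (merge, deduplicate, rerank) can be sketched as a simple score merge that boosts chunks also reachable through the graph; the flat-bonus scoring scheme is an assumption for illustration, not a prescribed algorithm:

```python
def merge_and_rerank(vector_hits: list[tuple[str, float]],
                     graph_hits: list[str],
                     graph_bonus: float = 0.2,
                     top_k: int = 5) -> list[str]:
    """Merge vector scores with a flat bonus for graph-connected items."""
    scores: dict[str, float] = {}
    for chunk_id, score in vector_hits:
        # Deduplicate, keeping the best vector score per chunk.
        scores[chunk_id] = max(scores.get(chunk_id, 0.0), score)
    for chunk_id in graph_hits:
        # Graph-reachable items get a boost; graph-only items still enter.
        scores[chunk_id] = scores.get(chunk_id, 0.0) + graph_bonus
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [chunk_id for chunk_id, _ in ranked][:top_k]

ranked = merge_and_rerank(
    vector_hits=[("chunk_12", 0.91), ("chunk_87", 0.85)],
    graph_hits=["chunk_87", "obligation_clause_45"],
)
print(ranked)  # ['chunk_87', 'chunk_12', 'obligation_clause_45']
```

A production reranker would likely use a cross-encoder or learned weights, but even this simple merge makes the graph signal visible in the final ordering.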

Security and Governance

  • Use IAM roles for all services (no static keys).
  • Keep private graph/vector resources in VPC/private network policies.
  • Encrypt S3, Neptune snapshots, and vector data at rest.
  • Redact PII in extracted triples before indexing.
  • Maintain provenance fields (doc_id, section, version) in every node/chunk.
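
Redaction before indexing can start with deterministic patterns; the regexes below are illustrative and are not a substitute for a vetted PII detector:

```python
import re

# Illustrative patterns; a production system needs a vetted PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Replace matched PII spans with stable placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return text

print(redact("Contact jane@acme.com, SSN 123-45-6789."))
```

Running this over triples and chunks before indexing keeps raw PII out of both the graph and the vector store, while S3 retains the unredacted source of truth under stricter access controls.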

Monitoring and Operations

  • CloudWatch metrics:
      • retrieval latency
      • vector recall hit rate
      • graph expansion breadth
      • answer citation coverage
  • Alarm on graph load failures and retrieval timeout spikes.
  • Track index freshness lag from source documents to retrievable context.
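
Custom metrics like these can be published with CloudWatch `put_metric_data`; the namespace and metric names below are assumptions, and the boto3 call is commented out so the sketch runs offline:

```python
# Build CloudWatch metric payloads for the retrieval path.
metric_data = [
    {"MetricName": "RetrievalLatencyMs", "Value": 182.0, "Unit": "Milliseconds"},
    {"MetricName": "GraphExpansionBreadth", "Value": 14, "Unit": "Count"},
    {"MetricName": "CitationCoverage", "Value": 0.92, "Unit": "None"},
]

# import boto3
# boto3.client("cloudwatch").put_metric_data(
#     Namespace="LegalGraphRAG", MetricData=metric_data)

print(len(metric_data))
```

Emitting graph expansion breadth per request is what lets you alarm on runaway traversals before they show up as latency spikes.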

Cost Optimization

  • Keep graph expansion depth small (for example depth 1-2 by default).
  • Cache frequent query subgraphs.
  • Use selective extraction for high-value documents first.
  • Batch ingestion pipelines with Step Functions and queue workers.
  • Prefer serverless/vector autoscaling where traffic is bursty.
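
Caching frequent query subgraphs can be as simple as memoizing per-entity expansions; the sketch below stands in for the Neptune call with a counter so the cache's effect is visible:

```python
from functools import lru_cache

CALLS = {"neptune": 0}

@lru_cache(maxsize=1024)
def expand_cached(entity: str, depth: int) -> tuple[str, ...]:
    """Memoize per-entity expansions (tuples are hashable, hence cacheable)."""
    CALLS["neptune"] += 1  # stands in for a real Neptune round trip
    return (f"{entity}:neighbor_a", f"{entity}:neighbor_b")

expand_cached("Acme Holdings LLC", 2)
expand_cached("Acme Holdings LLC", 2)  # served from cache
print(CALLS["neptune"])  # 1
```

In production you would add a TTL or explicit invalidation on graph writes, since `lru_cache` alone never expires entries.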

Pricing reminder: verify live pricing before estimating TCO.

  • Neptune: https://aws.amazon.com/neptune/pricing/
  • OpenSearch Service: https://aws.amazon.com/opensearch-service/pricing/
  • S3: https://aws.amazon.com/s3/pricing/

When GraphRAG Is Worth the Extra Complexity

Use GraphRAG when at least two of the following are true:

  • questions require explicit relationship reasoning
  • users demand explainable provenance chains
  • entity ambiguity is frequent
  • multi-hop retrieval materially improves decision quality

If questions are mostly local FAQ-style lookups, start with vector RAG and add graph components incrementally.

Production-readiness checklist

  • Data model for entities/relations approved by domain experts
  • Ingestion pipeline idempotent and replay-safe
  • Retrieval path logged with citations and graph hops
  • PII and legal sensitivity controls enforced
  • Quality eval suite includes relation-heavy queries
  • Cost dashboards and budgets configured
  • Incident runbooks for index lag and graph corruption tested

Final takeaway

GraphRAG is not a universal replacement for classic RAG. It is a targeted upgrade for relationship-heavy domains like legal-tech, where structure and traceability are part of the product value.